libp2p / rust-libp2p

The Rust Implementation of the libp2p networking stack.
https://libp2p.io
MIT License

Why is the rust-libp2p Kademlia protocol's K_VALUE constant? #5501

elecbug closed this issue 3 weeks ago

elecbug commented 1 month ago

Description

In libp2p-kad, K_VALUE is defined as follows:

use std::num::NonZeroUsize;

/// The `k` parameter of the Kademlia specification.
///
/// This parameter determines:
///
///   1) The (fixed) maximum number of nodes in a bucket.
///   2) The (default) replication factor, which in turn determines:
///       a) The number of closer peers returned in response to a request.
///       b) The number of closest peers to a key to search for in an iterative query.
///
/// The choice of (1) is fixed to this constant. The replication factor is configurable
/// but should generally be no greater than `K_VALUE`. All nodes in a Kademlia
/// DHT should agree on the choices made for (1) and (2).
///
/// The current value is `20`.
pub const K_VALUE: NonZeroUsize = unsafe { NonZeroUsize::new_unchecked(20) };
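To make role (1) concrete, here is a minimal model of a fixed-capacity k-bucket. It is a sketch under simplifying assumptions, not libp2p-kad's actual routing table: IDs are 32-bit integers instead of 256-bit keys, and a peer that arrives at a full bucket is simply rejected rather than going through Kademlia's liveness check of the least-recently-seen entry. The names RoutingTable, bucket_index, and insert are hypothetical.

```rust
use std::num::NonZeroUsize;

/// Same value as the constant quoted above (bucket capacity = 20).
const K_VALUE: NonZeroUsize = match NonZeroUsize::new(20) {
    Some(k) => k,
    None => panic!("k must be non-zero"),
};

/// Hypothetical helper: index of the bucket `other` falls into relative to
/// `local`, i.e. floor(log2(local XOR other)). None for the local ID itself.
fn bucket_index(local: u32, other: u32) -> Option<usize> {
    let distance = local ^ other;
    if distance == 0 {
        None
    } else {
        Some((u32::BITS - 1 - distance.leading_zeros()) as usize)
    }
}

/// Hypothetical toy routing table (not the libp2p-kad API):
/// one bucket per distance bit length, each capped at K_VALUE entries.
struct RoutingTable {
    local: u32,
    buckets: Vec<Vec<u32>>,
}

impl RoutingTable {
    fn new(local: u32) -> Self {
        Self { local, buckets: vec![Vec::new(); u32::BITS as usize] }
    }

    /// Inserts `peer`; returns false if it is the local ID, is already
    /// present, or its bucket is full (the fixed cap from role (1)).
    fn insert(&mut self, peer: u32) -> bool {
        let Some(i) = bucket_index(self.local, peer) else { return false };
        let bucket = &mut self.buckets[i];
        if bucket.contains(&peer) || bucket.len() >= K_VALUE.get() {
            return false;
        }
        bucket.push(peer);
        true
    }

    fn len(&self) -> usize {
        self.buckets.iter().map(Vec::len).sum()
    }
}

fn main() {
    let mut table = RoutingTable::new(0);
    // Peers 2^31 .. 2^31 + 99 all land in bucket 31, the farthest bucket.
    let accepted = (0..100u32).filter(|i| table.insert((1 << 31) + i)).count();
    println!("accepted {accepted} of 100 peers into one bucket");
    assert_eq!(accepted, K_VALUE.get());
    assert_eq!(table.len(), 20);
}
```

However many peers share a distance range, the bucket never grows past K_VALUE, which is why a fixed constant here directly limits routing-table size.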

As I understand it, this means the k-bucket size cannot be configured in the current Kademlia implementation. However, implementations in other languages, such as Go, provide options to set it. Is there a reason the Rust implementation fixes this value?

If there is no specific reason, I would like to propose support for a custom protocol name other than /ipfs... and a function for setting the k-bucket size, similar to Go.
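A rough sketch of what the proposed settings might look like as a configuration value. Everything here is hypothetical and illustrative only: DhtParams, new, and compatible_with are not part of the libp2p-kad API, and "/mylab/kad/1.0.0" is a made-up protocol name. Treating a bucket-size mismatch as incompatible is a simplifying assumption based on the doc comment's statement that all nodes should agree on these choices.

```rust
use std::num::NonZeroUsize;

/// Hypothetical DHT parameters illustrating the proposal: a custom protocol
/// name plus a configurable bucket size. NOT part of the libp2p-kad API.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DhtParams {
    /// Protocol name the DHT speaks; in this sketch, nodes with a different
    /// name are not part of the same DHT (e.g. a lab network vs. a public one).
    protocol_name: String,
    /// Maximum number of peers per k-bucket.
    bucket_size: NonZeroUsize,
}

impl DhtParams {
    fn new(protocol_name: impl Into<String>, bucket_size: NonZeroUsize) -> Self {
        Self { protocol_name: protocol_name.into(), bucket_size }
    }

    /// Simplifying assumption: two nodes belong to the same DHT only if they
    /// use the same protocol name and agree on the bucket size.
    fn compatible_with(&self, other: &Self) -> bool {
        self.protocol_name == other.protocol_name && self.bucket_size == other.bucket_size
    }
}

fn main() {
    let four = NonZeroUsize::new(4).unwrap();
    let lab_a = DhtParams::new("/mylab/kad/1.0.0", four);
    let lab_b = DhtParams::new("/mylab/kad/1.0.0", four);
    let other = DhtParams::new("/otherlab/kad/1.0.0", four);
    println!("{lab_a:?}");
    assert!(lab_a.compatible_with(&lab_b));
    assert!(!lab_a.compatible_with(&other));
}
```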

Let me know if there's anything I'm missing.

Motivation

In our lab, we are researching how to improve Kademlia's performance for various purposes by adjusting its parameter values.

Although libp2p is a modular stack that is independent of IPFS, being bound to IPFS's parameter values may not be beneficial for P2P networks whose environments differ from that of IPFS.

Making multiple parameters tunable would be a useful improvement toward that kind of modularity.

Current Implementation

As described above, of the roles K_VALUE plays, the bucket size is currently the one fixed to this constant.

Are you planning to do it yourself in a pull request?

Yes

guillaumemichel commented 1 month ago

Hi @elecbug, thank you for pointing this out!

Yes, it makes a lot of sense. Based on my interpretation of the spec, K_VALUE should be user-defined, and currently it isn't. I am happy to review your PR and answer any questions you may have.

As a side note, this magic K_VALUE plays 3 different roles that could be decoupled from each other:
1) Size of the routing table's buckets
2) Number of peers that provider records are replicated to when they are published
3) Number of closer peers that are returned during a Kademlia lookup

https://github.com/libp2p/rust-libp2p/pull/5414 makes it possible to set a custom bucket size without changing the other parameters. Unless there is a need to tweak the other 2 parameters, we should stick with a single K_VALUE for all 3 roles, and users can set a custom bucket size if required once https://github.com/libp2p/rust-libp2p/pull/5414 is merged.
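The "one k by default, override only the bucket size" approach can be sketched as three separate knobs that all start from the same value. KadKnobs, from_k, and with_bucket_size are hypothetical names for illustration; they are not the libp2p-kad API and not necessarily how the linked PR implements it.

```rust
use std::num::NonZeroUsize;

/// Hypothetical split of K_VALUE's three roles into separate knobs.
/// Illustrative only; not the libp2p-kad API.
#[derive(Debug, Clone, Copy)]
struct KadKnobs {
    /// 1) Size of the routing table's buckets.
    bucket_size: NonZeroUsize,
    /// 2) Number of peers a provider record is replicated to.
    replication_factor: NonZeroUsize,
    /// 3) Number of closer peers returned during a lookup.
    closer_peers_returned: NonZeroUsize,
}

impl KadKnobs {
    /// A single `k` drives all three roles.
    fn from_k(k: NonZeroUsize) -> Self {
        Self { bucket_size: k, replication_factor: k, closer_peers_returned: k }
    }

    /// Overrides only the bucket size; the other two roles keep their value.
    fn with_bucket_size(self, bucket_size: NonZeroUsize) -> Self {
        Self { bucket_size, ..self }
    }
}

fn main() {
    let k = NonZeroUsize::new(20).unwrap();
    let knobs = KadKnobs::from_k(k).with_bucket_size(NonZeroUsize::new(8).unwrap());
    println!("{knobs:?}");
    assert_eq!(knobs.bucket_size.get(), 8);
    assert_eq!(knobs.replication_factor.get(), 20);
    assert_eq!(knobs.closer_peers_returned.get(), 20);
}
```

Keeping a single constructor from k preserves today's behavior by default, while the override leaves replication and lookup width untouched.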

elecbug commented 1 month ago

Thanks for the review, @guillaumemichel!

To help the discussion, I would like to share a little of what we have found in our research.

First, with a K_VALUE of 20, a single node's routing table can hold at most roughly 5K neighbors (256 buckets × 20 peers = 5,120). That is useful in large IPFS-like networks where thousands of nodes or more communicate, but in small networks, including standalone libp2p deployments, where the total number of nodes is far below 5K, the result is a dense P2P network in which using a DHT becomes meaningless.
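The 5K figure is the number of buckets times the bucket size. The sketch below reproduces that arithmetic and adds a rough estimate of how full one node's table gets in a network of n nodes. Assumptions: 256-bit keys with one bucket per distance bit length, uniformly random node IDs, and no churn. The helpers max_table_size and approx_table_size are hypothetical, and the estimate is an idealized model, not a measurement.

```rust
/// Number of k-buckets for 256-bit keys: one bucket per possible bit length
/// of the XOR distance between two keys.
const NUM_BUCKETS: usize = 256;

/// Hypothetical helper: upper bound on routing-table entries, reached only
/// when every bucket is full.
fn max_table_size(k: usize) -> usize {
    NUM_BUCKETS * k
}

/// Hypothetical helper: rough estimate of how many peers one node's table
/// holds in a network of `n` nodes with uniformly random IDs.
/// Bucket j (counting from the farthest) is expected to receive
/// (n - 1) / 2^(j + 1) peers, capped at k. Capping the expected count
/// (instead of taking the expectation of the capped count) gives an upper
/// bound on the true expected size, by Jensen's inequality.
fn approx_table_size(n: usize, k: usize) -> f64 {
    let others = n.saturating_sub(1) as f64;
    (0..NUM_BUCKETS)
        .map(|j| (others / 2f64.powi(j as i32 + 1)).min(k as f64))
        .sum()
}

fn main() {
    println!("max entries with k = 20: {}", max_table_size(20)); // 5120
    for n in [50usize, 100, 1_000] {
        let est = approx_table_size(n, 20);
        println!("n = {n:>5}: ~{est:.0} of {} other peers in the table", n - 1);
    }
}
```

Under these assumptions, the smaller n is relative to k, the larger the fraction of the whole network each node keeps in its table, and lowering k reduces that fraction.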

We believe such networks particularly discourage small research groups like ours, who use libp2p to build and study small-scale networks of nodes on our own servers. In fact, we are currently considering switching to Go.

We started working with Rust, loved it, and would love to continue. I think these changes would help make rust-libp2p more appealing to people like us.

Thanks for reading :)