Open · raulk opened this issue 5 years ago
@raulk I've only just started contributing to the DHT & am currently working on correctly implementing DHT bootstrapping. That task, combined with reading the DHT paper and meandering around the codebase, should give me a fairly good idea of the DHT codebase. Please count me in as & when we start focusing on this epic.
The routing table is currently a linear collection of kbuckets with increasing common-prefix-lengths, potentially resulting in significant contention for the first few k-buckets.
Assuming that `bits` is an array-view over the bits in our own PeerId, the first kbucket in the routing table corresponds to the prefix `~bits[0]`, the second to `bits[0] ~bits[1]`, and so on. Bucket[0] will be contended by around half of all available peers.
The original Kademlia paper recommends an LRU eviction policy for buckets filled to capacity, but libp2p-kad-dht only ever evicts disconnected or lost peers. With a value of `k` set to 20, this means that after learning about 40 peers, nearly half of all new peers are unable to enter the routing table.
Possible solutions would be to either: a) maintain a replacement list alongside each kbucket for nodes waiting to enter it (see the sketch below); b) allow the first few k-buckets to have capacity larger than `k`; c) evict peers from kbuckets based on some policy (such as LRU or latency); or d) use a recursive tree data structure (instead of an array) and allow splits (up to a certain depth `b`) for kbuckets not on the prefix-path of our own PeerId, as suggested in the Kademlia paper.
It would also help to periodically prune unreachable peers out of the routing table in a proactive manner.
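For illustration, here is a minimal sketch of option (a), assuming a bounded replacement list per bucket. `PeerID`, `bucket`, `tryAdd`, and `remove` are hypothetical names, not the actual go-libp2p-kad-dht API:

```go
package main

import "fmt"

// PeerID stands in for libp2p's peer.ID; everything here is illustrative.
type PeerID string

const k = 20              // bucket capacity
const replacementCap = 10 // assumed bound on the replacement list

type bucket struct {
	peers        []PeerID // active entries, at most k
	replacements []PeerID // candidates waiting for a free slot
}

// tryAdd inserts p, or parks it as a replacement when the bucket is
// full (dropping the oldest candidate if the list is also full).
func (b *bucket) tryAdd(p PeerID) bool {
	if len(b.peers) < k {
		b.peers = append(b.peers, p)
		return true
	}
	if len(b.replacements) == replacementCap {
		b.replacements = b.replacements[1:]
	}
	b.replacements = append(b.replacements, p)
	return false
}

// remove drops p and immediately promotes the oldest replacement,
// so the bucket refills without waiting to rediscover peers.
func (b *bucket) remove(p PeerID) {
	for i, q := range b.peers {
		if q == p {
			b.peers = append(b.peers[:i], b.peers[i+1:]...)
			break
		}
	}
	if len(b.peers) < k && len(b.replacements) > 0 {
		b.peers = append(b.peers, b.replacements[0])
		b.replacements = b.replacements[1:]
	}
}

func main() {
	var b bucket
	for i := 0; i < 25; i++ {
		b.tryAdd(PeerID(fmt.Sprintf("peer-%d", i)))
	}
	b.remove("peer-3")
	fmt.Println(len(b.peers), len(b.replacements)) // 20 4
}
```

The point of the replacement list is that a freed slot is refilled immediately from already-known candidates, instead of waiting for the bucket to repopulate organically.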
@aarshkshah1992 – hey, thanks for reaching out! https://github.com/libp2p/go-libp2p-kad-dht/pull/315 (PR for persisting the routing table) is heading in a good direction and would benefit from somebody pushing it over the finish line. Is this something that catches your attention?
@raulk Perfect! Can you please assign it to me? I'll get back with my comments as soon as I've gone through the existing comments & code.
@raulk What's next for us here? Is there anything I can help with?
Critical: termination criteria for DHT queries (avoid backtracking unless necessary):
@raulk what does backtracking mean exactly?
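For context, one common reading: backtracking is when a lookup revisits peers farther from the target after closer candidates fail to respond or return nothing new. Below is a sketch of the classic Kademlia termination rule (not necessarily what go-libp2p-kad-dht implements): stop once the `k` closest peers seen so far have all been queried successfully.

```go
type PeerID string

// shouldTerminate reports whether the lookup can stop. closestSeen is
// assumed to be sorted by XOR distance to the target key.
func shouldTerminate(closestSeen []PeerID, queried map[PeerID]bool, k int) bool {
	if len(closestSeen) == 0 {
		return false
	}
	n := k
	if len(closestSeen) < n {
		n = len(closestSeen)
	}
	for _, p := range closestSeen[:n] {
		if !queried[p] {
			return false // an unqueried candidate remains among the k closest
		}
	}
	return true
}
```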
Critical: provider record saturation. Nodes close to popular content get flooded with provider records. Related: #316, libp2p/specs#163. The latter solution has the property that the more popular the content gets, the more widely advertised it is on the DHT, thus making queries faster.
There are a few more interesting and proven ways to effectively do load balancing in structured P2P networks.
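One possible reading of the widening-advertisement idea in libp2p/specs#163, sketched with hypothetical names (`advertise`, `putProvider`): when the closest nodes are saturated and reject a provider record, the provider falls back to progressively farther nodes, so popular content ends up advertised over a wider radius.

```go
type PeerID string

// advertise walks candidates in order of increasing XOR distance and
// stores the provider record at the first `want` nodes that accept it.
func advertise(candidatesByDistance []PeerID, putProvider func(PeerID) error, want int) []PeerID {
	var stored []PeerID
	for _, p := range candidatesByDistance {
		if len(stored) == want {
			break
		}
		if err := putProvider(p); err != nil {
			continue // saturated or unreachable: try the next-closest node
		}
		stored = append(stored, p)
	}
	return stored
}
```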
Critical: dissociating the routing table from the connection table. Currently when a peer disconnects, we drop it from the routing table. This is not proper Kademlia. #283
True. I believe this is an artifact of operating in both Public & NAT'ed networks. The rule of thumb should be more like:
Critical: participate in the DHT in client-only mode until we receive our first inbound connection via a non-relay transport. This will be tricky, but will help emergent stability and responsiveness. #216
What about transports such as WebRTC? It might be wise to have a double rule whereby only peers that are "directly dialable" (i.e. that multiple nodes in the network can dial directly without any hole punching, WebRTC STUN and/or Relay dance) should ascend to the mainnet DHT (see the sketch below).
Grabbing an old illustration to make the point (not sure if it helps :))
Nodes should only "ascend" if they prove that they are willing and committed to provide a good service.
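A minimal sketch of that ascension rule, with hypothetical names (`Mode`, `nextMode`): remain a DHT client until the first inbound connection arrives over a non-relay transport, then upgrade to full (server) participation.

```go
type Mode int

const (
	ModeClient Mode = iota // query only; absent from others' routing tables
	ModeServer             // full routing-table participation
)

// nextMode upgrades a client to server once we see direct inbound
// connectivity, i.e. proof that we are dialable without a relay.
func nextMode(cur Mode, inboundConn, viaRelay bool) Mode {
	if cur == ModeClient && inboundConn && !viaRelay {
		return ModeServer
	}
	return cur
}
```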
Important: routing table membership based on peer quality and failure counting. Eject peers who misbehave, present high latency, or fail frequently.
Is there any work on this at the moment?
Nodes should only "ascend" if they prove that they are willing and committed to provide a good service.
This points to a score function for nodes, constantly updated based on node performance. There is plenty of literature related to this, but it needs proper filtering.
Important: routing table membership based on peer quality and failure counting. Eject peers who misbehave, present high latency, or fail frequently.
An idea that has been thrown around off-band is to base routing table membership not only on quality and performance (see above), but also on latency to the node in the routing entry. This is a form of priority function, where the routing table sorts entries according to node priority. It is more flexible than Coral, as it generalises the priority function and allows more parameters to be included than the RTT alone.
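A hedged sketch of such a priority function; the weights and inputs are purely illustrative:

```go
type entryStats struct {
	rttMillis float64 // smoothed round-trip time to the peer
	failures  int     // recent query failures
	uptime    float64 // fraction of successful queries, in [0, 1]
}

// priority scores a routing-table entry; higher is better.
func priority(s entryStats) float64 {
	score := s.uptime * 100
	score -= s.rttMillis * 0.1       // penalize high latency
	score -= float64(s.failures) * 5 // penalize flakiness
	return score
}
```

A bucket would then keep its entries sorted by this score and evict the lowest-scoring peer when full.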
Then, thinking about it and digging a bit deeper, it turns out that this approach opens an attack vector on DHT k-bucket routing tables: if nodes sort the entries in their buckets according to performance and return the top entries upon request (because of their top performance), the authors in [*] show that it only takes 2 different IP addresses in different /24 subnets to attack the Ethereum network and eclipse peers. This is because an attacker only needs to occupy the first position in every bucket (which is not expensive to do); see Section IV and IV.A in particular.
@yiannisbot AFAIK that paper's attack doesn't really apply to us at all. It's based on two assumptions:
1) PeerIDs are easy to generate/forge.
2) There is a property of a peer that is difficult to forge and that only a limited number of peers per bucket can have (i.e. each bucket can have at most one peer with an IP address in a given /24 subnet, and only 10 buckets may contain an address in that /24 subnet).
If property 1 is true (as is assumed in S/Kademlia), then Kademlia can be Sybil-attacked pretty easily without any countermeasures (e.g. expensive peerID generation, reputation, stake, etc.).
However, if you choose to assume (as the Ethereum folks did) that property 2 will save you from the problems of property 1, then the problems in the paper end up surfacing. Since we're not currently doing anything special with IP addresses, the problems surfaced seem not particularly important (i.e. we have bigger Sybil-attack problems to worry about).
Additionally, I'm not really sure I understand why that attack should have worked on the Ethereum network. They are just trying to find some semi-random distribution of peers to talk with, so why didn't they limit their peer connections to only 1 per /24 subnet (taking their assumption that having different IP addresses is very expensive as true, which 🤷‍♂️)? If they did that, then it seems like when an adversarial peer returns k peers very close to the lookup target, they'll all need to be in different /24 subnets, which they deemed infeasible.
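For reference, the per-/24 diversity limit mentioned above could look something like this (illustrative only; as noted, we don't currently do anything special with IP addresses):

```go
import "net"

// sameSlash24 reports whether two IPv4 addresses share a /24 subnet.
func sameSlash24(a, b net.IP) bool {
	a4, b4 := a.To4(), b.To4()
	if a4 == nil || b4 == nil {
		return false
	}
	return a4[0] == b4[0] && a4[1] == b4[1] && a4[2] == b4[2]
}

// acceptable rejects a candidate whose /24 subnet is already represented.
func acceptable(candidate net.IP, existing []net.IP) bool {
	for _, ip := range existing {
		if sameSlash24(candidate, ip) {
			return false
		}
	}
	return true
}
```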
EXTRA, EXTRA, most items on this epic are completed! Only these two are left:
@raulk I think this epic is probably safe to close. Is there anything that needs to be broken out into new issues?
This issue outlines the critical issues to solve on the road towards a solid, robust and performant DHT implementation.
Persist and seed the routing table via a `Snapshotter` and `Seeder`, using bootstrap peers only as a fallback. #254 #295; WIP PR here: #315. We could potentially use the brand-new libp2p testlab to continuously measure the impact of the changes we make.
If you're willing to help in pushing the DHT to new heights, please comment below ;-)