the8472 / mldht

BitTorrent Mainline DHT implementation in Java
Mozilla Public License 2.0

A question about the invariant of kademlia #30

Open zhiqiangxu opened 4 years ago

zhiqiangxu commented 4 years ago

The Kademlia invariant requires that every k-bucket of every node contain at least one contact whenever a node exists in the corresponding keyspace range.

This may not be guaranteed when a node bootstraps. I checked the code and found that the bootstrap calls routerBootstrap or fillHomeBuckets depending on the peer count in the routing table; the main difference is the ID to look up (Key.createRandomKey() or srv.getDerivedID()). It isn't obvious how this guarantees the invariant. Is there anything I'm missing?
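For reference, a minimal sketch of the bucket/prefix relationship the invariant talks about, assuming 160-bit mainline-DHT IDs represented as nonnegative BigIntegers; the names here are illustrative, not mldht's actual API:

```java
import java.math.BigInteger;

// Illustrative only: a contact belongs to the bucket indexed by the
// number of leading bits its ID shares with our home ID. The invariant
// says bucket i must hold at least one contact whenever any reachable
// node shares exactly i prefix bits with us.
final class BucketIndex {
    static final int ID_BITS = 160;

    // Number of leading bits shared by two 160-bit IDs (0..160).
    static int sharedPrefixBits(BigInteger a, BigInteger b) {
        BigInteger xor = a.xor(b);
        if (xor.signum() == 0) return ID_BITS;
        return ID_BITS - xor.bitLength();
    }
}
```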

the8472 commented 4 years ago

This is done via Node.fillBuckets()
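A rough sketch of the idea behind such a fill pass, with hypothetical names and shape rather than mldht's actual Node.fillBuckets():

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only; Bucket, randomKeyInRange and the overall structure are
// assumptions for illustration, not mldht's actual code.
final class FillSketch {
    static final int K = 8; // bucket capacity in the mainline DHT

    interface Bucket {
        int entryCount();
        byte[] randomKeyInRange(); // some key inside this bucket's range
    }

    // For each bucket that has entries but is below capacity, look up
    // a random key in its range so the responses can top it up. Note
    // that empty buckets are skipped, which is the behavior questioned
    // in the next comment.
    static void fillBuckets(List<Bucket> table, Consumer<byte[]> lookup) {
        for (Bucket b : table) {
            if (b.entryCount() > 0 && b.entryCount() < K) {
                lookup.accept(b.randomKeyInRange());
            }
        }
    }
}
```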

zhiqiangxu commented 4 years ago

Thanks for the reply.

But IMHO fillBuckets won't populate empty buckets (as noted in a comment in the code), which is likely the case when a new node joins the network, especially for the far buckets.

the8472 commented 4 years ago

When a node joins, it does an iterative lookup from a distant vantage point, which means it traverses, bit by bit, towards the home ID. Since buckets are only split when they would otherwise hold more than 8 entries, after a split both siblings will statistically have entries most of the time (rather than requiring a double split). This in turn means most intermediate buckets between the home ID and the most distant ones generally get populated with some entries, which then triggers the fill to do the remainder. Additionally, incoming requests go into the replacement bucket and get pinged and promoted on success.

So in practice it does fill up buckets if peers exist in those keyspace ranges. But you're right that it does not ensure this eagerly.

I used to have the eager variant in the code, but if I recall correctly it led to excessive fill attempts when malicious peers temporarily made it into the routing table and caused deep splits. Perhaps the mitigation is a bit blunt and could be reconsidered, since I have implemented better defenses since then; but based on observing the routing tables of deployed instances, buckets do get filled as I expect them to.
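The split-on-overflow rule described above can be sketched like this, assuming 160-bit IDs stored as byte[20]; illustrative code, not mldht's actual routing-table implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative split-on-overflow; not mldht's actual code.
final class SplitSketch {
    static final int K = 8;

    // Bit i (0 = most significant) of a 160-bit ID stored as byte[20].
    static boolean bitAt(byte[] id, int i) {
        return (id[i / 8] >> (7 - i % 8) & 1) != 0;
    }

    // Split a bucket at prefix depth d into two siblings: entries whose
    // bit d is 0 go left, those whose bit d is 1 go right. With more
    // than K random IDs in the bucket, both halves are non-empty with
    // high probability, which is why intermediate buckets between the
    // home ID and the most distant ranges tend to end up populated.
    static List<List<byte[]>> split(List<byte[]> bucket, int d) {
        List<byte[]> left = new ArrayList<>(), right = new ArrayList<>();
        for (byte[] id : bucket)
            (bitAt(id, d) ? right : left).add(id);
        return List.of(left, right);
    }
}
```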

zhiqiangxu commented 4 years ago

Thanks again!

I understand your concern about malicious peers. But I think the iterative lookup from a distant vantage point won't cover all keyspace ranges.

Suppose the joining node and the distant node share a prefix of M bits, and the joining node issues an iterative lookup for its own node ID. Very probably the iterative process will only cover nodes with M, M+1, … bits in common, not those with M-1, M-2, … bits in common. So it's very likely that after bootstrap only the buckets with at least M bits in common with the joining node are populated, leaving the far buckets empty.

the8472 commented 4 years ago

The most distant bucket covers 50% of the keyspace and gets populated quickly with high probability, so "distant vantage point" generally means 0 shared prefix bits.

zhiqiangxu commented 4 years ago

Oh I see.

It seems the problem becomes how to ensure the distant vantage point has 0 shared prefix bits in the first place. After checking the code, it seems the distant vantage point is provided through bootstrapAddresses (the seed list), which may have 0 (50% probability), 1 (25%), 2 (12.5%), … shared prefix bits?

It also seems that a deliberate lookup for a node in the 0-shared-prefix-bits half of the keyspace would do, but I didn't find such logic in the code -_-b
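The probabilities above follow from IDs being uniformly random: P(exactly k shared prefix bits) = 2^-(k+1). A standalone sketch (not part of mldht) that checks this by simulation:

```java
import java.security.SecureRandom;

// Standalone check: for two uniformly random 160-bit IDs, the shared
// prefix length is geometric, P(exactly k bits) = 2^-(k+1).
public final class PrefixDistribution {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        int trials = 1_000_000;
        int[] hist = new int[8];
        for (int t = 0; t < trials; t++) {
            byte[] a = new byte[20], b = new byte[20];
            rnd.nextBytes(a);
            rnd.nextBytes(b);
            int shared = 0; // count leading bits where a and b agree
            while (shared < 160
                    && ((a[shared / 8] ^ b[shared / 8]) >> (7 - shared % 8) & 1) == 0)
                shared++;
            if (shared < hist.length) hist[shared]++;
        }
        for (int k = 0; k < hist.length; k++)
            System.out.printf("P(prefix = %d) ≈ %.4f (expected %.4f)%n",
                    k, hist[k] / (double) trials, Math.pow(2, -(k + 1)));
    }
}
```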

the8472 commented 4 years ago

This is governed by the NodeLookup.isBootstrap flag
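The idea zhiqiangxu describes, a deliberate lookup into the far half of the keyspace, can be sketched as below; this is an assumption about the intent behind the flag, not the actual NodeLookup implementation:

```java
// Hypothetical sketch, not mldht's actual NodeLookup code: derive a
// lookup target with 0 shared prefix bits by flipping the most
// significant bit of the home ID, so the lookup starts in the far
// half of the keyspace.
final class BootstrapTarget {
    static byte[] farTarget(byte[] homeId) {
        byte[] target = homeId.clone();
        target[0] ^= (byte) 0x80; // flip the most significant bit
        return target;
    }
}
```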

zhiqiangxu commented 4 years ago

Awesome ! It's a great piece of work, thanks!

zhiqiangxu commented 4 years ago

> When a node joins, it does an iterative lookup from a distant vantage point, which means it traverses, bit by bit, towards the home ID. Since buckets are only split when they would otherwise hold more than 8 entries, after a split both siblings will statistically have entries most of the time (rather than requiring a double split). This in turn means most intermediate buckets between the home ID and the most distant ones generally get populated with some entries, which then triggers the fill to do the remainder. Additionally, incoming requests go into the replacement bucket and get pinged and promoted on success.
>
> So in practice it does fill up buckets if peers exist in those keyspace ranges. But you're right that it does not ensure this eagerly.
>
> I used to have the eager variant in the code, but if I recall correctly it led to excessive fill attempts when malicious peers temporarily made it into the routing table and caused deep splits. Perhaps the mitigation is a bit blunt and could be reconsidered, since I have implemented better defenses since then; but based on observing the routing tables of deployed instances, buckets do get filled as I expect them to.

I'm still thinking about the iterative lookup from a distant vantage point (a node with 0 shared prefix bits), and it seems it may not "traverse, bit by bit, towards the home ID", but rather in steps of 1 bit (50% probability), 2 bits (25%), etc. So even with an iterative lookup from a distant vantage point, some intermediate buckets may still be skipped (left empty).

the8472 commented 4 years ago

You have to consider that the lookups contact multiple nodes in parallel and are repeated over time.

But sure, this is all probabilistic. Perhaps a cleaner and more reliable approach would be to start by filling the most remote bucket if it isn't already populated; if that succeeds, fill the second-furthest bucket, and so on, until a lookup fails to fill a bucket, then stop the filling procedure. This would probably solve the overzealousness problem of the current fillBuckets logic in the presence of misbehaving nodes.
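A sketch of that outward-in procedure, with hypothetical names rather than mldht's actual API:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of the approach proposed above; all names are hypothetical.
final class OutwardInFill {
    static final int K = 8;

    interface Bucket {
        int entryCount();
        byte[] randomKeyInRange();
    }

    // Walk buckets from the most distant keyspace range inward. Try to
    // fill each under-populated bucket with a lookup in its range; the
    // first range where the lookup finds no peers ends the procedure,
    // which bounds the work even after malicious deep splits.
    static void fill(List<Bucket> farToNear,
                     Function<byte[], Boolean> lookupAndFill) {
        for (Bucket b : farToNear) {
            if (b.entryCount() >= K) continue; // already populated
            if (!lookupAndFill.apply(b.randomKeyInRange())) break;
        }
    }
}
```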