status-im / nim-eth

Common utilities for Ethereum
https://nimbus.status.im
Apache License 2.0
82 stars 30 forks source link

Make FindNode according to spec #205

Closed kdeme closed 4 years ago

kdeme commented 4 years ago

The current FindNode call will respond with ENRs closest to distance. According to spec it should only be ENRs with that distance. This is because we reuse the discovery v4 code where it should indeed respond with all closest nodes (<= k)

Next to that, there are some weird findings in https://github.com/vacp2p/research/pull/19# . It should be verified if this is due to bugs in the findNode implementation.

kdeme commented 4 years ago

After further testing the FindNode call I noticed that indeed sometimes nodes would be returned that were not the closest. A whole bucket could be missing even from the returned values. Turns out the idAtDistance calculation was wrong. This is fixed here: https://github.com/status-im/nim-eth/commit/99c68d40f71a3bbb6b3e4024390cc7fde7c31ecb / https://github.com/status-im/nim-eth/pull/219

Now, why do we need this idAtDistance? This too comes from the fact that we use the Kademlia implementation of our discovery v4 code.

The Kademlia paper describes an implementation that starts off from one k-bucket, and keep splitting the bucket as more nodes are discovered and added. The bucket splits only on the part of the binary tree that our own id belongs too (same prefix). Resulting eventually in k-buckets per log base 2 distance. Well, not really, as nodes with ids in the lower distance ranges will never be found. And because of this an optimisation is done where buckets will also split even if the nodes own id does not have the same prefix (to avoid highly unbalanced branches).

Now some implementations take a more simplified approach. They simply directly create buckets for each possible logarithmic distance (e.g. here 1->256). Some implementations also just don't create buckets with logarithmic distance < x, as the lower the distances goes, the less chance there is that you will find nodes for it.

Now, discoveryv4 its FindNode call would request k closest nodes. As does original Kademlia. This effectively puts the work at the node that gets the request. This node will have to check its buckets and gather the closest. Some implementations go over all the nodes in all the buckets for this (e.g. go-ethereum discv4). However, in our bucket splitting approach, this search is improved.

Now, in Discoveryv5 the approach was taken to pass the logarithmic distance instead of the NodeId as parameter for the FindNode call. And to only return nodes that match that logarithmic distance.

This was done to not put the trust at the node for selecting the closest nodes. Possibly a difference in implementation, but probably more important a security issue. See also: https://github.com/ethereum/devp2p/blob/master/discv5/discv5-rationale.md#115-guard-against-kademlia-implementation-flaws

The result is, that in an implementation which just stores buckets per logarithmic distance, it simply has to return the bucket. In our split-bucket implementation, this can't be done and we are still doing the closest search, which is also why we need the idAtDistance proc. Next we will need to filter out the nodes with invalid distances, to be compliant to the specification (This still needs to be added still). Perhaps some things can get optimised there, but, it sounds better to just switch away from the split-bucket approach. As I think that the main benefit it has is improved lookups (due to no unbalanced branches), and I believe this will be negated by limiting the returned nodes to the ones of the requested logarithmic distance for the FindNode call.

This FindNode change will also have an effect on the efficiency of the network. Work will be moved from the receiver of FindNodes to the requester. But this also means more network traffic, as less nodes will potentially be passed around, and thus more requests will be needed for a lookup (adding bandwidth and latency). This might be a concern for mobile devices.

arnetheduck commented 4 years ago

this fine description belongs in the source code ;)

kdeme commented 4 years ago

According to spec now after https://github.com/status-im/nim-eth/pull/223 + tests added.