Open tomaka opened 6 years ago
This should be solved by adding more randomness to the process.
Is this required for Polkadot mainnet (i.e. is it a viable attack) or just an idea for making things more secure?
cc'ing @romanb @mxinden who probably have a better knowledge than me on Kademlia's vulnerabilities
Performing an eclipse attack on the peer discovery this way would probably take ages, I think? And the more nodes we have in the network, the more robust we are.
Maybe a bit more problematic is the authority discovery system. As we store the IP addresses of validator nodes in the DHT, someone could eclipse an authority by forging node IDs that are close this this public key, and prevent nodes from knowing the IP of a validator. But since validators establish outgoing connections, they would still be reachable through gossiping.
It is possible for an attacker to create in advance lots of node IDs
There are a couple suggestions in the literature to make the generation of lots of node identifiers hard:
Introduce a static crypto puzzle, e.g. requiring a certain amount of zeroes in every node identifier. [1] Only a defense against small-time-span attacks.
Introduce a dymanic crypto puzzle on request [2] or periodically based on some globally known random variable. Requires coordination.
Supervise node identifiers through e.g. on-chain matching of blockchain identity and network identity. Introduces a chicken-and-egg issue.
Instead I think a better and less intrusive bet would be to introduce the multi-disjoint-path lookup described in S/Kademlia [1] and [3] to our Kademlia implementation (//CC @romanb).
As we store the IP addresses of validator nodes in the DHT, someone could eclipse an authority by forging node IDs that are close this this public key, and prevent nodes from knowing the IP of a validator.
In the long run I would like to put the network addresses of a validator both on the DHT as well as on-chain. The former for bootstrapping, the latter for ongoing trusted operation.
[1] Baumgart, Ingmar, and Sebastian Mies. "S/kademlia: A practicable approach towards secure key-based routing." 2007 International Conference on Parallel and Distributed Systems. IEEE, 2007.
[2] Urdaneta, Guido, Guillaume Pierre, and Maarten Van Steen. "A survey of DHT security techniques." ACM Computing Surveys (CSUR) 43.2 (2011): 8.
Are there any schemes that give DHT entries a priority? I suspect prioritization might fit our use case well, at least in combination with the multi-path thing.
Are there any schemes that give DHT entries a priority?
Not that I am aware of @burdges.
A similar discussion happened within libp2p allowing upper layer logic to influence the kbuckets (the nodes Kademlia connects to). In a similar fashion one could have a higher layer decide on priorities for DHT entries.
While this is doable, trying to force these properties onto a distributed hash table might introduce a lot more complexity then putting data onto the chain that relies on these properties.
Instead I think a better and less intrusive bet would be to introduce the multi-disjoint-path lookup described in S/Kademlia [1] and [3] to our Kademlia implementation (//CC @romanb).
I agree that this should be done, however the way this is described in the S-Kademlia paper is confusing and this seems to have resulted in Protocol Labs not implementing it yet - see libp2p/go-libp2p-kad-dht#146 which is still open, after some unfinished discussion on what is actually supposed to be the correct behaviour.
Let me try to explain it better, and I really disagree with the phrasing "disjoint-path lookup". Instead, it is more appropriate to call S-Kademlia's proposal "disjoint first-hop lookup".
In original Kademlia, roughly speaking, you select alpha (e.g. 3) nodes closest to the target, wait for them to respond, then out of these merged results, you select the alpha nodes closest to the target again and repeat until you hit the target.
Because we merge results from multiple peers, and select the closest alpha number of them, it is trivial for any peer to give you false results that "look closest" to the target but in reality are controlled by the attacker.
What S-Kademlia suggests is simply to instead ensure that the second-hop queries are chosen from the results from different first-hop peers, which means that we cannot merge them. This only applies to the first hop and the S-Kademlia paper acknowledges thus, ".. From there on the node continues with d parallel-lookups similar to the traditional Kademlia lookup", and these paths are themselves expected to converge (to the target node) assuming Kademlia works in the first place. Note that it is not meaningful to keep this defence going past the first hop, since the first hop is able to entirely control the rest of your query and you have to assume that some of them are honest.
Is this required for Polkadot mainnet (i.e. is it a viable attack)
The attack that this disjoint-first-hop defence protects against is trivial to do with the original kademlia algorithm and allows one attacker node M to completely DoS everyone (call them victims V) that tries to talk to M, regardless of how many other honest nodes V is connected to.
It is possible for an attacker to create in advance lots of node IDs close to each other, and wait for a node to randomly query a close peer ID. Its Kademlia request and result will then be flooded with all the generated IDs controlled by that single attacker.