libp2p / notes

libp2p Collaborative Notebook for Research
MIT License
37 stars 4 forks source link

DHT improvement ideas #21

Open jhiesey opened 6 years ago

jhiesey commented 6 years ago

Although I don't see any single magic solution to making a secure and performant DHT, there are a lot of good ideas out there beyond what is implemented in libp2p so far. Here are some, divided into security and performance sections.

The "really should try" sections are based on examining the js implementation; I haven't examined the go version to see if it's different.

Security

The most serious security concern with a DHT is the Sybil attack, where an attacker controls a large number of DHT nodes at once. There are a few things this lets the attacker accomplish:

Preventing Sybil attacks is one of the major areas of DHT research in the last decade.

There are also other ways to attack a DHT; for example, by flooding nodes with put or provide messages that are incorrect. From talking to @whyrusleeping it sounds like verifying signatures is in place for put, but bad provides aren't verified. I haven't investigated solutions to this.

Ideas we're already using

Ideas we really should try

Ideas to consider

Ideas we probably shouldn't use

There are a lot of ideas in the literature that aren't currently applicable to libp2p but are worth knowing about.

Performance

Ideas we really should try

Ideas to consider

Ideas we probably shouldn't use

Other thoughts

It would be very helpful to be able to test on real, live networks. iptb is a nice start, but it doesn't realistically simulate network connectivity, latency, or churn.

Could we allow libp2p nodes to connect to a command-and-control server to allow running test queries, either on an opt-in or opt-out basis? This would give us far more insight into what's really happening.

References

1

Petar Maymounkov and David Mazieres. "Kademlia: A Peer-to-peer Information System Based on the XOR Metric"

2

Ingmar Baumgart and Sebastian Mies. "S/Kademlia: A Practivable Approach Towards Secure Key-Based Routing"

3

Michael J. Freedman and David Mazieres. "Sloppy hashing and self-organizing clusters"

4

Chris Lesniewski-Laas and M. Frans Kaashoek. "Whanau: A Sybil-proof Distributed Hash Table"

5

Xiang Zuo and Adriana Iamnitchi. "A Survey of Socially Aware Peer-to-Peer Systems"

6

Frank Li, Prateek Mittal, Matthew Caesar, and Nikita Borisov. "SybilControl: Practical Sybil Defense with Computational Puzzles"

7

M. Frans Kaashoek and David R. Karger. "Koorde: A simple degree-optimal distributed hash table"

8

Diksha Gupta, Jared Saia, and Maxwell Young. "Proof of Work Without All the Work: Computationally Efficient Attack-Resistant Systems"

raulk commented 6 years ago

The routing table prefers to keep the oldest nodes over recently added ones. This makes it hard for an attacker to quickly take over the network, but can be subverted if an attacker is sufficently patient. This is part of the original Kademlia paper[1].

Currently this isn't the case in go-libp2p. When we encounter a new node, we evict the least recently seen one from the bucket. The Kademlia paper recommends PINGing the eviction candidate and only replacing it if it's unresponsive, but we don't do that currently AFAIK.