Open miyazono opened 5 years ago
Thanks @aschmahmann. The similarities are definitely pretty clear, and having the three systems described like that is incredibly helpful.
From a brainstorming standpoint, it seems like we're only constrained by the fact that we don't influence where the records are. Using a Kademlia-based DHT for provider records, we've picked some point on the time/space tradeoff where nodes each store less of the database resulting in more hops to resolve the request. I'm interested to learn how we do IPNS and PeerID resolution (do we use a similar DHT?) but I'm happy to read wherever that's documented.
The IPNS freshness issues seems interesting, and I'd assume there's lots of thought and active research on that, since the problem seems pretty easily generalizable.
I'm assuming that common paths for solving resolution time/space tradeoff problems are
@miyazono we currently use the same DHT for everything and use namespaces (/ipns, /pk, /ipfs) to differentiate the request types. However, there's some thought that maybe we should be breaking the DHT up based on request type or other protocol parameters. There's some thoughts about these DHT related questions at libp2p/notes#10, ipfs/notes#291, libp2p/research-dht#6 (and likely some other places I'm forgetting right now).
You're absolutely right that we're trading off spreading the data around the DHT for additional hops. My understanding is that two big issues with our Kademlia DHT are:
Kademlia fundamentally relies on the triangle inequality and that because the XOR metric (XOR(A,B) interpreted as an integer) satisfies the triangle inequality we can use the XOR metric between peerIDs to route our way towards our data.
Unfortunately, because of asymmetric connections (NATs, firewalls, etc.) the triangle inequality doesn't actually hold on our networks. For example, say Alice, Bob, Charlie, and Dan have peerIDs (Alice, 0000), (Bob, 0011), (Charlie, 0111), (Dan, 1111). Also say that Alice, Bob and Charlie are all connected to each other, but Dan is only connected to Bob. If Alice tries to find data stored at 1111 she will ask Charlie since he is the closer than Bob. However, because Charlie isn't connected to Dan Alice won't find him.
As a result, all users on asymmetric connections are effectively adversarial users. Hopefully this should be helped by properly implementing S/Kademlia which will help us better deal with adversarial nodes (see libp2p/go-libp2p-kad-dht#146 and libp2p/go-libp2p-kad-dht#204). However, if instead of a fraction of nodes being adversarial we have many adversarial nodes on the network (e.g. NAT = adversary) then we're going to need to rely on making those nodes no longer problematic. This could include leveraging the newly implemented autorelay system, adding systems to separate out local peers from global peers, etc.
The DHT is global and used for various sorts of data and protocols. Even for a single data type there's some balance between the number of nodes in the network, amount of times data is place in the network, the size of routing tables on individual nodes and lookup time. There are tradeoffs to be made here, we can't win on every front (e.g. 1B nodes, data is placed in the network twice, 1KB routing table, 1 round trip to find data). The DHT issues I linked above have some people's thoughts on the matter.
mark
From a discussion that started with IPNS name resolving @aschmahmann
@miyazono
@aschmahmann