probe-lab / go-kademlia

Generic Go Kademlia implementation

Move IPFS/libp2p specific components to ipfs/boxo and/or libp2p/go-libp2p-kad-dht #34

Closed guillaumemichel closed 1 year ago

guillaumemichel commented 1 year ago

All modules that are specific to the IPFS DHT (i.e. those that cannot or should not be used in other DHT networks or implementations) should move to ipfs/boxo.

These modules include:

The IpfsDHT struct should be defined directly in ipfs/boxo. This includes instantiating a new Libp2pEndpoint, building a RoutingTable, defining the server's behavior and the message format, and interacting with the query mechanism (to decide when each query should terminate). IPFS constants (e.g. bucket size, number of closer peers to return, IPFS DHT protocol ID, etc.) should also be defined directly in ipfs/boxo.

Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.

What should stay in go-kademlia:

The goal of this separation is to get the ground ready for the Composable DHT.

iand commented 1 year ago

Thanks for this, I was planning to ask you to expand on your thinking here, so having this issue is really useful.

I think it would be good to work some of this into the design documentation. Currently the design has an IPFS DHT section that you could update to be more explicit about the boundaries between this repo's goals and IPFS-specific goals.

I note that peer routing is currently in that IPFS DHT section, but do you agree that it's a feature that is generally useful across all Kademlia deployments?

aschmahmann commented 1 year ago

All modules that are specific to the IPFS DHT (i.e. those that cannot or should not be used in other DHT networks or implementations) should move to ipfs/boxo.

These modules include:

  • IPFS Server mechanism, that is only handling IPFS requests (basicserver can remain in this repo for testing purposes).
  • IPFSv1 message module, including the IPFS protobuf message format and helpers.

Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.

As has been mentioned previously, both of these are related to the libp2p DHT spec (https://github.com/libp2p/specs/tree/c733210b3a6c042d01f6b39f23c0c9a3a20d3e88/kad-dht), not to the IPFS Public DHT specifically.

For some of the things specific to the IPFS Public DHT that should probably live in boxo, look at https://github.com/libp2p/go-libp2p-kad-dht/issues/597 and linked issues. It includes things like the protocol name(s), put/get validators, network constants like k, the record expiration times, routing table refresh intervals, etc.

@guillaumemichel IIUC moving all the components to boxo is also inconsistent with your comment in https://github.com/libp2p/go-libp2p-kad-dht/issues/846#issuecomment-1590747480, where a libp2p DHT user (who is not using the IPFS Public DHT) reasonably wants to keep using their DHT without bringing in IPFS dependencies.

If you don't want any libp2p components in this repo, then this likely means creating a barebones libp2p DHT using this implementation as an alternative client/server implementation in go-libp2p-kad-dht. However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs with the associated overhead of bubbling as has been flagged previously as the cost of having a separate repo here.

guillaumemichel commented 1 year ago

The confusion between the IPFS DHT and the libp2p DHT is expected to be addressed by the Composable DHT. Until then, we need to be very careful with naming and dependencies generally.

@aschmahmann I agree with everything you wrote. go-kademlia is a generic Kademlia implementation (genericity is required, not to build BitTorrent implementations, but to build new features such as the Composable DHT and the Double Hash DHT, and generally to facilitate the improvement process of the IPFS DHT). For this reason, and as it doesn't depend on libp2p other than as a possible transport, go-kademlia should not be the libp2p DHT implementation. However, the libp2p DHT implementation (e.g. go-libp2p-kad-dht) should depend on go-kademlia. And finally, the IPFS DHT implementation (e.g. boxo) should depend on the libp2p DHT implementation.

The libp2p DHT implementation should define the IpfsDHT (or Libp2pDHT?) struct, the server behavior (request handling), and the message format. The IPFS DHT implementation should only define parameters of the libp2p DHT implementation, such as protocol ID, bucket size, refresh interval, etc. So the IPFS DHT implementation would be an instantiation of the libp2p DHT implementation, which itself depends on go-kademlia for the Kademlia routing logic.
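To make the layering above concrete, here is a minimal sketch in Go. It is not code from any of the repositories discussed: `Config`, `KadDHT`, `NewKadDHT` and `IPFSConfig` are hypothetical names, the routing logic that would come from go-kademlia is left out, and the refresh interval is a placeholder value. The protocol ID `/ipfs/kad/1.0.0` and bucket size 20 are the values commonly associated with the IPFS Public DHT.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config holds the network-specific parameters that an instantiation
// (such as the IPFS DHT) would supply. Hypothetical type for illustration.
type Config struct {
	ProtocolID      string
	BucketSize      int
	RefreshInterval time.Duration
}

// KadDHT stands in for the libp2p DHT implementation layer. In the proposed
// design it would also own the server behavior and the message format, and
// delegate Kademlia routing to go-kademlia. None of that is modeled here.
type KadDHT struct {
	cfg Config
}

// NewKadDHT validates the parameters and returns a DHT instance.
func NewKadDHT(cfg Config) (*KadDHT, error) {
	if cfg.ProtocolID == "" {
		return nil, errors.New("protocol ID must not be empty")
	}
	if cfg.BucketSize <= 0 {
		return nil, errors.New("bucket size must be positive")
	}
	if cfg.RefreshInterval <= 0 {
		return nil, errors.New("refresh interval must be positive")
	}
	return &KadDHT{cfg: cfg}, nil
}

// Config returns the parameters this instance was built with.
func (d *KadDHT) Config() Config { return d.cfg }

// IPFSConfig is what the IPFS layer (e.g. in boxo) would contribute:
// parameters only, no behavior. The refresh interval is a placeholder.
func IPFSConfig() Config {
	return Config{
		ProtocolID:      "/ipfs/kad/1.0.0",
		BucketSize:      20,
		RefreshInterval: 10 * time.Minute,
	}
}

func main() {
	dht, err := NewKadDHT(IPFSConfig())
	if err != nil {
		panic(err)
	}
	fmt.Println(dht.Config().ProtocolID, dht.Config().BucketSize)
}
```

Under this sketch, a libp2p DHT user who is not on the IPFS Public DHT would call `NewKadDHT` with their own `Config` and would not need to import the IPFS layer at all.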

You are right that we should pay attention to https://github.com/libp2p/go-libp2p-kad-dht/issues/846, but I doubt we will be able to tackle the weird dependency chain (libp2p DHT network -> IPFS DHT network -> IPFS DHT implementation -> libp2p DHT implementation) before the Composable DHT.

However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs with the associated overhead of bubbling as has been flagged previously as the cost of having a separate repo here.

Yes, it is indeed not ideal. go-kademlia's goal is to solve the Kademlia routing, and to expose a simple interface to its consumers. This interface is simple and generic: it allows the caller to control some parts of the behavior, or to supply its own modules that implement the interfaces defined in this repo. The Kademlia routing interface is not expected to change once the repo is functional. The next potential change would come with the Composable DHT (if go-kademlia is transformed to be the new Composable DHT implementation). Alternatively, the Composable DHT could be another repository depending on go-kademlia.
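As an illustration of what "a simple and generic interface that callers can implement themselves" could look like, here is a small sketch. It does not reproduce go-kademlia's actual interfaces: `RoutingTable`, `flatTable` and the use of `uint64` keys are assumptions made for brevity. The only Kademlia-specific element is the XOR distance metric.

```go
package main

import (
	"fmt"
	"sort"
)

// RoutingTable is a hypothetical, simplified routing interface. It is
// generic over the node type N so that it does not depend on libp2p.
type RoutingTable[N any] interface {
	AddNode(key uint64, node N)
	NearestNodes(target uint64, n int) []N
}

type entry[N any] struct {
	key  uint64
	node N
}

// flatTable is a caller-supplied implementation that keeps all nodes in a
// flat slice instead of k-buckets. It is only suitable for small examples.
type flatTable[N any] struct {
	entries []entry[N]
}

func (t *flatTable[N]) AddNode(key uint64, node N) {
	t.entries = append(t.entries, entry[N]{key: key, node: node})
}

// NearestNodes returns up to n nodes ordered by XOR distance to target.
func (t *flatTable[N]) NearestNodes(target uint64, n int) []N {
	sorted := make([]entry[N], len(t.entries))
	copy(sorted, t.entries)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].key^target < sorted[j].key^target
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	out := make([]N, 0, n)
	for _, e := range sorted[:n] {
		out = append(out, e.node)
	}
	return out
}

func main() {
	var rt RoutingTable[string] = &flatTable[string]{}
	rt.AddNode(0b0001, "a")
	rt.AddNode(0b0100, "b")
	rt.AddNode(0b0111, "c")
	// XOR distances to 0b0101: a=4, b=1, c=2.
	fmt.Println(rt.NearestNodes(0b0101, 2)) // prints "[b c]"
}
```

Because consumers depend only on the interface, a libp2p-backed table, a simulation table, or this flat table could be swapped in without touching the query logic.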

Alternatively, if we don't want to have 3 DHT repos (go-kademlia, go-libp2p-kad-dht and boxo/kad-dht), we could merge the libp2p DHT implementation with go-kademlia. One module of go-kademlia could be the libp2p DHT implementation. We could add more implementations, for instance a simulation implementation that we are using to test the protocol, and example implementations showing how to make use of the go-kademlia repo. So the libp2p DHT implementation would be an example of how to use go-kademlia.

iand commented 1 year ago

Proposal

Split functionality over 3 repos:

Outcome:

Tasks

guillaumemichel commented 1 year ago

@iand it makes a lot of sense to me!

A few minor remarks:

iand commented 1 year ago

Agreed that prototyping in the example folder can make sense (although, practically speaking, go.work files make cross-module development trivial).

Libp2pDHT seems redundant to me since it's in a libp2p repository/module. https://github.com/libp2p/go-libp2p-kad-dht/issues/337 suggests naming it Kad. I think KadDHT makes it clear that it's a Kademlia DHT rather than something like Chord :smile:

iand commented 1 year ago

Closing as resolved