Closed guillaumemichel closed 1 year ago
Thanks for this, I was planning to ask you to expand on your thinking here, so having this issue is really useful.
I think it would be good to work some of this into the design documentation. Currently the design has an IPFS DHT section that you could update to be more explicit about the boundaries between this repo's goals and IPFS-specific goals.
I note that peer routing is currently in that IPFS DHT section, but do you agree that it's a feature that is generally useful across all kad deployments?
All modules that are specific to the IPFS DHT (i.e. that cannot/shouldn't be used in other DHT networks/implementations) should move to ipfs/boxo.
These modules include:
- IPFS Server mechanism, that is only handling IPFS requests (basicserver can remain in this repo for testing purposes).
- IPFSv1 message module, including the IPFS protobuf message format and helpers.
Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.
As has been mentioned previously both of these are related to the libp2p DHT spec (https://github.com/libp2p/specs/tree/c733210b3a6c042d01f6b39f23c0c9a3a20d3e88/kad-dht) not to the IPFS Public DHT specifically.
For some of the things specific to the IPFS Public DHT that should probably live in boxo, look at https://github.com/libp2p/go-libp2p-kad-dht/issues/597 and linked issues. It includes things like the protocol name(s), put/get validators, the network constants like k, the record expiration times, routing table refresh intervals, etc.
@guillaumemichel IIUC moving all the components to boxo is also inconsistent with your comment in https://github.com/libp2p/go-libp2p-kad-dht/issues/846#issuecomment-1590747480, where a libp2p DHT user (who is not using the IPFS Public DHT) reasonably wants to keep using their DHT without bringing in IPFS dependencies.
If you don't want any libp2p components in this repo, then this likely means creating a barebones libp2p dht using this implementation as an alternative client/server implementation in go-libp2p-kad-dht. However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs with the associated overhead of bubbling as has been flagged previously as the cost of having a separate repo here.
The confusion between the IPFS DHT and the libp2p DHT is expected to be addressed by the Composable DHT. Until then, we need to be very careful with naming and dependencies generally.
FIND_PEER, PUT_PROVIDER, GET_PROVIDERS, PUT_VALUE, GET_VALUE.
@aschmahmann I agree with everything you wrote. go-kademlia is a generic Kademlia implementation (genericity is required, not to build BitTorrent implementations, but to build new features such as the Composable DHT, the Double Hash DHT, and generally facilitate the improvement process of the IPFS DHT). For this reason, and as it doesn't depend on libp2p other than being a possible transport, go-kademlia should not be the libp2p DHT implementation. However, the libp2p DHT implementation (e.g. go-libp2p-kad-dht) should depend on go-kademlia. And finally, the IPFS DHT implementation (e.g. boxo) should depend on the libp2p DHT implementation.
The libp2p DHT implementation should define the IpfsDHT (or Libp2pDHT?) struct, the server behavior (request handling), and message format. The IPFS DHT implementation should only define parameters of the libp2p DHT implementation, such as protocol ID, bucket size, refresh interval etc. So the IPFS DHT implementation would be an instantiation of the libp2p DHT implementation, itself depending on go-kademlia for the Kademlia routing logic.
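To make the proposed layering concrete, here is a minimal Go sketch, assuming the libp2p DHT layer takes a plain configuration struct. Every name in it (Config, KadDHT, NewKadDHT, IPFSConfig) is hypothetical and not taken from go-libp2p-kad-dht or boxo, and the parameter values are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config holds the network-specific parameters that the libp2p DHT layer
// would expose in this sketch. The field names are hypothetical.
type Config struct {
	ProtocolID      string
	BucketSize      int
	RefreshInterval time.Duration
}

// KadDHT stands in for the libp2p DHT implementation. In the proposal it
// would own request handling and the message format, and delegate Kademlia
// routing to go-kademlia. None of that is modelled here.
type KadDHT struct {
	cfg Config
}

// NewKadDHT validates the parameters and builds the DHT.
func NewKadDHT(cfg Config) (*KadDHT, error) {
	if cfg.ProtocolID == "" {
		return nil, errors.New("protocol ID must be set")
	}
	if cfg.BucketSize <= 0 {
		return nil, errors.New("bucket size must be positive")
	}
	return &KadDHT{cfg: cfg}, nil
}

// IPFSConfig is what the IPFS layer (e.g. boxo) would contribute: only
// parameters, no behavior. The values are illustrative assumptions.
func IPFSConfig() Config {
	return Config{
		ProtocolID:      "/ipfs/kad/1.0.0",
		BucketSize:      20,
		RefreshInterval: 10 * time.Minute,
	}
}

func main() {
	dht, err := NewKadDHT(IPFSConfig())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dht.cfg.ProtocolID, dht.cfg.BucketSize, dht.cfg.RefreshInterval)
}
```

In this shape another network reuses NewKadDHT with its own Config and never imports the IPFS package.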
You are right that we should pay attention to https://github.com/libp2p/go-libp2p-kad-dht/issues/846, but I doubt we will be able to tackle the weird dependency chain (libp2p DHT network -> IPFS DHT network -> IPFS DHT implementation -> libp2p DHT implementation) before the Composable DHT.
However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs with the associated overhead of bubbling as has been flagged previously as the cost of having a separate repo here.
Yes, it is indeed not ideal. go-kademlia's goal is to solve Kademlia routing and to expose a simple, generic interface to its consumers. Through that interface the caller can control some parts of the behavior, or supply its own modules implementing the interfaces defined in the repo. The Kademlia routing interface is not expected to change once the repo is functional. The next potential change would come with the Composable DHT (if go-kademlia is transformed to be the new Composable DHT implementation). Alternatively, the Composable DHT could be another repository depending on go-kademlia.
Alternatively, if we don't want to have 3 DHT repos (go-kademlia, go-libp2p-kad-dht and boxo/kad-dht), we could merge the libp2p DHT implementation with go-kademlia. One module of go-kademlia could be the libp2p DHT implementation. We could add more implementations, for instance a simulation implementation that we are using to test the protocol, and example implementations showing how to make use of the go-kademlia repo. So the libp2p DHT implementation would be an example of how to use go-kademlia.
Split functionality over 3 repos:
- The focus of go-kademlia remains a generic Kademlia toolkit that can be configured for use by different networks. It provides a more maintainable, better performing and extensible foundation for new ideas like the composable DHT.
- Keep go-libp2p-kad-dht as the home of the libp2p dht, but refactor it to be built in terms of go-kademlia. The result of this refactoring becomes version 2 (go-libp2p-kad-dht/v2).
- Create a package in boxo (for example: routing/dht) that contains the configuration of the libp2p dht for IPFS.
Outcome:
- boxo
- go-libp2p-kad-dht/v2 directly with new parameters
- v2-develop branch in go-libp2p-kad-dht to track the refactor, eventually to become v2
- go-libp2p-kad-dht/IpfsDHT to KadDHT in v2-develop
- KadDHT configurable with protocol name, bootstrappers, validators, network constants, expiration times, routing table refresh intervals etc.
- go-libp2p-kad-dht/v2
- go-kademlia to remove dependency on go-libp2p. Functional dependencies such as Libp2pEndpoint move to go-libp2p-kad-dht/v2 whereas constant/type dependencies like Connectedness are replaced with local equivalents.
- go-kademlia focuses on kademlia algorithm implementation, searches, event queue management, peer routing
- DHT type in boxo that instantiates KadDHT with IPFS specific options. This is low configuration with sensible defaults. Applications requiring more control can use KadDHT directly.

@iand it makes a lot of sense to me!
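As a rough illustration of replacing constant/type dependencies like Connectedness with local equivalents, the sketch below defines a dependency-free type inside the generic layer. The Endpoint interface, the fakeEndpoint type and the two-value enum are assumptions made for this example. They are not the go-kademlia or go-libp2p definitions.

```go
package main

import "fmt"

// Connectedness is a local stand-in for the libp2p type of the same name,
// so the generic Kademlia code needs no go-libp2p import. Only two states
// are modelled in this sketch.
type Connectedness int

const (
	NotConnected Connectedness = iota
	Connected
)

// String makes the states readable in logs and test output.
func (c Connectedness) String() string {
	switch c {
	case NotConnected:
		return "NotConnected"
	case Connected:
		return "Connected"
	default:
		return fmt.Sprintf("Connectedness(%d)", int(c))
	}
}

// Endpoint is a hypothetical interface the generic layer could depend on.
// A libp2p-backed implementation would live in the libp2p DHT repository
// and translate the libp2p connection state into the local type.
type Endpoint interface {
	Connectedness(peerID string) Connectedness
}

// fakeEndpoint is an in-memory implementation, e.g. for simulations.
type fakeEndpoint struct {
	connected map[string]bool
}

func (e fakeEndpoint) Connectedness(peerID string) Connectedness {
	if e.connected[peerID] {
		return Connected
	}
	return NotConnected
}

func main() {
	var ep Endpoint = fakeEndpoint{connected: map[string]bool{"peerA": true}}
	fmt.Println(ep.Connectedness("peerA"), ep.Connectedness("peerB"))
}
```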
A few minor remarks:
- KadDHT module, and Libp2pEndpoint, server, message format, etc. in go-kademlia (e.g. in the example folder) while we are actively working on them, as the interfaces may slightly change during the development. If we already split the code, we would have to do one PR in each repo when updating an interface. Once we are happy with the KadDHT implementation, we can move it and its associated modules to go-libp2p-kad-dht/v2.
- go-kademlia can be useful, especially if they are generic enough. For instance Libp2pEndpoint seems generic enough and could be used in Kademlia implementations other than KadDHT as a message endpoint. And generally, I think it is good to have examples for how to implement an interface in the same repo.
- KadDHT could also be named Libp2pDHT
Agree that prototyping in the example folder can make sense (although practically speaking go.work files make cross-module development trivial)
Libp2pDHT seems redundant to me since it's in a libp2p repository/module. https://github.com/libp2p/go-libp2p-kad-dht/issues/337 suggests naming it Kad. I think KadDHT makes it clear that it's a Kademlia DHT rather than something like Chord :smile:
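On the go.work point above, a workspace file along these lines lets sibling checkouts be developed together without replace directives. The directory layout and the Go version are assumptions for illustration.

```
go 1.20

use (
	./go-kademlia
	./go-libp2p-kad-dht
)
```

With this file in the parent directory of both checkouts, builds and tests run in either module should resolve the other from the local copy, so an interface change can be tried across both before any PR is opened.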
Closing as resolved
All modules that are specific to the IPFS DHT (i.e. that cannot/shouldn't be used in other DHT networks/implementations) should move to ipfs/boxo.
These modules include:
- IPFS Server mechanism, that is only handling IPFS requests (basicserver can remain in this repo for testing purposes).
- IPFSv1 message module, including the IPFS protobuf message format and helpers.

The IpfsDHT struct should be defined directly in ipfs/boxo. This includes instantiating a new Libp2pEndpoint, building a RoutingTable, defining the server's behavior and the message format, interacting with the query mechanism (to decide when each of the queries should terminate). IPFS constants (e.g. bucket size, number of closer peers to return, IPFS DHT protocol ID, etc.) should be defined directly in ipfs/boxo.

Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.
What should stay in go-kademlia:
- (FullRT, ClientRT, LazyRT etc.) because even though they are built to serve in the IPFS DHT, they are generic components that could be used in other Kademlia implementations.

The goal of this separation is to get the ground ready for the Composable DHT.
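One way to picture generic components that could be used in other Kademlia implementations is a small routing table interface with interchangeable implementations behind it. The sketch below is an assumption-laden illustration: the RoutingTable interface, the SimpleRT type and the Key type are hypothetical and do not reproduce FullRT, ClientRT, LazyRT or any go-kademlia API. It only relies on the standard Kademlia XOR distance metric.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Key is a Kademlia identifier. All keys are assumed to have equal length.
type Key []byte

// xor returns the bytewise XOR of two equal-length keys, which is the
// Kademlia distance between them.
func xor(a, b Key) Key {
	out := make(Key, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

// RoutingTable is a hypothetical interface that different table
// implementations could satisfy.
type RoutingTable interface {
	AddPeer(k Key) bool
	NearestPeers(target Key, n int) []Key
}

// SimpleRT is a flat, unbucketed table, useful as a simulation or test
// implementation.
type SimpleRT struct {
	peers []Key
}

// AddPeer stores k and reports whether it was newly added.
func (rt *SimpleRT) AddPeer(k Key) bool {
	for _, p := range rt.peers {
		if bytes.Equal(p, k) {
			return false
		}
	}
	rt.peers = append(rt.peers, k)
	return true
}

// NearestPeers returns up to n known peers ordered by XOR distance to target.
func (rt *SimpleRT) NearestPeers(target Key, n int) []Key {
	if n < 0 {
		n = 0
	}
	out := make([]Key, len(rt.peers))
	copy(out, rt.peers)
	sort.Slice(out, func(i, j int) bool {
		// For equal-length keys, lexicographic order of the XOR results
		// equals numeric order of the distances.
		return bytes.Compare(xor(out[i], target), xor(out[j], target)) < 0
	})
	if n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	var rt RoutingTable = &SimpleRT{}
	rt.AddPeer(Key{0x01})
	rt.AddPeer(Key{0x04})
	rt.AddPeer(Key{0x0f})
	fmt.Println(rt.NearestPeers(Key{0x05}, 2))
}
```

Code written against the interface stays unchanged when a bucketed or lazily populated table replaces SimpleRT.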