probe-lab / go-kademlia

Generic Go Kademlia implementation
Other
17 stars 4 forks source link

Reorient toward a general purpose DHT foundation library #84

Open iand opened 11 months ago

iand commented 11 months ago

This issue arises from some recent discussions with Juan about the scope and direction of go-kademlia and go-libp2p-kad-dht.

Currently, our focus is on improving our Kademlia implementation in order to implement protocol enhancements. But there's value in creating a space for experimenting with different DHT concepts. Focusing on a platform that allows experimentation with various DHT properties and algorithms might better serve us in the long run. Our goal should be to address today's needs while also preparing for future challenges.

Our discussions often revolve around the technicalities of implementation but many in our community are keen on seeing real-world benefits. They're interested in mixing different membership styles, the idea of multi-level DHTs, and the ability to easily introduce varied methods and record types. As we think about tweaks to the IPFS DHT, it's crucial to align with community feedback. Their input should guide our path forward.

Instead of honing in on just one DHT improvement, we should consider building a foundation for broader DHT experimentation, allowing us to test, iterate, and optimise a variety of DHT structures and algorithms. A versatile DHT platform needs a modular design which should clearly define and allow modifications to key attributes such as distance metrics, routing algorithms, and communication protocols, enabling integration of new features or changes.

However, Kademlia still remains central to IPFS for the near term. We need to dedicate resources to its refinement, ensuring that its implementation is as robust and efficient as possible, while we determine how best to make room for other designs or their features in a new DHT foundation.

guillaumemichel commented 10 months ago

I totally agree that this repo should reorient toward a general tooling library helping building any kind of DHT.

After thinking about it, I think that we aren't far from it.

If someone wanted to implement Chord, they would just have to build the Chord routing table, and query mechanism. They could already use the other components (e.g scheduler, endpoint, key, sim etc.). If someone wants to build a DSHT, R5N or custom kademlia-geometry based DHT, it would be the same process: build the appropriate routing table and query mechanism, all the other components can be reused.

Most other DHT geometries, such as the ones in use in Chord, Pastry, Tapestry etc. also use binary keys as identifiers. Some newer DHT use exotic geometries, that we couldn't implement at the moment using this repo. We could make the DHT identifier Id (currently Key type) even more generic (basically any), and Key would implement Id. This would allow to support basically every geometry.

Maybe we can add a package to compose DHTs, but it may not be required as combining routing table memberships should happen in the routing table implementation.

I think the first important move we need to take, is to rename the repo to remove Kademlia from it, because we can build any kind of DHT, and the current name is misleading. After the repo is renamed, we need to make sure that the routing table and query implementations that we already have contain a kad in their package name, to indicate that they are specific to kademlia.

Potential names for the repo could be:

yiannisbot commented 10 months ago

The ProbeLab team had a meeting today to discuss the outcome of the discussion with @jbenet and identify next steps.

The TL;DR is that we think we're not far away at all from the vision that @jbenet laid out (as @guillaumemichel points out above), although the devil is in the details. It also seems that the refactoring work (as well as the Reprovide Sweep) are not affected by the discussion, so this line of work continues as scheduled.

We gathered the following Action Items (in this order):

guillaumemichel commented 10 months ago

Thinking about it again.

The Routing Table interface is generic enough to be used with any geometry. We would just need to replace the Key dependency with a generic Point dependency (allowing other formats than just bitstring). Also the generic Query mechanism only using the RoutingTable interface works for all geometries and routing table implementations.

https://github.com/plprobelab/go-kademlia/blob/44238be8a1f611e0415e7b4c03cc04f5e833cb07/kad/kad.go#L41-L71

So it is possible to have a generic query implementation, only making use of the methods defined by the RoutingTable interface, and converging any Point within the geometry used by any custom routing table implementation. Note however that the generic query may not be the most optimized query mechanism to converge to a Point in a specific geometry. A Routing Table based on a custom geometry can expose additional methods, that could be used by a custom Query mechanism optimized for the routing table.

This generic RoutingTable interface doesn't prevent routing table implementers from implementing specific methods optimized for their use.

BigLep commented 10 months ago

I'm just ACKing that I saw the pings here and have read the above and the content being generated in https://www.notion.so/pl-strflt/2023-08-07-IPFS-Tech-Sync-With-Juan-4061193911544b9fa864a663f8f1ee84?pvs=4#4bda867c13074a029a39150b6539757a so far.

I'll respond more tomorrow with a fresh head, but a few quick things:

  1. We have 30 minutes with Juan on 2023-08-31. I'll see if we can expand it.
  2. Agreed we should rename the package if it's not kademlia specific. Is there a frontrunner new name? I like go-libdht or go-dht-forge. But yeah, lets get a new name to avoid confusion.
  3. I don't want to preclude future options, but I also don't want to get caught doing too much abstraction. I don't know where to draw the line. The focus of doing the refactoring so we can have really solid test coverage for the reprovide sweep work seems reasonable.
  4. My guess is that scoping the public comms down to committing us to the refactor work and "amino" naming, while hinting at things we'd like to do down the line like composable DHT and reader privacy makes sense. I think to move things forward @yiannisbot , you could put together an outline for what you want to cover with Juan and bonus (but lower priority) an updated blog entry. That could be something quick to review that you and I could align on.
  5. Generally statement to hold me accountable: I want to make sure I'm not getting into too much of the weeds here not to become a blocker because I know I won't be able to devote sustained attention here right now in this season.

Thanks a lot for all your work here guys!

BigLep commented 10 months ago

With fresh head, I still agree with my comments in https://github.com/plprobelab/go-kademlia/issues/84#issuecomment-1691060696

One other thing:

We need to identify what the extra effort is to make go-kademlia into something more generic, as per Juan's request and @guillaumemichel's note above. For instance, in order to check if other routing algorithms (e.g., Chord) are compatible, we’d need to implement and test at least one of them, so that we know it will work for others. This is a sizeable extra item, which could add several weeks not accounted for right now.

I generally want to avoid weeks of extra work right now that prevent us from providing user value (like the reprovide sweep).

Are the options:

  1. Don't be as generic (and thus not incur extra time)
  2. Be generic, but the team only feels good about being generic if they implement a second use case (which incurs weeks of work)? I assume we don't feel comfortable being generic but not validating it with a second use case because the worry is that we are adding unnecessary abstraction. Is that right? That is understandable. If we don't go generic now, do we feel like that is something we can come do later easily enough?
yiannisbot commented 10 months ago

I agree we have to draw the line carefully and not get into too much abstraction, especially given the imminent org restructuring. We should avoid committing to a very generic design and development as this will be a long-term commitment.

Generally statement to hold me accountable: I want to make sure I'm not getting into too much of the weeds here not to become a blocker because I know I won't be able to devote sustained attention here right now in this season.

@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.

yiannisbot commented 10 months ago

I'll think more about it and put in the doc that I'll prepare for the discussion, but some first thoughts to address the tradeoff between realism and usefulness to the ecosystem is to:

BigLep commented 10 months ago

Thanks @yiannisbot and @guillaumemichel for the updates and docs. Great work!

@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.

Yes, I'll be present for decision making and make sure I'm aware of the decision and the implications.

The timeline and suggested actions makes sense to me. For Reprovide Sweep that we are anchoring our refactor work to and proposing to complete to get ROI, it would be great if we can also provide user anecdotes are will readily adopt this feature when it's available and ideally are excited to see it be released.

Good stuff - thanks all.