Open iand opened 1 year ago
I totally agree that this repo should reorient toward a general tooling library helping building any kind of DHT.
After thinking about it, I think that we aren't far from it.
If someone wanted to implement Chord, they would just have to build the Chord routing table, and query mechanism. They could already use the other components (e.g scheduler
, endpoint
, key
, sim
etc.). If someone wants to build a DSHT, R5N or custom kademlia-geometry based DHT, it would be the same process: build the appropriate routing table and query mechanism, all the other components can be reused.
Most other DHT geometries, such as the ones in use in Chord, Pastry, Tapestry etc. also use binary keys as identifiers. Some newer DHT use exotic geometries, that we couldn't implement at the moment using this repo. We could make the DHT identifier Id
(currently Key
type) even more generic (basically any
), and Key
would implement Id
. This would allow to support basically every geometry.
Maybe we can add a package to compose DHTs, but it may not be required as combining routing table memberships should happen in the routing table implementation.
I think the first important move we need to take, is to rename the repo to remove Kademlia from it, because we can build any kind of DHT, and the current name is misleading. After the repo is renamed, we need to make sure that the routing table and query implementations that we already have contain a kad
in their package name, to indicate that they are specific to kademlia.
Potential names for the repo could be:
The ProbeLab team had a meeting today to discuss the outcome of the discussion with @jbenet and identify next steps.
The TL;DR is that we think we're not far away at all from the vision that @jbenet laid out (as @guillaumemichel points out above), although the devil is in the details. It also seems that the refactoring work (as well as the Reprovide Sweep) are not affected by the discussion, so this line of work continues as scheduled.
We gathered the following Action Items (in this order):
go-kademlia
into something more generic, as per Juan's request and @guillaumemichel's note above. For instance, in order to check if other routing algorithms (e.g., Chord) are compatible, we’d need to implement and test at least one of them, so that we know it will work for others. This is a sizeable extra item, which could add several weeks not accounted for right now.
Thinking about it again.
The Routing Table interface is generic enough to be used with any geometry. We would just need to replace the Key
dependency with a generic Point
dependency (allowing other formats than just bitstring). Also the generic Query
mechanism only using the RoutingTable
interface works for all geometries and routing table implementations.
Self
will always return the Point
representing the location of the current node inside the geometry.AddNode
is required for one need to be able to add nodes to a routing table. The routing table implementation then derives the associated Point
from the provided Node
.RemovePoint
/ RemoveNode
is also required, because whatever geometry is being use, one need to be able to remove peers from a routing table. Nodes are indexed by their associated Point
in the routing table, hence providing the Node
isn't required, providing the point is enough to find and remove a Node
from the routing table. RemoveNode
is also generic, and requires the routing table to compute the associated Point
given the provided Node
.NearestNodes
is the most interesting, and geometry depending method. However, it can be used by all DHT implementations, because a key requirement of a DHT is to use a geometry along with a distance definition within this geometry. Without distance metric, it isn't possible to converge efficiently towards the target Point
. Hence, whatever the geometry, it will always be possible to sort Point
s from the closest to the furthest from a reference Point
in any geometry (taking the minimal distance between two Point
s for instance. Note that it is possible that many Point
s are equidistant to a reference Point
. So the NearestNodes
method is always required, and will always be usable by a generic Query
/Lookup
mechanism, that is querying the _Nearest known peers, and then the Nearest nodes returned by queried remote nodes until it converges to the desired Point
.So it is possible to have a generic query implementation, only making use of the methods defined by the RoutingTable
interface, and converging any Point
within the geometry used by any custom routing table implementation. Note however that the generic query may not be the most optimized query mechanism to converge to a Point
in a specific geometry. A Routing Table based on a custom geometry can expose additional methods, that could be used by a custom Query
mechanism optimized for the routing table.
This generic RoutingTable
interface doesn't prevent routing table implementers from implementing specific methods optimized for their use.
I'm just ACKing that I saw the pings here and have read the above and the content being generated in https://www.notion.so/pl-strflt/2023-08-07-IPFS-Tech-Sync-With-Juan-4061193911544b9fa864a663f8f1ee84?pvs=4#4bda867c13074a029a39150b6539757a so far.
I'll respond more tomorrow with a fresh head, but a few quick things:
Thanks a lot for all your work here guys!
With fresh head, I still agree with my comments in https://github.com/plprobelab/go-kademlia/issues/84#issuecomment-1691060696
One other thing:
We need to identify what the extra effort is to make go-kademlia into something more generic, as per Juan's request and @guillaumemichel's note above. For instance, in order to check if other routing algorithms (e.g., Chord) are compatible, we’d need to implement and test at least one of them, so that we know it will work for others. This is a sizeable extra item, which could add several weeks not accounted for right now.
I generally want to avoid weeks of extra work right now that prevent us from providing user value (like the reprovide sweep).
Are the options:
I agree we have to draw the line carefully and not get into too much abstraction, especially given the imminent org restructuring. We should avoid committing to a very generic design and development as this will be a long-term commitment.
Generally statement to hold me accountable: I want to make sure I'm not getting into too much of the weeds here not to become a blocker because I know I won't be able to devote sustained attention here right now in this season.
@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.
I'll think more about it and put in the doc that I'll prepare for the discussion, but some first thoughts to address the tradeoff between realism and usefulness to the ecosystem is to:
Thanks @yiannisbot and @guillaumemichel for the updates and docs. Great work!
@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.
Yes, I'll be present for decision making and make sure I'm aware of the decision and the implications.
The timeline and suggested actions makes sense to me. For Reprovide Sweep that we are anchoring our refactor work to and proposing to complete to get ROI, it would be great if we can also provide user anecdotes are will readily adopt this feature when it's available and ideally are excited to see it be released.
Good stuff - thanks all.
This issue arises from some recent discussions with Juan about the scope and direction of go-kademlia and go-libp2p-kad-dht.
Currently, our focus is on improving our Kademlia implementation in order to implement protocol enhancements. But there's value in creating a space for experimenting with different DHT concepts. Focusing on a platform that allows experimentation with various DHT properties and algorithms might better serve us in the long run. Our goal should be to address today's needs while also preparing for future challenges.
Our discussions often revolve around the technicalities of implementation but many in our community are keen on seeing real-world benefits. They're interested in mixing different membership styles, the idea of multi-level DHTs, and the ability to easily introduce varied methods and record types. As we think about tweaks to the IPFS DHT, it's crucial to align with community feedback. Their input should guide our path forward.
Instead of honing in on just one DHT improvement, we should consider building a foundation for broader DHT experimentation, allowing us to test, iterate, and optimise a variety of DHT structures and algorithms. A versatile DHT platform needs a modular design which should clearly define and allow modifications to key attributes such as distance metrics, routing algorithms, and communication protocols, enabling integration of new features or changes.
However, Kademlia still remains central to IPFS for the near term. We need to dedicate resources to its refinement, ensuring that its implementation is as robust and efficient as possible, while we determine how best to make room for other designs or their features in a new DHT foundation.