Decide code organisation between repos

iand commented 10 months ago

Our thinking on the organisation of code has changed over the past few weeks as we have implemented more and had wider discussions about the future of DHT development. We also want to create go-libdht for the future work.

We need to make a decision on the organisation of code, because the current situation is getting confusing.

Here are some options

Keep things as they are. Refactored DHT in go-libp2p-kad-dht/v2. Keep new types and state machines in go-kademlia. Create new go-libdht repo for thinking beyond Kademlia, simulations etc..
Simplify A. Refactored DHT in go-libp2p-kad-dht/v2. Also move Kademlia types (kad + key packages) and state machines to go-libp2p-kad-dht/v2. Create new go-libdht repo for thinking beyond Kademlia, simulations etc.
Simplify B. Move refactored DHT to go-kademlia. Keep Kademlia types (kad + key packages) and state machines in go-kademlia. Create new go-libdht repo for thinking beyond Kademlia, simulations etc.
Remove go-kademlia A. Refactored DHT in go-libp2p-kad-dht/v2. Move Kademlia types (kad + key packages) and state machines to new go-libdht repo with simulations etc.
Remove go-kademlia B. Refactored DHT in go-libp2p-kad-dht/v2. Move state machines to go-libp2p-kad-dht/v2 since they are for a specific implementation. Kademlia types (kad + key packages) go to new go-libdht repo with simulations etc.

We would end up with the following repos in each option:

go-libp2p-kad-dht/v2, go-kademlia and go-libdht
go-libp2p-kad-dht/v2 and go-libdht
go-kademlia and go-libdht
go-libp2p-kad-dht/v2 and go-libdht
go-libp2p-kad-dht/v2 and go-libdht

iand commented 10 months ago

@guillaumemichel @dennis-tra what do you think?

guillaumemichel commented 10 months ago

I have a preference for 4. or 5. IMO we should have a single generic DHT repo (go-libdht), and a single DHT implementation (go-libp2p-kad-dht/v2).

If we can make the state machine generic (not dependent on kademlia logic, libp2p {peerid, addresses, protocolid,...} nor message format), it would be great to add the state machine to go-libdht, so that other implementations can also use it (4.).

However, if making the state machine generic isn't practical, or takes too much effort for now, we can keep it on the go-libp2p-kad-dht/v2 side, so that we are able to ship it faster (5.). We can always move/add the state machine module to go-libdht later.

EDIT: IMO the new go-libdht repo should belong to the plprobelab Github org, and not libp2p nor ipfs because it goes beyond the libp2p scope.

dennis-tra commented 10 months ago

+1 for what @guillaumemichel says! For now, let's go with 5 to move faster for now.

We could consider to eventually also rename go-libp2p-kad-dht/v2 to go-libp2p-kademlia (we know that kademlia is a DHT, so kad-dht is a little redundant - and it's easier to pronounce).

guillaumemichel commented 10 months ago

We can also keep the Router definition on the go-libp2p-kademlia for now, and maybe later iterate on it to make generic and move it to go-libdht. So we don't need to migrate libp2p modules from go-kademlia to go-libdht for now.

iand commented 10 months ago

We could consider to eventually also rename go-libp2p-kad-dht/v2 to go-libp2p-kademlia

go-kademlia would be available

iand commented 10 months ago

We also have 100+ issues in go-kademlia and some in go-libp2p-kad-dht that need to be migrated. Plus the project board. For me that means go-kademlia is the focus of the reworked Kademlia DHT.

There is another option

Move refactored DHT to go-kademlia. Keep state machines in go-kademlia since they are for a specific implementation. Kademlia types (kad + key packages) go to new go-libdht repo with simulations etc. Issues related to the refactored DHT stay in go-kademlia, other issues move to go-libdht.

iand commented 10 months ago

Option 6 also solves our CI issue where we can't run a different CI pipeline for the v2 directory in go-libp2p-kad-dht

guillaumemichel commented 10 months ago

Option 6. also works for me

dennis-tra commented 10 months ago

go-kademlia would be available

go-libp2p-kademlia would also be available in the libp2p org or was it a typo?

We also have 100+ issues in go-kademlia and some in go-libp2p-kad-dht that need to be migrated. Plus the project board. For me that means go-kademlia is the focus of the reworked Kademlia DHT.

It would be annoying but GitHub allows transferring issues to different repositories (though there's no bulk operation and we'd need to use a custom CLI tool that automates that). It would be great to avoid a migration, for sure.

Move refactored DHT to go-kademlia. Keep state machines in go-kademlia since they are for a specific implementation. Kademlia types (kad + key packages) go to new go-libdht repo with simulations etc. Issues related to the refactored DHT stay in go-kademlia, other issues move to go-libdht.

Just to clarify, everything in go-libp2p-kad-dht/v2 would move to go-kademlia? If so, I'd prefer to call that go-libp2p-kademlia because it contains libp2p-specific code - and probably more of it than generic state machine code. However, then there's again a mismatch. libp2p code would live alongside generic Kademlia state machine code.

Semantically, I think it would be more correct for libp2p code to be in a repository called go-libp2p-kademlia and generic Kademlia state machine code to be in a repo like go-kademlia. At the same time, I would want to limit the number of repositories to just two. For me, this means the generic state machine code would either stay in go-libp2p-kademlia or move to go-libdht.

iand commented 10 months ago

I don't really want to bikeshed on naming but there are lots of projects that depend on libp2p without having the (redundant imho) extra mention of libp2p. I'm just suggesting options and will go with the consensus.

I don't see an issue with having generic state machines.

Someone with a clear view of ongoing requirements needs to write down the responsibilities of each repo.

guillaumemichel commented 10 months ago

I think it will require some reorganisation to reach more clarity about the modules. We could either:

have a distinct top level module for each DHT geometry and 1) have a sub module for each component. e.g /kad/routing, /kad/query, etc. 2) put all components (routing table, query mechanism, keys) in the same module /kad
Have implementations of each interface in the interface subfolder. e.g /routing/kad/, /routing/chord/, ... /query/kad/...

sim depends on events (or coord), so it cannot stay in go-libdht if there is not coord there.
server is implementation specific, and I am not sure that it makes sense to have a generic interface for it. (at least until we move Router / endpoint to go-libdht)

Someone with a clear view of ongoing requirements needs to write down the responsibilities of each repo.

I can write down how responsibilities would be split between the two repos.

dennis-tra commented 10 months ago

having the (redundant imho) extra mention of libp2p

yeah, true. If it's in the libp2p org, it's indeed redundant - I just viewed it from the plprobelab org's perspective. Hmm...

I've just looked through the first few pages of the libp2p org's repos, and having libp2p in the repository's name still seems to be the predominant naming convention: https://github.com/orgs/libp2p/repositories?page=1&type=all I'd prefer to stay consistent although it's redundant.

iand commented 10 months ago

I don't like perpetuating go- prefixes and technology names since, within the official organisations, it implies some official status. The kubo and boxo names were chosen because we wanted to move away from the idea that go-ipfs was the only go ipfs implementation.

If we're going to stick with the go-libp2p- prefix then my vote is option 5 plus move all the relevant issues to go-libp2p-kad-dht

dennis-tra commented 10 months ago

I don't like perpetuating go- prefixes and technology names since, within the official organisations, it implies some official status. The kubo and boxo names were chosen because we wanted to move away from the idea that go-ipfs was the only go ipfs implementation.

I think it's fine 🤷‍♂️ Descriptive repo names carry context that lowers cognitive load imo. But yeah, it makes kind of an authoritative claim... I think for go-ipfs/kubo the problem with implying an official status stemmed from the fact that PL was also the inventor of IPFS - which isn't the case for Kademlia. But I also see that we might want to keep room for another Kademlia implementation in Go using libp2p.

This could be achieved by choosing a unique name like kubo/boxo. In the past, you suggested naming the new implementation zikade, right? By discussing a concrete name for the new implementation, we could make the discussion about go-/libp2p etc., obsolete.

If we're going to stick with the go-libp2p- prefix then my vote is option 5 plus move all the relevant issues to go-libp2p-kad-dht

Initially, that was my preference as well, and also, after our discussion here, I'm still up for it.

As you pointed out, this will come with the drawbacks of broken CI for the v2 folder and issue migration overhead. Two solvable things, in my opinion.

iand commented 10 months ago

Option 5 seems to have consensus 🥳

dennis-tra commented 10 months ago

Cool, a proposal for the next steps in lose ordering:

[ ] Move state machines to go-libp2p-kad-dht/v2. This includes: 1) go-kademlia/query 2) go-kademlia/routing 3) more?
- [ ] routing/simplert -> go-libdht
- [ ] routing/triert -> go-libdht
- [ ] routing/bootstrap*.go -> go-libp2p-kad-dht/v2/coord/routing
- [ ] routing/include*.go -> go-libp2p-kad-dht/v2/coord/routing
- [ ] routing/probe*.go -> go-libp2p-kad-dht/v2/coord/routing
- [ ] query/iter*.go -> go-libp2p-kad-dht/v2/v2/coord/query
- [ ] query/node*.go -> go-libp2p-kad-dht/v2/coord/query
- [ ] query/pool*.go -> go-libp2p-kad-dht/v2/coord/query
- [ ] query/query*.go -> go-libp2p-kad-dht/v2/coord/query
[ ] Create go-libdht repository
[ ] Move kad + key + key/trie + triert + simplert + trie to go-libdht. Use new types from https://github.com/plprobelab/go-kademlia/pull/118?
[ ] Write down the responsibilities of go-libdht in the go-libdht/README.md
[ ] Write down the responsibilities of/provide context in go-libp2p-kad-dht
[ ] Migrate issues from go-kademlia to go-libp2p-kad-dht and go-libdht respectively

Am I missing something? I'm happy to take up all of it, although @guillaumemichel offered to do the two writing tasks. Let me know

iand commented 10 months ago

@guillaumemichel you mentioned once before that we should remove key.Key8 and key.Key32. Both are only used in tests. Key32 is used by the triert tests and Key8 is used in state machine tests.

I can move Key8 to a helper package in go-libp2p-kad-dht/v2/coord/ so there should be no need to move it to go-libdht.

The triert tests could easily be rewritten to use Key256 instead of Key32

guillaumemichel commented 10 months ago

@iand we could also move Key8 and Key32 to the kadtest package, so that we can keep using them in tests.

iand commented 9 months ago

Core types have moved to go-libdht: (moved by https://github.com/plprobelab/go-libdht/pull/2) All libp2p DHT implementation is in zikade (moved by https://github.com/plprobelab/zikade/pull/1)

probe-lab / go-kademlia

Decide code organisation between repos #120