libp2p / go-libp2p

libp2p implementation in Go
MIT License
6.08k stars 1.07k forks source link

Repo Consolidation: Round 3 #1556

Closed marten-seemann closed 2 years ago

marten-seemann commented 2 years ago

The transport consolidation (#1187) has dramatically reduced repo sprawl. We now only have a few go-libp2p-* repos left in the dependency tree of go-lipbp2p. This is a proposal to also consolidate them into go-lipbp2:

Not sure what to do about go-libp2p-asn-util. This repo should probably have been named go-asn-util, it's not libp2p related in any way. I also don't feel like moving several MB of ASN here.

Anything I missed? Thoughts, @MarcoPolo @vyzo @Stebalien @raulk?

vyzo commented 2 years ago

asn no, the rest is reasonable proposition.

Stebalien commented 2 years ago

Agree with vyzo. Otherwise, lgtm.

vyzo commented 2 years ago

I think we should keep core separate, it's the interface after all.

marten-seemann commented 2 years ago

I think we should keep core separate, it's the interface after all.

I think I disagree. core being separate is causing us a lot of trouble:

We probably should keep the GitHub Action that posts the gocompat output every time we make a change to /core, so we don't accidentally break things.

vyzo commented 2 years ago

Yes, but it allows libraries to just depend on core and not pull the whole go-libp2p except for trsts.

On Sat, May 28, 2022, 20:17 Marten Seemann @.***> wrote:

I think we should keep core separate, it's the interface after all.

I think I disagree. core being separate is causing us a lot of trouble:

  • When we want to change one of the interfaces, we first have to cut a core release before we can use them in go-libp2p. It would be nicer to make that change atomically.
  • core and go-libp2p being out of sync (core already having released an update, but go-libp2p not yet) causes errors for downstream users who run go get -u, and then we need to tell them which versions of core go with which versions of go-libp2p

We probably should keep the GitHub Action that posts the gocompat output every time we make a change to /core, so we don't accidentally break things.

— Reply to this email directly, view it on GitHub https://github.com/libp2p/go-libp2p/issues/1556#issuecomment-1140300109, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAI4SQ6IM6VYHSN3LFE63TVMJIJNANCNFSM5XAIKMVQ . You are receiving this because you were mentioned.Message ID: @.***>

marten-seemann commented 2 years ago

Should we be worried about this? I'd assume that if you pull in libp2p types, you're also using go-libp2p. You could imagine a project suffering from repo sprawl where a library only needs the type, and go-libp2p is pulled in by their main repo, but even then the only effect is that go get will download slightly more data.

vyzo commented 2 years ago

It is semantically different; i think a separate interface is good practice for something as complex as libp2p. After all, apart from instantiation and tests, user code only cares about the interface.

vyzo commented 2 years ago

think of it as the header file.

marten-seemann commented 2 years ago

Any suggestion how to fix the two problems I listed above then? For me, this is mostly about development velocity, and I want to get away from having to release one module (in a release that we don't want users to use yet!) in order to make progress in another.

think of it as the header file.

Which is usually kept in the same directory as the source file 🤪

vyzo commented 2 years ago

Any changes in interface should be carefully considered and too much velocity there is indicative of instability. We dont have to release before creating the implementation pr, and only release when the implementation in go-libp2p is ready to merge. The window of a broken go get -u is insignificantly small then.

marten-seemann commented 2 years ago

Not convinced of the cleanliness argument, but let's look at some real-world examples. I looked at libraries building on top of libp2p, and determined their libp2p dependencies after deleting all test files (rm **/*_test.go && go mod edit -go 1.17 && go mod tidy).

go-libp2p-pubsub

github.com/libp2p/go-libp2p-core v0.15.1
github.com/libp2p/go-libp2p-discovery v0.6.0

go-libp2p-discovery was moved into go-libp2p, so this would still need to import go-libp2p.

go-libp2p-kad-dht

github.com/libp2p/go-eventbus v0.2.1
github.com/libp2p/go-libp2p-core v0.15.1
github.com/libp2p/go-libp2p-kbucket v0.4.7
github.com/libp2p/go-libp2p-peerstore v0.6.0
github.com/libp2p/go-libp2p-record v0.1.3
github.com/libp2p/go-libp2p-routing-helpers v0.2.3
github.com/libp2p/go-libp2p-swarm v0.10.2
github.com/libp2p/go-libp2p-xor v0.1.0
github.com/libp2p/go-msgio v0.2.0
github.com/libp2p/go-netroute v0.2.0

go-libp2p-swarm (moved into go-libp2p in round 2) is imported for an error assertion, and go-libp2p-peerstore (will be moved in round 3) for the slightly weird function PeerInfos.

go-bitswap

github.com/libp2p/go-buffer-pool v0.0.2
github.com/libp2p/go-libp2p v0.14.3
github.com/libp2p/go-libp2p-core v0.8.5
github.com/libp2p/go-libp2p-loggables v0.1.0

go-libp2p is import to access the ping package.

go-graphsync

github.com/libp2p/go-buffer-pool v0.0.2
github.com/libp2p/go-libp2p-core v0.8.5
github.com/libp2p/go-msgio v0.0.6

This actually wouldn't need to import go-libp2p.

MarcoPolo commented 2 years ago

I think we can still be very deliberate with changes to the core interfaces even if it's in a single repo. I'm not aware of many libraries that have their interfaces defined in a separate repo, but please correct me if I'm wrong.

From a user's POV a single repo would be better since go-libp2p + interfaces are well defined for any given commit.

vyzo commented 2 years ago

My main rationale for the interface being a separate repo is that libp2p is a very complex system, and that applications and libraries only ever care about the core interface (except for instantiation and perhaps tests).

I don't want our users to get in the habbit of using internal libp2p components; the libp2p contract is that the interface is a stable facade, and in internals anything goes.

The development velocity effect is now gone, given rhat we are essentially a monorepo. When we want to make a change in core, we open pr, develop using that pr in the monorepo, and when its ready we merge both prs.

No release is necessary in either repo, it is totally fine for master to point to core master. When we are ready for release, we bump both and off we go. The probablity of a user doing a go get -u and breaking while we are tagging releases is vanishingly small.

Stebalien commented 2 years ago

I think the right way to think of this is as follows: go-libp2p is now a "subproject" within the libp2p org, comprised of many separate packages that happen to be versioned together for convenience.

Minimal Dependencies

This was, IIRC, the main reason for the split.

However, this is no longer an issue, from what I can see, as long as we keep "core" from depending on anything else in go-libp2p. As of go 1.17,:

External Transports

By including core, there's no way to:

  1. Develop an external transport that depends on core.
  2. Make go-libp2p depend on this transport.

Any time we make a breaking change to the transport, we'd need two releases to pull through an update.

On the other hand, if/when we get to the point where we want go-libp2p to depend on some transport, I expect we'd want to just pull it into the tree itself?

External Development

In theory, multiple repos allows third-parties to "own" separate codebases (other repos). In practice, this is wishful thinking.

Furthermore, this change won't prevent third-parties from developing transports. It just means that, if we ever want to ship the transport by default, it'll need to live in this repo (or we'll have to pull it in with git subtrees). But, IMO, that's fine.

Development Velocity

The current split makes breaking changes very painful. This isn't all bad, but it has made some refactors take longer than they should.

Additionally, It's very easy to get the system into an "unreleasable" state without realizing it.

No release is necessary in either repo, it is totally fine for master to point to core master.

This isn't fine and it has cost us (me) a ton of time. It has also delayed critical fixes, sometimes for months when we thought some tiny change to the core interfaces would be easy, but it actually caused a cascade of random stuff we needed to deal with.

Separate Versioning

One of the drawbacks to the new approach is that it makes it impossible to cut a release of a single package (e.g., for a bug fix). However, IMO, that's a good thing. It's really easy to forget that some bug was fixed in some random release of a random repo.

BigLep commented 2 years ago

A couple of thoughts:

  1. ResourceManager readme docs in https://github.com/libp2p/docs/pull/160/ will need to be updated with new links when this is done.
  2. If it isn't there already, can we please capture the repo/package rationale in the README (likely taking content from https://github.com/libp2p/go-libp2p/issues/1556#issuecomment-1143924221 )?
p-shahi commented 2 years ago

Closing as "Update links in dos-mitigation.md" was addressed in libp2p/docs#165