ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Please import Go libraries by their canonical non-ipfs paths #5054

Closed onlyjob closed 5 years ago

onlyjob commented 6 years ago

This is important in order to build IPFS reproducibly in an offline build environment without IPFS run-time dependencies. Thanks.

schomatis commented 6 years ago

Hey @onlyjob, I agree, this is something that is currently being discussed as it is not trivial to implement the change at the moment (see https://github.com/ipfs/go-ipfs/issues/4831).

onlyjob commented 6 years ago

Thanks, @schomatis. Rewriting all import paths is tedious but perhaps trivial if you consider not using gx at all.

schomatis commented 6 years ago

@onlyjob The idea is to keep gx (which is a great tool) but to look for ways to integrate it into the code base that minimize its interference and improve the developer experience. For example, a sub-command could put everything in the vendor directory using normal GitHub paths throughout, which should make working on dependencies a lot easier (incidentally I'm using this issue to put public pressure on @whyrusleeping to help me get that going :smiling_imp:).

onlyjob commented 6 years ago

Using a non-standard package management tool is a liability that doesn't help anyone. Moreover, it seems to be a real blocker downstream, preventing the introduction of IPFS into GNU/Linux distributions like Debian.

Is "gx" really worth it? "gx" may be a nice toy but what problem does it solve?

whyrusleeping commented 6 years ago

sure, give me an alternative to gx that satisfies the following:

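required:

  • the entirety of all my deps are not stored in one repo
  • I can be assured that when i build, i'm running exactly the code i think i'm running (not relying on mutable upstreams)

very nice to have:

  • doesnt depend on githubs continued existence or uptime
  • nice tooling for iterative bubbling up and testing of dependency changes in the tree
  • libraries are also vendored (don't give me that line about only vendoring binaries, its crap)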

dep fails here as it both requires all code to be stored in a single repo, and depends heavily on github. i'm still waiting on vgo, it seems like it will be able to give us the ability to not store all our code in a single giant repo, but it's unclear how much it still depends on github (some amount is obviously okay, but the failure scenarios need to be considered), and also i'm not sure how strict its version locking can get. Ideally it would be able to do content-addressable dep locking (via git hashes is fine).

whyrusleeping commented 6 years ago

(note: just woke up and may be forgetting something from that list. using our own dependency management tool was not something undertaken lightly. There were and still are serious issues with go dep management that needed to be worked around)

onlyjob commented 6 years ago

give me an alternative to gx

There are plenty of them, glide and others. They all seem to be doing the same thing more or less. I'm not an expert as we don't use any of those tools in Debian...

the entirety of all my deps are not stored in one repo

What do you mean? Dependencies are cached in your repo. If you don't want to commit dependency libraries then you can just gitignore vendor directory...

I can be assured that when i build, i'm running exactly the code i think i'm running (not relying on mutable upstreams)

I'm not sure that can be done. Besides, although it would be nice to have, you don't actually need that. Pulling semantically versioned libraries from their upstream repositories should be good enough. Golang does not guarantee that libraries are not compromised, but in Debian you can trust packaged libraries. If you pull your dependencies from Debian then you can build reproducibly in an offline environment knowing that all libraries are signed and provided by a trusted source (native repositories).

doesnt depend on githubs continued existence or uptime

That's nice but if/when github dies you should still have your cached copy of libraries. This shouldn't be a requirement but if you want that then Debian can provide trustworthy libraries.

libraries are also vendored (don't give me that line about only vendoring binaries, its crap)

I don't understand what you mean... I would never suggest to rely on pre-built vendored binaries - that's a terrible idea. :)

If GitHub dies or becomes unavailable (that doesn't happen too often) then changing paths to a new location is not too much of an effort.

IMHO Golang dependency management is a terrible mess and abomination of decades of best practice in software engineering... :(

whyrusleeping commented 6 years ago

There are plenty of them, glide and others. They all seem to be doing the same thing more or less. I'm not an expert as we don't use any of those tools in Debian...

Yeah, thats the problem, they all do the same thing more or less, which is (in my opinion having spent considerable time dealing with them) the wrong thing.

What do you mean? Dependencies are cached in your repo. If you don't want to commit dependency libraries then you can just gitignore vendor directory...

The typical go solution to vendoring right now is to pull all of your dependencies into your vendor folder, and commit them to git, and push them up to the repo, leading to your repository becoming massive, which is unacceptable to me. go-ethereum's codebase is getting close to a gig, and it's growing.

I'm not sure that can be done.

But thats exactly what gx does :) and it can be done with git submodules too (which is a 'valid' but really annoying way to solve this).

If you pull your dependencies from Debian then you can build reproducibly in an offline environment knowing that all libraries are signed and provided by a trusted source (native repositories).

Yeah, and thats great for debian users. But not everyone uses debian, and setting up a similar system for each and every different possible distro people use is a huge amount of work. Setting up one system for everyone to pull trusted dependencies from is exactly what gx does. The code is referenced by hash in a git repo, that git repo has signed tags.

I would never suggest to rely on pre-built vendored binaries - that's a terrible idea. :)

Oh good, you're sane ;)

IMHO Golang dependency management is a terrible mess and abomination of decades of best practice in software engineering... :(

sigh yes. its pretty bad. it's nice in the simple cases, where you just want to hack something together, and its nice when you fully control all the code you are using in your project, but for everyone else, its bad.

onlyjob commented 6 years ago

Yeah, and thats great for debian users. But not everyone uses debian, and setting up a similar system for each and every different possible distro people use is a huge amount of work.

Well, your unique build system is equally unusable to anyone outside of this project. If you develop on Debian then you can even build statically linked binaries for GNU/Linux users. Though it doesn't help other platforms...

Any Golang dependency management tool should work well enough compared to the current state as long as you don't commit "vendored" libraries...

whyrusleeping commented 6 years ago

so back to your real problem... does make go-ipfs-source.tar.gz not work for what youre trying to accomplish? It puts together a tarball of all the source files needed to build go-ipfs.

onlyjob commented 6 years ago

We need to build IPFS downstream in Debian from reusable packaged libraries. All libraries have canonical persistent import paths which should be used regardless of vendoring tool. Building IPFS with reusable libraries is currently impossible because all import paths are hijacked...

Stebalien commented 6 years ago

The canonical import path of a go-ipfs dependency is gx/ipfs/Qm..... That is, we name our dependencies by hash and we expect exactly the version we specify (by hash). We used to leave import paths in the github.com/foo/bar form and re-write them to gx/ipfs/Qm... at compile time but then users would try to build go-ipfs manually and it wouldn't work (due to broken dependencies).

At this point, we could probably use dep and specify exact commit hashes. However, while it has its quirks, gx has some really nice advantages:

  1. If I look at a stack trace, I can know exactly what code is being run (because the packages are named by hash).
  2. Builds are very reproducible.
  3. Our dependencies are content addressed so they'll stick around even if upstream disappears or moves.

Really, what Debian wants is exactly what we're trying to avoid by using gx.
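
To make the content-addressing point above concrete, here is a minimal Python sketch of naming a dependency by a hash of its contents. It is an illustration only: gx's real identifiers are the Qm... hashes mentioned above, not the plain SHA-256 hex digest used here, and `content_id`, the `example/deps/` prefix, and the sample files are hypothetical.

```python
import hashlib
from typing import Mapping


def content_id(files: Mapping[str, bytes]) -> str:
    """Return a hex digest that depends on every file path and its bytes."""
    h = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length-prefix each field so that different file splits cannot
        # produce the same byte stream.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Two hypothetical versions of the same one-file package.
v1 = {"bar.go": b"package bar\n\nfunc Answer() int { return 42 }\n"}
v2 = {"bar.go": b"package bar\n\nfunc Answer() int { return 43 }\n"}

id_v1 = content_id(v1)
id_v2 = content_id(v2)

# A hypothetical import path that embeds the identifier.
import_path_v1 = "example/deps/" + id_v1 + "/bar"
print(import_path_v1)
```

Any change to the files yields a different identifier, so an import path that embeds the identifier can only ever refer to one exact version of the code.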

onlyjob commented 6 years ago

Importing by hash is a silly idea because import paths always change. Import by persistent (canonical) path but try to vendor by semantic tag/version whenever possible. Commit hashes do not belong in import paths and they are prohibitive for packaging.

You are trying to freeze your dependencies but that is silly. Dependencies should evolve independently as long as they don't break API (hence semantic versioning is important).

Freezing dependencies is stupid: you'll lose touch and won't keep up with their updates. Remember that the Golang build system runs no tests for vendored dependencies whatsoever. You do not want to freeze unreliable, broken or untrustworthy libraries because it has all the flaws of static linking, which is evil, unsustainable and incompatible with security.

Debian is all about best practice. Don't avoid best practice. Embrace it. :)

whyrusleeping commented 6 years ago

The old school debian idealism is showing its age. We're not going to sit around and argue about whether dependency freezing is good or not, as we obviously disagree there. We have different perspectives, I have to maintain a large codebase, and you (presumably) have to maintain a set of packages. These two jobs require different considerations, there are things that I need to do that a package repository maintainer doesn't have to think about, and vice versa. Dependency freezing absolutely makes my job as an open source maintainer easier. (though I fail to see how dependency freezing makes your job harder, you have to update all the sources and recompile each go project anyways).

Best practices between us differ. I'm embracing mine. (oh, and when will I get firefox quantum on my debian box? I've been waiting quite a while now)

Stebalien commented 6 years ago

Freezing dependencies is stupid

The cargo (rust), npm (node), Nix, and Guix devs all disagree with this. Nix/Guix even use content addressing.

whyrusleeping commented 6 years ago

Also, to address your response to my question:

We need to build IPFS downstream in Debian from reusable packaged libraries.

Don't do that. We havent tested and verified ipfs with those dependencies. We have tested and verified ipfs with exactly the dependencies that we ship it with. Building ipfs with other dependencies is just begging for something to go wrong that will be super annoying for me to have to debug. If debian starts shipping an ipfs built with dependencies of their choosing, my response to any bug report from a debian user will be "please use a version of ipfs compiled from canonical source"

onlyjob commented 6 years ago

Freezing dependencies is a silly delusion that comes with a cost. Software does not exist in a vacuum. Even frozen dependencies will break over time due to changes in something they depend upon or in the compiler itself. You do not test your dependencies on build and freezing dependencies is a manifestation of fear of upgrade. Frozen dependencies always rot and over time your software will be mostly made of unsupported and unmaintained libraries, possibly vulnerable and/or broken on some architectures. Not only are you trying to do the wrong thing but you are also using uncommon non-standard tool(s) to do that which raises contribution threshold. Personally I have no incentive to figure out this messy build system of yours, which only this project uses. If you consider building IPFS with less unusual tricks then more people will be able (and probably more willing) to contribute. Introducing IPFS to Debian (and through Debian to all its derivatives) will dramatically increase IPFS exposure and will bring potential to attract contributors. You have more to win by working with the community if you are prepared to give up your arrogant way of maintaining software.

FYI firefox quantum has been available from unstable for a while.

As for freezing dependencies, stable distro release does that for you so you can plan for transitions and accommodate updated libraries only once in a while between distro release cycles.

Stebalien commented 6 years ago

Even frozen dependencies will break over time due to changes in something they depend upon or in the compiler itself.

We freeze dependencies recursively. Transitive dependencies can't change. As for the compiler, that can happen but it's very unlikely (go tries very hard to not break things).

Now, personally, I'd like to introduce a dependency replacement system. That is, a way to tell gx to replace all instances of gx/ipfs/QmA with gx/ipfs/QmB. That would allow packagers to override specific dependencies to fix specific bugs. We could also extend that to allow general overrides (e.g., replace gx/ipfs/QmA with github.com/Foo/Bar) but we don't want to encourage that.
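
A rough sketch of what such a replacement step could look like, written in Python for brevity. This is not a gx feature or its implementation: `rewrite_imports` is a hypothetical helper, it scans Go import declarations line by line instead of parsing the file properly, and QmA and QmB are the placeholder hashes from the paragraph above.

```python
import re

# Matches a double-quoted string such as "gx/ipfs/QmA/bar".
IMPORT_STRING = re.compile(r'"([^"]+)"')


def rewrite_imports(source, replacements):
    """Rewrite import paths in Go source text according to `replacements`.

    `replacements` maps an old path prefix to a new one. Only lines that
    belong to an import declaration are touched; other string literals in
    the file are left alone.
    """

    def swap(match):
        path = match.group(1)
        for old, new in replacements.items():
            if path == old or path.startswith(old + "/"):
                return '"' + new + path[len(old):] + '"'
        return match.group(0)

    out = []
    in_block = False  # True while inside an `import ( ... )` block
    for line in source.splitlines(keepends=True):
        stripped = line.strip()
        if in_block:
            if stripped.startswith(")"):
                in_block = False
            else:
                line = IMPORT_STRING.sub(swap, line)
        elif stripped.startswith("import ("):
            in_block = True
        elif stripped.startswith("import "):
            line = IMPORT_STRING.sub(swap, line)
        out.append(line)
    return "".join(out)


# Sample Go text used only as input data for the sketch.
GO_SOURCE = '''package main

import (
    "fmt"

    bar "gx/ipfs/QmA/bar"
    "gx/ipfs/QmA/bar/sub"
)

func main() { fmt.Println("gx/ipfs/QmA/bar") }
'''

rewritten = rewrite_imports(GO_SOURCE, {"gx/ipfs/QmA": "gx/ipfs/QmB"})
print(rewritten)
```

The same mapping could in principle point at a path such as github.com/Foo/Bar instead of another hash, which is the general override case described above.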

As for freezing dependencies, stable distro release does that for you so you can plan for transitions and accommodate updated libraries only once in a while between distro release cycles.

So now we have to pay attention to every distro that includes go-ipfs?

you are also using uncommon non-standard tool(s) to do that which raises contribution threshold

This we agree with. Unfortunately, there really are no better alternatives that provide us with the same guarantees. Our approach here is to improve gx but our time has always been limited.

Introducing IPFS to Debian (and through Debian to all its derivatives) will dramatically increase IPFS exposure and will bring potential to attract contributors.

We've never sought inclusion in distros like Debian because we really don't want to deal with bug reports from users running a heavily patched 2 years out of date version of go-ipfs. We'll revisit this decision when we hit 1.0 but it'll be a hard choice to make.


TL;DR: we really don't want distros automatically overriding the dependencies we specify.

onlyjob commented 6 years ago

I've just learned about braid, which looks like it might be worth trying... It is not specific to Golang so it might be a more or less universal solution to vendoring... Looks interesting and it can vendor by tag.

Kubuxu commented 6 years ago

From the perspective of vgo, which might become the packaging standard in Golang:

required:

  • the entirety of all my deps are not stored in one repo
  • I can be assured that when i build, i'm running exactly the code i think i'm running (not relying on mutable upstreams)

vgo checks those boxes but allows distros and packages that use ipfs as a dependency to require a higher version. This means we would have to fork off some packages (gogoprotobuf, for example). We are already doing this, but squatting the namespace of the original package.

very nice to have:

  • doesnt depend on githubs continued existence or uptime

This is possible, and it should not even be that hard to create an IPFS-based repository that can run globally or locally.

  • nice tooling for iterative bubbling up and testing of dependency changes in the tree

Dependency updates in vgo are much easier, and vgo chooses the minimum version of all dependencies. For testing, you can update the dependency just in the top package.

  • libraries are also vendored (don't give me that line about only vendoring binaries, its crap)

vgo doesn't use vendoring, but it works as a somewhat weaker form of vendoring: the higher-level package can upgrade a package in the whole tree.
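
The "minimum version" behaviour mentioned above can be sketched as follows. This is a simplified Python model of minimal version selection as described in the vgo proposal: walk every requirement reachable from the top package and, for each module, keep the highest of the minimum versions anyone asked for. Module names, versions, and the `build_list` helper are made up for illustration, and replace/exclude directives and pre-release ordering are ignored.

```python
def parse(version):
    """Turn 'v1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def build_list(root, requirements):
    """Select one version per module for building `root`.

    `requirements` maps (module, version) to the list of
    (module, minimum_version) pairs that this exact version requires.
    """
    selected = {}
    seen = set()
    stack = list(requirements.get(root, []))
    while stack:
        module, version = stack.pop()
        if (module, version) in seen:
            continue
        seen.add((module, version))
        # Keep the highest minimum that any reachable module asked for.
        if module not in selected or parse(version) > parse(selected[module]):
            selected[module] = version
        stack.extend(requirements.get((module, version), []))
    return selected


# Hypothetical requirement graph.
requirements = {
    ("top", "v1.0.0"): [("liba", "v1.2.0"), ("libb", "v1.0.0")],
    ("libb", "v1.0.0"): [("liba", "v1.3.0")],
    ("liba", "v1.2.0"): [],
    ("liba", "v1.3.0"): [],
    ("liba", "v1.4.0"): [],  # released upstream, but nobody requires it
}

selection = build_list(("top", "v1.0.0"), requirements)
print(selection)
```

In this model liba resolves to v1.3.0: high enough to satisfy both requirements, but not the newest release v1.4.0, because no module in the tree asks for it.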

b5 commented 6 years ago

vgo is now approved and will become part of the language as an experimental opt-in in go 1.11, probs going to be mainlined in go 1.12: https://research.swtch.com/vgo-accepted

I personally see a lot of opportunity for growth between the vgo & gx stories:

Either way with golang taking a firmer stance on the versioning conversation, it might be possible to set the two package managers on a harmonious trajectory.

whyrusleeping commented 6 years ago

@Kubuxu

Dependency updates in vgo are much easier

Have any links to how this works? I want to make sure that 'implicit' dependency updates are still tested. For example, if I have package A that depends on B and C, and C also depends on B, then if I update 'B' in A, I want to make sure that C has also been tested with that new version of B.
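
The scenario in this question can be modelled with a small dependency graph. The Python sketch below is not taken from vgo or gx; `needs_retest` is a hypothetical helper that lists every package in A's tree that depends on the updated package, directly or transitively, which is the set one would want to re-run tests for. Package D is an extra, made-up dependency that does not use B.

```python
def needs_retest(root, deps, updated):
    """Return the packages in root's tree that depend on `updated`."""
    memo = {}

    def reaches(pkg):
        # True if pkg depends on `updated`, directly or transitively.
        if pkg in memo:
            return memo[pkg]
        memo[pkg] = False  # provisional value, guards against cycles
        result = any(d == updated or reaches(d) for d in deps.get(pkg, []))
        memo[pkg] = result
        return result

    # Collect every package reachable from the root, including the root.
    tree = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in tree:
            continue
        tree.add(pkg)
        stack.extend(deps.get(pkg, []))

    return sorted(pkg for pkg in tree if reaches(pkg))


# A depends on B, C and D; C also depends on B; D does not.
deps = {
    "A": ["B", "C", "D"],
    "B": [],
    "C": ["B"],
    "D": [],
}

to_retest = needs_retest("A", deps, "B")
print(to_retest)
```

For this graph the result contains A and C but not D, matching the concern that C must be exercised against the new B even though the update was made in A.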

Kubuxu commented 6 years ago

As far as I know, Golang team promised/was considering a wide array of tooling around testing. Starting off with tooling that would allow for running tests of all dependencies with currently chosen versions and ending at the possibility of running tests of packages that depend on your library so you know that you didn't break anything.

This doesn't include tooling around a static analysis of inter-package interfaces.

I don't have any links, unfortunately.

whyrusleeping commented 6 years ago

@onlyjob I'm curious about the debian plan for go repositories. Projects like ipfs, and like ethereum, and docker, and many others all rely on hundreds of different upstream code repositories (and they all use dependency freezing, btw). The dependency graph there is very very large. Who maintains each of those? Is it an automated process that pulls things from github? Or is there a person whose responsibility it is to keep each one properly up to date? (presumably this also applies to projects in other languages). When I freeze my dependencies, that is saying that I trust all that code is 'correct' and not compromised at the time of me freezing it. Without doing that, I'm trusting that those properties hold at all points in the future. If the computer of a maintainer of even a small dependency of mine gets hacked, malicious code could be placed there in secret (say, a cryptocurrency miner).

What is the approach taken by the debian community for this? I assume this isnt a new problem, it just seems like something that gets exacerbated by projects with a very large number of external dependencies.

onlyjob commented 6 years ago

There is no automated process to pull from GitHub and IMHO it is not needed. If a library is used by only a few packages then you have at least a few maintainers interested in keeping that dependency up to date. Debian QA will continuously test all packages, rebuild them, etc. Bugs will be filed automatically etc. This is how we can maintain a healthy ecosystem of reusable packages. Even complex software can be built from re-usable libraries. Even though Golang was making it difficult, it doesn't have to be. Semantic versioning seems to be the key to dependencies. It works, while incorporating random commits doesn't.

Docker is a paragon of incompetence and abuse of meaningful versioning practices. A very bad example of the enormous mess they selfishly created by refusing to cooperate with the community, breaking interfaces all the time, needless forking, etc. Docker release workflow is a terrible mess.

Downstream packaging of Docker is ridiculously difficult because the Docker code base is spread among several name spaces. You would think some of the libraries that Docker uses are meant to be reusable but they are not due to circular dependencies, etc. With Docker it might be necessary to ship a multi upstream tarball (MUT) with Docker "components" but that's because Docker devs don't believe in (stable) API.

I hear your concerns about security of libraries and I think that Debian adds an extra layer of confidence in that regard. This is what the go-maintainers QA page looks like:

https://qa.debian.org/developer.php?login=pkg-go-maintainers@lists.alioth.debian.org

This is an example of build logs for etcd: https://buildd.debian.org/status/package.php?p=etcd

whyrusleeping commented 6 years ago

But what is the chain of custody for updating one of those git repos? Can a single package maintainer move things unilaterally?

onlyjob commented 6 years ago

Yes, a single maintainer can do that. Occasionally some coordination is necessary to facilitate a library transition and re-build reverse build-depends. Most Golang packages are team-maintained and any team member can work on any package. Sometimes it is quite effective. :)

whyrusleeping commented 6 years ago

@onlyjob hrm... but my concern here is that means it only takes one of those (potentially) hundreds of people to have their computer compromised in order for an attacker to sneak bad code in.

onlyjob commented 6 years ago

Don't use GitHub then? ;)

schomatis commented 5 years ago

Closing as I think there's no actionable item here, feel free to reopen otherwise.

onlyjob commented 5 years ago

Of course it is very actionable. Just use normal import paths like pretty much all the Golang projects do, except ipfs.

Please reopen and keep this ticket open until fixed.

schomatis commented 5 years ago

Sorry I wasn't very clear, I meant that there is no concrete step in the short term that we can take to keep this issue open to track it (although the discussion has value in itself as a record of different perspectives on how to handle dependencies). What I'm interpreting from the previous comments is that we're committed to this design path and this is not something that we are considering changing in the short term.

Stebalien commented 5 years ago

We provide source releases for building without network access: https://dist.ipfs.io/go-ipfs/v0.4.18/go-ipfs-source.tar.gz.

We have no plans to support building without vendored dependencies at this time.