whyrusleeping / gx

A package management tool
MIT License

gx-gomod #200

Open keks opened 5 years ago

keks commented 5 years ago

Hey @whyrusleeping, here is my extended sketch of how gx could benefit from go modules. Also pinging @Stebalien because you wanted him to also look at this. I hope you like it!


Go 1.11 will introduce go modules, which were proposed in Russ Cox's Go & Versioning series and prototyped in the vgo fork. Modules change the development workflow in a number of ways, and open new possibilities to streamline gx's workflow.

I want to build gx-gomod and teach gx about go modules. Specifically, I want to get gx hashes out of .go files, which also simplifies things like automated rewriting.

The most obvious change is the new per-module go.mod file, which is used to track dependencies and their versions. It allows three types of declarations: require, exclude and replace.

The structure of this proposal is as follows. First, we will look into what we can and can't do with the go.mod file. Then, we will look into how go get fetches modules, and how we can make it fetch them from gx. Finally, we will look into how to build gx-gomod projects when the user has neither ipfs nor gx installed.

go.mod

The go.mod file is used to specify the versions of a project's dependencies. It allows three types of declarations: require, exclude and replace.

Note that each of these has a version. The go.mod documentation has a detailed specification of what a version string is. Go basically uses semver, but introduces pseudoversions. These look like v0.0.0-<committime>-<commithash>, and are also valid semver strings.

This approach can be adapted for gx by using version strings like v0.0.0-gx-ipfs-Qm.... This allows us to treat a gx hash not as an import path, but as a version, which is a much better way to think about it.

Unfortunately, go's automatic version finding algorithm will not work with these, because they are all pseudoversions. One way to tackle this is to keep, behind an ipns hash, a list of versions and where to find the code. That way we have an import path that remains the same, and new versions can be amended.

However, I'm not sure if this kind of mutability is wanted or not.

Download Protocol

Currently, most go modules are hosted on github, and the developers of the go tools have hardcoded a good way to get the data they need from there. However, they also provide an interface that can be implemented by others. This allows building module caches or distributing private go modules inside a company's network. It also allows us to resolve the pseudoversions and deliver the modules from gx.

There are two ways to make go get use that protocol. The first works like vanity imports for modules and requires including

<meta name="go-import" content="mydomain.com/import/path mod https://hosting/path" />

in the HTML returned when querying https://mydomain.com/import/path?go-get=1. The second one is used when the GOPROXY environment variable is set. For example, we could set it to http://localhost:8060/ and run a cache proxy on that port. Instead of a cache proxy, however, we could also resolve packages through non-standard channels, such as gx/ipfs.

The download protocol itself is simple and HTTP-based. It is described in the section "Download Protocol" of part six of the G&V series (sorry no fragment link):

GET baseURL/<module>/@v/list fetches a list of all known versions, one per line.
GET baseURL/<module>/@v/<version>.info fetches JSON-formatted metadata about that version.
GET baseURL/<module>/@v/<version>.mod fetches the go.mod file for that version.
GET baseURL/<module>/@v/<version>.zip fetches the zip file for that version.

Well, that seems simple enough to implement! If we run a server like that locally to resolve our gx modules, all we need to do is point GOPROXY at it.

But what do we do about traditional imports? It would be nice to fetch those the traditional way and forward them to the client. This sounds simple, but there are some problems. Let's look at the obvious solution: if it's a gx hash, deliver the code from the ipfs hash; otherwise, do what go get would usually do and deliver that. The problem here is that "what go get usually does" lives somewhere in "cmd/go/internal/...", so we can't just import it but would have to fork it, which means a lot of effort to maintain.

An alternative is to redirect the request to a well-known proxy or registry. For efforts in this direction, look at gomods.io. The people at gomods also plan to fetch source code from github et al., so maybe we can reuse their code that does that. However, I haven't looked at it long enough to say whether that would work. They have community calls, so if we decide to go down this road, we can chat with them and see whether they have a good idea.

Bootstrap is a PITA

What I find especially interesting about this solution is that we can run a public gx module proxy. That way it is dead simple to download, build and use gx modules (like ipfs) without even having gx or ipfs installed, by using GOPROXY=https://goproxy.ipfs.io go get. Once everything is installed, the user can switch to a local GOPROXY.

Furthermore, it might be possible to apply as a backend for gomods. That way we wouldn't have to host that public proxy ourselves; gomods would use ipfs to fetch the code and then host it for us. I'm not sure they are interested in something like this: considering that gomods is all about semver and we only use pseudoversions, I can imagine they are not very excited about it.

Stebalien commented 5 years ago

So, this helps fix the issue of gx paths in the source code but, unfortunately, doesn't fix the problem of bubbling gx updates. Please take a look at my motivations here: https://github.com/whyrusleeping/gx/issues/179.

Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system.

Using go modules out of the box will fix some of our biggest issues with go get: it'll work even if we make breaking changes (as long as we version properly).

The large missing piece is security and reproducibility. That's where https://github.com/whyrusleeping/gx/issues/179#issuecomment-408243162 comes in. Basically, for our builds, we can use a special gx build tool to make sure we build with the correct, audited dependencies.

keks commented 5 years ago

Thanks for pointing me to that issue, I didn't see that one.

Regarding your main motivation:

Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system.

I think if we are able to use version strings like v2.7.3-gx-ipfs-Qmb3GBFCHMuzmi9EpH3pxpYBiviyc3tEPyDQHxZJQJSxj9 and use ipfs to resolve them and fetch the code, we will get the dependency resolution features of go and the security features of ipfs. I believe it also fulfills the reproducibility requirement, though I'm not exactly sure what you mean by that. For example, when building go-ipfs with dependencies carrying gx versions, the set of module version candidates is fixed, so minimum version selection will always return the same build list. On the other hand, it might happen that go-ipfs overrides the version of a transitive dependency, so library modules might be used with different dependency versions than they were developed with. But as far as I can tell, this is a requirement to avoid having to publish a new release for every transitive dependency just to bubble up an update of a dependency leaf, so that's a feature.

I am not 100% sure how go mod behaves when all build candidates are prereleases (i.e. v1.2.3-xyz), and I'm not sure we can abuse semver in this way. But I'd be up to build a GOPROXY-compatible gx module server to try that out. One caveat here: go get will prefer non-prerelease versions, so if we let the server try to resolve non-gx modules, we'll need to disable that for modules that we have gx candidates for. Otherwise go might use those.

keks commented 5 years ago

ping @Stebalien

Stebalien commented 5 years ago

I am not 100% sure how go mod behaves when all build candidates are prereleases (i.e. v1.2.3-xyz), and I'm not sure we can abuse semver in this way. But I'd be up to build a GOPROXY-compatible gx module server to try that out.

Yeah, my worry here is that go will treat this kind of version as an "opaque" version and not do semver-based dependency resolution. However, if that still works, this would be awesome.

My only remaining concern is that we can't really force everyone that depends on our stuff to use gx. We may be able to get them to at least install it (as long as they never have to deal with it directly or use it directly in their projects) but we'll have to be careful about that.

The nice thing about the "lockfile" approach is that it's entirely independent. It allows us to specify dependencies in the language's native dependency system while still allowing users to "opt-in" to the guarantees that gx provides.

Stebalien commented 5 years ago

@keks I've been discussing this with @travisperson and @whyrusleeping and we're currently planning on going with the lockfile for the independence reason. For us, this means:

  1. It can coexist with whatever other dep solution happens to be "hip" at the time. All we need is a plugin that understands the package manager well enough to import the correct versions into IPFS and appropriately update the lockfile.
  2. Users can continue to use their favorite package manager (without having to install a custom gx proxy). If gx becomes popular enough, we can try to add direct support to vgo but we'll probably still keep the lockfile.
  3. It's language independent. We can reuse the same system elsewhere without depending too much on the language-specific package manager (except for initial dependency resolution).
  4. We can build without even installing any other package managers (we'd only need one to do dependency resolution).
  5. (the real motivation) we can implement a basic version without touching vgo. We won't get semver logic in that version but it should work and should be fairly easy to implement.
keks commented 5 years ago

@Stebalien I'm not 100% clear on what you mean by the lockfile approach. I know npm's package-lock.json and assume you want to generate a file like that for the concrete package manager in use. I don't see how we can use that in a straightforward manner, so I searched the gx repo for issues mentioning lockfiles, but that wasn't very fruitful. I have some rough ideas on how it could work, but I'm not sure. I'm especially curious whether we'd still have gx hashes in import paths. Anyway, let me respond to the five points you brought up.

  1. TBH, I believe go modules will stick. Some details may change, but I assume it's pretty stable right now and the API will mostly grow and not change radically. So at least for Go I think it's the way forward.
  2. Well, go modules still allow us to do what gx install did until now, except we have to fetch the modules to $GOPATH/src/mod/... instead of $GOPATH/src/gx/.... Also we would be able to let the user use a public gateway for fetching the sources, so we would even let users install gx dependencies without them having any gx or ipfs on their computer.
  3. Yes, go modules obviously are language-dependent. But I think it's better to have good integration with the few tools you use than to support a wide range of tools poorly. And even with gx-gomod, other languages can still use whatever gx mechanism they see fit.
  4. Isn't dependency resolution pretty important?
  5. I can't really comment because I don't see where you're heading :)
Stebalien commented 5 years ago

I mean https://github.com/whyrusleeping/gx/issues/179#issuecomment-408243162. That is, we somehow put all the package's dependencies (including the transitive ones) in a single file. The format would be something like:

{
  "dependencies": {
    "github.com/ipfs/go-ipfs-cmds": {
      "path": "/ipfs/QmId.../go-ipfs-cmds"
      // Space left in case we need additional fields
    }
  }
}

We can then install (without rewriting) for development by creating a vendor directory and symlinking. E.g., symlink ./vendor/github.com/ipfs/go-ipfs-cmds/ to $GX_CACHE/ipfs/Qm.../go-ipfs-cmds (or literally to /ipfs if it's mounted).

To build with rewriting (to get gx paths in stack traces), we'd copy everything, including the current package, into a temporary directory, rewrite imports according to the lockfile (it's literally a rewrite map), and then build.

As you pointed out, the "somehow put all the dependencies into the lockfile" step is the interesting part. From the user's standpoint, they'll run some command (gx sync or gx lock) to build the lockfile. My plan is to use vgo eventually, but:

  1. In version 0, we may just use go get (i.e., go-get into an empty gopath, import everything into gx, build the lockfile). This should be very simple.
  2. In version 0.5, we'll probably use the existing package.json files (recursively resolve dependencies and dump this dependency list into the lockfile).
  3. In version 1.0, we'll add support for vgo.

On the other hand, we won't remove support for basic go get and gx's current package.json.


Basically, we're separating dependency resolution from building and making dependency resolution pluggable. We'll (carefully) do dependency resolution once when updating, lock these dependencies into the "lockfile", and then use gx to actually install dependencies and build. In most cases, users will be free to entirely ignore gx and use their chosen dependency resolution tool when building.


Note: The plan is to never check-in gx-re-written paths. That's just annoying for everyone.

dennwc commented 5 years ago

I have a counter-proposal for integration with Go modules.

Another feature introduced with modules was GOPROXY. Maybe changing gx to be a go modules proxy that reads/writes to IPFS will be a better integration point?

I imagine that you run gx proxy /some/ipfs/prefix and it serves all dependencies from this prefix; if a file is missing, it tries to fetch it from the network and writes it to IPFS under the same prefix. Details may vary, but this would allow go-ipfs to use the standard way of managing Go packages (modules) while giving the ability to store all of the dependencies in IPFS.

whyrusleeping commented 5 years ago

@dennwc I like that direction. I think it's worth quickly sketching out how that might work

backkem commented 5 years ago

Using GOPROXY is a really cool idea. It's unfortunate that the download API doesn't include the tree/commit hash. That might have allowed a completely transparent IPFS proxy to be built. Maybe it's worth trying to convince the go team to add this (if it's not already too late)?

dennwc commented 5 years ago

It's definitely not too late, since Go 1.12 will be released in February with modules enabled by default. Until then there is still time to discuss small changes to modules.

Can someone create an issue in the Go repository and describe what exactly is needed?

backkem commented 5 years ago

The 'transparent proxy' idea is also based on the assumption that the hashes in the go.sum file are usable to look up data on IPFS. Sadly, I'm not deep enough into either project to be able to answer that right now.

backkem commented 5 years ago

If this assumption doesn't hold (which seems likely), one way of building @dennwc's original idea could be to build a storage backend or fetcher for gomods/athens. This offloads the work of building and maintaining the proxy itself. However, it would still have to store the mapping of package versions to IPFS hashes somewhere.

dennwc commented 5 years ago

Yes, a storage backend might be an even better integration point.

For hashes, it seems like the simplest way will be to maintain an IPFS root for a project's dependencies, for example /project-org-hash/github.com/whyrusleeping/gx/v1.0.0/... or whatever works for Athens.

dennwc commented 5 years ago

@keks @Stebalien Do you have any comments on this? Would a proxy solve the mentioned issues for you?