whyrusleeping / gx

A package management tool
MIT License

Making GX pleasant to use #179

Open Stebalien opened 6 years ago

Stebalien commented 6 years ago

I'd like to propose a way forward to make gx nicer to use. Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system. Step 1 aims at working around the limitations in gx-workspace and step 2 aims to integrate better with existing package managers.

Motivation

  1. Every dependency must update shared transitive dependencies in lock-step.
  2. Updating a dependency deep in the dependency tree is really painful.
  3. Updating dependencies often means firing off a bunch of PRs to get updated package.json files merged.
  4. Dependencies not stored in gx are painful to use.
  5. Using gx packages with other package managers can be painful.

Ignore the below proposal and take a look at https://github.com/whyrusleeping/gx/issues/179#issuecomment-408243162

~Proposal~

Step 1

This step aims to solve issues 1-4.

In this step, we'd add a feature to gx that's basically like gx-workspace but works on published packages instead of repos. That is, you'd run gx update --everywhere SomeHash to update the hash everywhere in the dependency tree. This tool would then:

  1. Modify the package.json files in the published package.
  2. Re-publish and get a new hash.
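To make the mechanics concrete, here's a toy Go sketch of that update-and-republish walk. Everything here is illustrative: the `pkg`/`registry` types and the fake `publish` hash are stand-ins for IPFS publishing, not gx's actual data structures.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// pkg is a toy stand-in for a published gx package: a name, a version,
// and the hashes of its direct dependencies.
type pkg struct {
	name, version string
	deps          []string
}

// registry maps hash -> package, simulating published, immutable packages.
type registry map[string]pkg

// publish derives a deterministic fake hash from the package contents,
// standing in for "add to IPFS and get back a multihash".
func publish(r registry, p pkg) string {
	sort.Strings(p.deps)
	sum := sha256.Sum256([]byte(fmt.Sprint(p.name, p.version, p.deps)))
	hash := fmt.Sprintf("Qm%x", sum[:8])
	r[hash] = p
	return hash
}

// updateEverywhere walks the dep tree rooted at root, replaces every
// occurrence of oldHash with newHash, and republishes each modified
// package, returning the (possibly new) root hash.
func updateEverywhere(r registry, root, oldHash, newHash string) string {
	if root == oldHash {
		return newHash
	}
	p := r[root]
	changed := false
	deps := make([]string, len(p.deps))
	for i, d := range p.deps {
		deps[i] = updateEverywhere(r, d, oldHash, newHash)
		if deps[i] != d {
			changed = true
		}
	}
	if !changed {
		return root // untouched subtree keeps its hash
	}
	return publish(r, pkg{name: p.name, version: p.version, deps: deps})
}

func main() {
	r := registry{}
	cidV1 := publish(r, pkg{name: "go-cid", version: "1"})
	ipld := publish(r, pkg{name: "go-ipld", version: "1", deps: []string{cidV1}})
	root := publish(r, pkg{name: "go-ipfs", version: "1", deps: []string{cidV1, ipld}})

	cidV2 := publish(r, pkg{name: "go-cid", version: "2"})
	newRoot := updateEverywhere(r, root, cidV1, cidV2)
	fmt.Println(newRoot != root) // → true
}
```

The key property is visible in the sketch: every package whose transitive deps changed gets a brand-new hash, which is exactly the downside this step runs into.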

(@whyrusleeping has proposed this in the past)

One big downside is that we'd get a new package hash for every modified dependency. IPFS will deduplicate the files but this could still cause problems (users need to be very careful to pin everything).

The other downside is that we have no way to check these updates (other than to run the tests, which we should probably do). That is, packages can't specify semver constraints (yes, these don't guarantee anything but they can help).

Step 2

This step aims to solve issue 5 and the two issues introduced in step 1.

In this step, we'd switch to a package.json/package-lock.json setup following NPM's file formats as closely as possible. Ideally, we'd be able to make gx work with NPM packages without much trouble.

That is:

  1. Add a generated package-lock.json file. We can use this file to lock packages to specific IPFS hashes (and specific git commits for compatibility). This file MUST be checked into version control. Note: this file will list every hash of every transitive dependency. This means we can update a transitive dependency without modifying the packages in between.
  2. Use the NPM dependency list format in package.json complete with semver, repos, etc.
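Concretely, the dependency list might look like the following. This is only an illustration: the `dependencies` block follows NPM's real package.json format, but the package names and version ranges are made up.

```json
{
  "name": "go-ipfs",
  "version": "0.4.17",
  "dependencies": {
    "go-cid": "^0.7.0",
    "go-libp2p": "^6.0.0"
  }
}
```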

This gives us:

  1. Semver. That is, we can check semver versions when updating transitive dependencies and solving transitive dependency conflicts.
  2. Fixes the downside from step 1. That is, we don't have to modify dependencies to update transitive dependencies.
  3. Potentially allows us to integrate with tools like NPM. Ideally, we'd be able to use gx with javascript for our builds while allowing everyone else to continue using NPM without any additional work (they just wouldn't get the same guarantees).
  4. Makes it easier to integrate with tools like Go's dep. We should be able to autogenerate Gopkg.toml/Gopkg.lock files from our package.json/package-lock.json files.

However, this also has a drawback: We wouldn't be able to have a global $GOPATH/src/gx tree anymore because we'd need to rewrite each package's dependencies differently. This wasn't a problem in step 1 because we created entirely new packages when updating transitive dependencies.

On the other hand, I'm not sure if this drawback is that bad. gx-go rw is now fast enough that we can rewrite dependencies on the fly (on build) as long as we have a central place to cache unrewritten gx packages. That is:

  1. Hardlink all files, gx and package (cp -rl), into a temporary GOPATH. This is a really fast operation.
  2. Run gx-go rw (this doesn't modify files in-place so the hardlinking won't be a problem).
  3. Build.

We can even do this on Windows (Windows has hardlink support).

Note: we wouldn't have to rewrite all of our packages all at once, we'd just have to make sure to do so from the top down.

whyrusleeping commented 6 years ago

Your step 1 seems like it would be easiest to do if we use: https://github.com/whyrusleeping/gx/issues/151. It seems that explicitly relying on some known registry would be simpler to reason about than having random publishes of subpackages all over the place. Though, in the end, it's really the same.

On step 2, if we're going to go all the way to having differently rewritten packages per package, we might as well just go all the way and not use hashes. Just ensure that there are no duplicates in the tree, and then write gx/ipfs/QmFoo/stuff -> vendor/github.com/whatever/stuff and be done with it all. This gets us quite a bit more potential for deduplication too. (granted, this gets us away from the really nice ability to have hashes in the stack dumps)

(Windows has hardlink support)

lol

whyrusleeping commented 6 years ago

In general, +1 to this. This is IMO the right direction. Some things to ensure though:

Stebalien commented 6 years ago

Your step 1 seems like it would be easiest to do if we use: #151. It seems that explicitly relying on some known registry would be simpler to reason about than having random publishes of subpackages all over the place. Though, in the end, it's really the same.

I just want to be careful about centralizing. That is, we need to make sure that users can depend on packages from multiple repos.

On step 2, if we're going to go all the way to having differently rewritten packages per package, we might as well just go all the way and not use hashes. Just ensure that there are no duplicates in the tree, and then write gx/ipfs/QmFoo/stuff -> vendor/github.com/whatever/stuff and be done with it all. This gets us quite a bit more potential for deduplication too. (granted, this gets us away from the really nice ability to have hashes in the stack dumps)

Recall, we rewrite before building, after adding to IPFS. Whether or not we rewrite before building shouldn't change deduplication.

Every dependency gets tested with its selected dependency set before it's 'done'. This can be made a little bit faster by detecting instances where we don't actually change the deps for a package.

:+1: We'll want some tooling that auto-tests the entire tree.

Ensure deps locked into github for intermediate packages don't get horribly out of date. Not sure the right way forward here that doesn't break gx semantics.

More bots and CI?

whyrusleeping commented 6 years ago

Recall, we rewrite before building, after adding to IPFS. Whether or not we rewrite before building shouldn't change deduplication.

Right, but rewriting at all breaks the CoW deduplication you suggested earlier. If we don't rewrite, then we don't use up any extra space.

Stebalien commented 6 years ago

Right, but rewriting at all breaks the CoW deduplication you suggested earlier. If we don't rewrite, then we don't use up any extra space.

Ah, do you mean any extra disk space in $GOPATH/src/gx/...? Yes.

Unfortunately, it also means that we can't have duplicates, which may be desirable in certain cases. However, step 2 doesn't really allow that either (unless we do something a bit more complicated and allow different versions for transitive dependencies).

whyrusleeping commented 6 years ago

Yeah, I've always found the multiple versions of the same package to be questionably useful. Nowadays I lean more towards disallowing that completely.

Stebalien commented 6 years ago

So, my concern is about packages like, e.g., some random crypto, math, hash, etc. library. That is, some entirely internal library that doesn't export any types. I wonder if there's any way to get this information from go. There should be.

Stebalien commented 6 years ago

I've discussed this with @whyrusleeping and we came up with a third option that we prefer: defer to the language's package manager whenever possible.

In this variant, gx would maintain a gx-lock.json file, mapping packages (name, path, etc.) to hashes. To build on go, we would:

  1. Copy the package to a temporary directory.
  2. Install the gx deps in a vendor directory.
  3. Rewrite everything.
  4. Build.
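The gx-lock.json mapping might look something like the fragment below. This shape is purely hypothetical, invented here to illustrate the idea of mapping package paths to hashes; the hashes and versions are made up.

```json
{
  "language": "go",
  "deps": {
    "github.com/ipfs/go-cid": {
      "hash": "QmExampleHashOfGoCid",
      "version": "0.9.0"
    },
    "github.com/libp2p/go-libp2p": {
      "hash": "QmExampleHashOfLibp2p",
      "version": "6.0.5"
    }
  }
}
```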

To update dependencies, we'd entirely defer to the language's package manager for dependency resolution. Once the package manager has figured out the right deps for us to use, we'd update any hashes as necessary. In practice, the user will:

  1. Change the language-specific package metadata files.
  2. Run a gx sync command (or something like that). This command would update gx hashes as necessary, asking the user about each update (we could even provide diffs on demand).

This gives us all the features we like (certified dependencies, package hashes in builds, source distribution over ipfs, etc.) without having to fight with package managers.


Note: This doesn't preclude us from introducing custom package managers for some languages where we feel that the existing ones don't cut it. However, it means that we can integrate better when they do.

Stebalien commented 6 years ago

Basically, we'd end up having two commands:

schomatis commented 6 years ago

Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system.

This line in https://github.com/whyrusleeping/gx/issues/200#issuecomment-412951672 made everything click for me about this issue. If you don't mind @Stebalien, I'm adding it to the abstract of this issue; it's a great one-liner of what we're talking about (and trust me, it's really valuable, as these are not simple subjects to grasp on a first read).

Stebalien commented 6 years ago

Go ahead.

Stebalien commented 6 years ago

Current status: #206 adds basic lockfile support, https://github.com/whyrusleeping/gx-go/pull/49 adds a command to create a lockfile.

Currently, we have one command: gx lock-install (may be renamed to gx install or something like that in the future). This command simply:

  1. Fetches the packages into a cache directory (somewhere).
  2. Symlinks them into vendor according to the lockfile.

That will be the default "dev" setup.

There's also a gx-go gen-lock command that creates the lockfile from a gx package tree (may want to rename this command).

Next steps:

  1. Allow syncing a vendor directory into the lockfile. That is, a user should be able to drop a package in the vendor directory, run a command, and have this package inserted into the lockfile.
  2. Allow syncing vgo packages to the lockfile.
  3. A release build command that does rewrite paths (in a temporary directory).

schomatis commented 6 years ago

Great work! I'll give these commands a try. Let me know if I can be of help.

kevina commented 5 years ago

Just FYI: For the time being I wrote a tool to help keep track of state during a complex gx-update. I am not sure how useful it will be once this is merged, but for what it's worth you can find it at: https://github.com/kevina/gx-update-helper.

I will take a closer look at this later to see how well it will work with what I had to do to get go-cid change (https://github.com/ipfs/go-cid/pull/71) in.

kevina commented 5 years ago

@Stebalien is gx sync meant to replace gx-workspace? Off hand I am not seeing how this will solve the problem of a complex gx update that involves an API change and may also cause test cases to break. During the Cid API change I had to iterate several times to get it right (gx publish, gx update, republish, reupdate, etc.). (Note that I wanted to publish so that the code would compile on the build servers to double check everything is okay.) I also discovered some bugs that I needed to fix and ended up compiling and testing using a partly rewritten tree.

Stebalien commented 5 years ago

There are two key proposals here:

  1. Instead of having packages use their dependencies package.json files to determine transitive dependency versions, packages can "lock in" all transitive dependency versions in their gx-lock.json file. This means we can, e.g., update go-cid in go-ipfs without modifying every package that happens to depend on something that uses go-cid.
  2. Use vgo (go.mod) instead of go get and use vgo to do actual dependency resolution. gx sync would regenerate the lockfile by looking at go.mod files.

For this change, we would have:

  1. Made the change to go-cid, merged it, released a major version.
  2. Updated the go.mod files of the dependent packages as needed (fixing everything as we go along).
  3. As we update the go.mod files, we would run gx sync to update the package.json files.

Basically, this means we can be a bit more incremental instead of having to do everything all at once.

kevina commented 5 years ago

@Stebalien is this also going to solve the problem of being able to build without rewriting?

Stebalien commented 5 years ago

Yes. We'll actually have two build modes:

  1. If you run gx install, we'll "install" the gxed dependencies into a vendor directory. In practice, we'll likely put the actual packages in a cache somewhere (~/.cache/gx/...) and then symlink them in place (i.e., ~/.cache/gx/ipfs/Qm... -> ./vendor/github.com/a/b). You should then be able to build with a normal go build. This also makes testing out modifications to dependent packages really simple: we can just replace this symlink with one that points to the code you're working on.
  2. If you want to produce a "release" build, you'll be able to run gx build --release (or something like that). That will copy everything (including the dependencies) into a temporary directory, rewrite everything, and build. That way, we can still see package hashes in stack traces.