Stebalien opened this issue 6 years ago
Your step 1 seems like it would be easiest to do if we use https://github.com/whyrusleeping/gx/issues/151. It seems that explicitly relying on some known registry would be simpler to reason about than having random publishes of subpackages all over the place. Though, in the end, it's really the same.

On step 2, if we're going to go all the way to having differently rewritten packages per package, we might as well just go all the way and not use hashes. Just ensure that there are no duplicates in the tree, then rewrite `gx/ipfs/QmFoo/stuff` -> `vendor/github.com/whatever/stuff` and be done with it all. This gets us quite a bit more potential for deduplication too. (Granted, this gets us away from the really nice ability to have hashes in the stack dumps.)
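For illustration, the hash-to-repo mapping such a rewrite needs could be as simple as a lookup table. This is a minimal sketch, assuming a hypothetical `registry` map and `unrewrite` helper (in practice the mapping would come from package metadata or the registry proposed in #151):

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a hypothetical hash -> canonical repo path mapping; the
// real data would come from the package metadata or a known registry.
var registry = map[string]string{
	"QmFoo": "github.com/whatever",
}

// unrewrite turns a gx import path like "gx/ipfs/QmFoo/stuff" back into
// a vendor path like "vendor/github.com/whatever/stuff".
func unrewrite(gxPath string) (string, bool) {
	parts := strings.SplitN(gxPath, "/", 4) // "gx", "ipfs", hash, rest
	if len(parts) < 4 || parts[0] != "gx" || parts[1] != "ipfs" {
		return "", false
	}
	repo, ok := registry[parts[2]]
	if !ok {
		return "", false
	}
	return "vendor/" + repo + "/" + parts[3], true
}

func main() {
	out, ok := unrewrite("gx/ipfs/QmFoo/stuff")
	fmt.Println(out, ok)
}
```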
> (Windows has hardlink support)

lol
In general, +1 to this. This is IMO the right direction. Some things to ensure though:
> Your step 1 seems like it would be easiest to do if we use: #151 It seems that explicitly relying on some known registry would be simpler to reason about than having random publishes of subpackages all over the place. Though, in the end, it's really the same.
I just want to be careful about centralizing. That is, we need to make sure that users can depend on packages from multiple repos.
> On step 2, if we're going to go all the way to having differently rewritten packages per package, we might as well just go all the way and not use hashes. Just ensure that there are no duplicates in the tree, and then write `gx/ipfs/QmFoo/stuff` -> `vendor/github.com/whatever/stuff` and be done with it all. This gets us quite a bit more potential for deduplication too. (Granted, this gets us away from the really nice ability to have hashes in the stack dumps.)
Recall, we rewrite before building, after adding to IPFS. Whether or not we rewrite before building shouldn't change deduplication.
> Every dependency gets tested with its selected dependency set before it's 'done'. This can be made a little bit faster by detecting instances where we don't actually change the deps for a package.
:+1: We'll want some tooling that auto-tests the entire tree.
> Ensure deps locked into GitHub for intermediate packages don't get horribly out of date. Not sure of the right way forward here that doesn't break gx semantics.
More bots and CI?
> Recall, we rewrite before building, after adding to IPFS. Whether or not we rewrite before building shouldn't change deduplication.
Right, but rewriting at all breaks the CoW deduplication you suggested earlier. If we don't rewrite, then we don't use up any extra space.
> Right, but rewriting at all breaks the CoW deduplication you suggested earlier. If we don't rewrite, then we don't use up any extra space.
> Ah, do you mean any extra disk space in `$GOPATH/src/gx/...`?

Yes.
Unfortunately, it also means that we can't have duplicates which may be desirable in certain cases. However, step 2 doesn't really allow that either (unless we do something a bit more complicated and allow different versions for transitive dependencies).
Yeah, I've always found multiple versions of the same package to be questionably useful. Nowadays I lean more towards disallowing that completely.
So, my concern is about packages like, e.g., some random crypto, math, hash, etc. library. That is, some entirely internal library that doesn't export any types. I wonder if there's any way to get this information from go. There should be.
I've discussed this with @whyrusleeping and we came up with a third option that we prefer: defer to the language's package manager whenever possible.
In this variant, gx would maintain a `gx-lock.json` file, mapping packages (name, path, etc.) to hashes. To build on go, we would:
To update dependencies, we'd entirely defer to the language's package manager for dependency resolution. Once the package manager has figured out the right deps for us to use, we'd update any hashes as necessary. In practice, the user will run a `gx sync` command (or something like that). This command updates gx hashes as necessary, asking the user about each update (we could even provide diffs on demand). This gives us all the features we like (certified dependencies, package hashes in builds, source distribution over IPFS, etc.) without having to fight with package managers.
Note: This doesn't preclude us from introducing custom package managers for some languages where we feel that the existing ones don't cut it. However, it means that we can integrate better when they do.
Basically, we'd end up having two commands:
Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system.
This line in https://github.com/whyrusleeping/gx/issues/200#issuecomment-412951672 made everything click for me about this issue. If you don't mind, @Stebalien, I'm adding it to the abstract of this issue; it's a great one-liner for what we're talking about (and trust me, it's really valuable, as these are not simple subjects to grasp on a first read).
Go ahead.
Current status: #206 adds basic lockfile support; https://github.com/whyrusleeping/gx-go/pull/49 adds a command to create a lockfile.
Currently, we have one command: `gx lock-install` (may be renamed to `gx install` or something like that in the future). This command simply installs the dependencies into `vendor` according to the lockfile. That will be the default "dev" setup.

There's also a `gx-go gen-lock` command that creates the lockfile from a gx package tree (we may want to rename this command).
Next steps:
Great work! I'll give these commands a try. Let me know if I can be of help.
Just FYI: for the time being I wrote a tool to help keep track of state during a complex gx update. I am not sure how useful it will be once this is merged, but for what it's worth you can find it at: https://github.com/kevina/gx-update-helper.
I will take a closer look at this later to see how well it will work with what I had to do to get the go-cid change (https://github.com/ipfs/go-cid/pull/71) in.
@Stebalien is `gx sync` meant to replace `gx-workspace`? Offhand I am not seeing how this will solve the problem of a complex gx update that involves an API change and may also cause test cases to break. During the Cid API change I had to iterate several times to get it right (gx publish, gx update, republish, reupdate, etc.). (Note that I wanted to publish so that the code would compile on the build servers, to double-check everything is okay.) I also discovered some bugs that I needed to fix and ended up compiling and testing using a partly rewritten tree.
There are two key proposals here:

1. Use a `gx-lock.json` file. This means we can, e.g., update go-cid in go-ipfs without modifying every package that happens to depend on something that uses go-cid.
2. Use `go get` and vgo to do the actual dependency resolution. `gx sync` would regenerate the lockfile by looking at `go.mod` files.

For this change, we would:

1. Update the `go.mod` files of the dependent packages as needed (fixing everything as we go along).
2. After updating the `go.mod` files, run `gx sync` to update the `package.json` files.

Basically, this means we can be a bit more incremental instead of having to do everything all at once.
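As a sketch of what `gx sync` would have to read, the following naively pulls the `require` block out of a `go.mod` file. A real implementation would use `golang.org/x/mod/modfile`, and the module path and version below are illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requires naively extracts module paths and versions from a go.mod
// `require (...)` block. It ignores single-line require directives and
// comments; it only shows what a sync step would read to refresh hashes.
func requires(gomod string) map[string]string {
	deps := map[string]string{}
	in := false
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "require (":
			in = true
		case line == ")":
			in = false
		case in:
			if f := strings.Fields(line); len(f) >= 2 {
				deps[f[0]] = f[1]
			}
		}
	}
	return deps
}

func main() {
	mod := `module github.com/example/app

require (
	github.com/ipfs/go-cid v0.0.3
)
`
	fmt.Println(requires(mod)["github.com/ipfs/go-cid"])
}
```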
@Stebalien is this also going to solve the problem of being able to build without rewriting?
Yes. We'll actually have two build modes:

1. With `gx install`, we'll "install" the gxed dependencies into a vendor directory. In practice, we'll likely put the actual packages in a cache somewhere (`~/.cache/gx/...`) and then symlink them into place (i.e., `~/.cache/gx/ipfs/Qm... -> ./vendor/github.com/a/b`). You should then be able to build with a normal `go build`. This also makes testing out modifications to dependent packages really simple: we can just replace this symlink with one that points to the code you're working on.
2. `gx build --release` (or something like that). That will copy everything (including the dependencies) into a temporary directory, rewrite everything, and build. That way, we can still see package hashes in stack traces.
I'd like to propose a way forward to make gx nicer to use. Basically, for me at least, the biggest time-sink is the lack of a dependency resolution system. Step 1 aims at working around the limitations in gx-workspace and step 2 aims to integrate better with existing package managers.
Motivation
Ignore the below proposal and take a look at https://github.com/whyrusleeping/gx/issues/179#issuecomment-408243162
~~Proposal~~
Step 1
This step aims to solve issues 1-4.
In this step, we'd add a feature to `gx` that's basically like gx-workspace but works on published packages instead of repos. That is, you'd run `gx update --everywhere SomeHash`, and this tool would then update the hash everywhere in the dependency tree. (@whyrusleeping has proposed this in the past.)
One big downside is that we'd get a new package hash for every modified dependency. IPFS will deduplicate the files but this could still cause problems (users need to be very careful to pin everything).
The other downside is that we have no way to check these updates (other than to run the tests, which we should probably do). That is, packages can't specify semver constraints (yes, these don't guarantee anything but they can help).
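To make the first downside concrete, here is a toy model of an everywhere-update: every package on a path from the root to the changed dependency must be republished under a new hash. The `Package` type, the fake hashing, and all names are invented for illustration; real hashes would come from adding the package to IPFS:

```go
package main

import "fmt"

// Package is a toy model of a published gx package; dependencies are
// referenced by hash, and republishing with new deps yields a new hash.
type Package struct {
	Name, Version string
	Deps          []string // dependency hashes
}

var tree = map[string]*Package{} // hash -> package

// publish derives a fake content hash from the package contents.
func publish(p *Package) string {
	h := "Qm-" + p.Name + "-" + p.Version
	for _, d := range p.Deps {
		h += "+" + d
	}
	tree[h] = p
	return h
}

// updateEverywhere swaps oldHash for newHash throughout the tree rooted
// at root, republishing every package on a path to the change. Each
// touched package gets a brand-new hash: the downside discussed above.
func updateEverywhere(root, oldHash, newHash string) string {
	if root == oldHash {
		return newHash
	}
	p := tree[root]
	changed := false
	deps := make([]string, len(p.Deps))
	for i, d := range p.Deps {
		deps[i] = updateEverywhere(d, oldHash, newHash)
		changed = changed || deps[i] != d
	}
	if !changed {
		return root // untouched subtrees keep their hashes
	}
	return publish(&Package{Name: p.Name, Version: p.Version, Deps: deps})
}

func main() {
	a := publish(&Package{Name: "go-cid", Version: "1"})
	b := publish(&Package{Name: "go-ipld", Version: "1", Deps: []string{a}})
	root := publish(&Package{Name: "go-ipfs", Version: "1", Deps: []string{b}})

	a2 := publish(&Package{Name: "go-cid", Version: "2"})
	newRoot := updateEverywhere(root, a, a2)

	// The whole chain was republished under new hashes.
	fmt.Println(newRoot != root, tree[newRoot].Deps[0] != b)
}
```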
Step 2
This step aims to solve issue 5 and the two issues introduced in step 1.
In this step, we'd switch to a package.json/package-lock.json setup following NPM's file formats as closely as possible. Ideally, we'd be able to make gx work with NPM packages without much trouble.
That is, each package would get a `package.json` complete with semver, repos, etc.

This gives us the ability to share `gx` with JavaScript for our builds while allowing everyone else to continue using NPM without any additional work (they just wouldn't get the same guarantees).

However, this also has a drawback: we wouldn't be able to have a global `$GOPATH/src/gx` tree anymore because we'd need to rewrite each package's dependencies differently. This wasn't a problem in step 1 because we created entirely new packages when updating transitive dependencies.

On the other hand, I'm not sure if this drawback is that bad.
`gx-go rw` is now fast enough that we can rewrite dependencies on the fly (on build) as long as we have a central place to cache unrewritten gx packages. That is:

1. Copy the package, using hardlinks (`cp -rl`), into a temporary GOPATH. This is a really fast operation.
2. Run `gx-go rw` (this doesn't modify files in-place so the hardlinking won't be a problem).

We can even do this on Windows (Windows has hardlink support).
Note: we wouldn't have to rewrite all of our packages all at once, we'd just have to make sure to do so from the top down.