rust-lang / wg-cargo-std-aware

Repo for working on "std aware cargo"
133 stars 8 forks source link

Support `cargo vendor` #23

Open ehuss opened 4 years ago

ehuss commented 4 years ago

The cargo vendor command currently does not know about std dependencies. Will need to figure out how to handle the root dependencies, since they are special.

Source replacement may also be broken, I haven't tried it.

Gankra commented 3 years ago

Firefox is running into this for tsan CI

I've currently hacked up a script that can weave in the std tomls to a project's vendor, but yeah it'd be nice if this was handled automatically: https://gist.github.com/Gankra/6a025b5cce9d9b047e46a5caeded3050

alexcrichton commented 3 years ago

One thing to consider for this is that vendoring the standard library's dependencies effectively locks you to a compiler version since the dependencies are likely to drift over time. This may or may not be something that Cargo wants to provide a first-class warning/error about rather than a bland "couldn't find crate version in registry" error.

In theory this integration wouldn't be too too hard though, mostly just loading up the Resolve for the standard library, around here-ish

ehuss commented 3 years ago

Copying my comment from Zulip: We have recently discussed adding all the dependencies to the rust-src component for other reasons (matklad was interested in rust-analyzer support). That would be a potential solution of just not changing cargo vendor, and keeping the standard library in sync with the compiler.

I'm uncertain how we can wire that in to the resolver, though. It may be tricky to use source replacement or paths overrides (they cannot currently be customized independently). An option would be to add patches, but it doesn't know what to patch until after it loads the resolver. Another option would be to surgically modify the Dependency information to be path dependencies when the standard library workspace is loaded. None of those sound terribly appealing to me, though.

alexcrichton commented 3 years ago

Oh actually I think that's a great idea to inclue the dependencies in rust-src! I think that adding patch entries for all members in the vendor directory of rust-src should work perhaps? I don't think there should be much downside to just iterating through the vendor directory and adding a patch-per-crate. We would want to make sure that any "patch not used" warnings were suppressed as well.

Gankra commented 3 years ago

One thing to consider for this is that vendoring the standard library's dependencies effectively locks you to a compiler version since the dependencies are likely to drift over time. This may or may not be something that Cargo wants to provide a first-class warning/error about rather than a bland "couldn't find crate version in registry" error.

This isn't an issue for firefox's usecase in particular. We only need these vendors for our tsan CI, and that uses pinned versions of all the toolchains. Although it is a problem if we want these vendoring checked in and we want people to be able to bump non-std dependencies without having the nightly we use for CI. So it's possible we want to vendor live in CI as part of the build. Or we need to carefully identify deps which are only used by std so we can preserve them across non-std-vendors.

Certainly if the dependencies were just part of rust-src and automagically weaved into the build, that would solve the issue. No need to worry about weaving together different "kinds" of vendor or anything else.

alexcrichton commented 3 years ago

Yeah pinning versions is too wonky and restrictive, so I think including the dependencies in rust-src is a great solution.

Gankra commented 3 years ago

Anything I can do to help the plan along?

ehuss commented 3 years ago

I think the first step is getting the dependencies included in the rust-src component (the relevant code is here). Unfortunately, cargo vendor doesn't have a way to only vendor a subset of packages. I'm not sure how to address that. Some ideas:

None of these options seem easy.

Gankra commented 3 years ago

I believe the approach I'm using to test out tsan builds right now would work for building "subset vendors": https://gist.github.com/Gankra/6a025b5cce9d9b047e46a5caeded3050

Essentially just temporarily copying the root lockfile to each individual library and then cargo vendor -s'ing them in. Vendor doesn't pull things in just because they're in the lockfile, but it will respect the constraints it applies to the dependency graph.

This would be a fairly maintainable hack as you would only need to directly -s in std and test (and maybe panic_abort for good measure?). Everything else is transitively pulled in by those ones. (hmm, maybe only test is actually needed since it explicitly depends on std and both the panic crates?)

What do you think of shipping that?

alexcrichton commented 3 years ago

I think that's probably bordering on the 4th bullet of @ehuss's comment above, but I think something along those lines may be workable. As we do for build-std in general we're just trying to load one "root" package and vendor that, so the filtering in cargo-vendor probably isn't the hardest thing in the world. Once you're inside cargo-vendor you've got access to lots of fun internals of Cargo as well which the CLI may not expose.

Gankra commented 3 years ago

Would -Zbuild-std also need to be reworked, or would the existence of a vendoring .cargo/config at the root of the rust-src directory automagically do the right thing?

alexcrichton commented 3 years ago

We would need to update Cargo's implementation of -Zbuild-std as well to automatically inject [patch] entries for crates found in rust-src

Gankra commented 3 years ago

Ok I'll be working on this for the next week or two, hopefully won't be too messy.

Gankra commented 3 years ago

Ok, all the upstream work is done, this should Just Work as soon as a new nightly is cut with the latest cargo changes! (Does that need to be done explicitly, or does that just happen?)

Gankra commented 3 years ago

Oh nice, ehuss is way ahead of me: https://github.com/rust-lang/rust/pull/78972

ehuss commented 3 years ago

This should be fixed now, thanks @Gankra!

Gankra commented 3 years ago

Confirmed, my test vendor project now builds, time to ship in firefox CI!

On Sat, Nov 14, 2020 at 10:50 AM Eric Huss notifications@github.com wrote:

Closed #23 https://github.com/rust-lang/wg-cargo-std-aware/issues/23.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/wg-cargo-std-aware/issues/23#event-3996533311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIVRYFR62VWLVS25VY6B6TSP2RMTANCNFSM4IUM3VWA .

Gankra commented 3 years ago

(And now running without any issue in our CI! 🎊)

ehuss commented 3 years ago

Reopening due to https://github.com/rust-lang/cargo/pull/8968

ehuss commented 3 years ago

Some notes about the problems that cropped up:

Initial implementation: https://github.com/rust-lang/cargo/pull/8834 and https://github.com/rust-lang/rust/pull/78790 Reverted in https://github.com/rust-lang/cargo/pull/8968 and https://github.com/rust-lang/rust/pull/79571 and https://github.com/rust-lang/rust/pull/80082

pca006132 commented 3 years ago

What is the status of this issue? We are planning to move some embedded rust project from cargo-xbuild to -Z build-std. We currently use Nix to build the crates, so we need a way to let cargo vendor find out std dependencies, or we cannot build the project with Nix as it does not allow internet access during the build.

Gankra commented 3 years ago

If you really need support now, then you can pin to nightly-2020-11-14 which had my WIP support built in. I intend to come back and do this properly in the next month or two, but there's a lot of hard problems to solve.

pca006132 commented 3 years ago

If you really need support now, then you can pin to nightly-2020-11-14 which had my WIP support built in. I intend to come back and do this properly in the next month or two, but there's a lot of hard problems to solve.

OK, good to know that this would be supported soon. I think I would just keep using cargo-xbuild at the moment, as I need some changes in compiler-builtins that just shipped with recent nightly.

Ericson2314 commented 3 years ago

@pca006132 BTW I got crate2nix doing cross compilation and hope it might work with -Z build-std next, which perhaps could avoid needing cargo vendor to work at all?

Ericson2314 commented 3 years ago

@ehuss in https://github.com/rust-lang/cargo/issues/8962#issuecomment-742164015 you write:

I can't think of a simple way to fix that. Vendoring should probably use source replacement (not patches), but source replacement is a global setting that can't be easily enabled just for resolving std.

What exactly do you mean? It sounds exactly right to me to separate the standard libraries crates, which are patched in, or a separate source, from their crates.io dependencies, which are merely mirrored with "local registry" or similar.

ehuss commented 3 years ago

Using patches to emulate vendoring is a bit of a hack, as evidenced by the problems it caused. I think it would be better if vendoring std used the normal vendoring mechanism, but I'm not sure how feasible that is (it seems hard). It may be possible to get patches to a state where they work, but I think it will be difficult, and potentially fragile.

Ericson2314 commented 3 years ago

Thanks. I agree vendoring for vendoring sounds better. So what hard about the normal vendoring, and might https://github.com/rust-lang/wg-cargo-std-aware/issues/64 help?

ehuss commented 3 years ago

Vendoring is a global setting, it can't be turned on just for the std workspace. I don't really follow #64, but I don't think it is feasible.

Ericson2314 commented 3 years ago

@ehuss is the a problem for turning it on for both workspaces? Isn't vendoring transparent so it need not affect the lockfile?

ehuss commented 3 years ago

Sorry, I'm not sure I understand the question. The issue with using source replacement is that it is a global config setting in Cargo which would affect both workspaces. One possible route is to change this line to create some kind of clone of the config, and modify the config to enable vendoring for just the std workspace. However, Config isn't really designed to be cloned, and that might be tricky to do efficiently. I'm also uncertain if that might trigger problems elsewhere (for example this line where the PackagetSets from the two workspaces are merged).

Ericson2314 commented 3 years ago

@ehuss Sorry, let me try to be clearer: what's the problem for turning on the vendoring or both workspaces? If cfg_if or hash_brown is needed twice, it just works right? And if something isn't in the vendor it falls back on crates.io, right? There's probably some detail of source replacement I'm missing, but based on what I know (it's transparent to resolution/lockfiles) turning it on globally is harmless.

ehuss commented 3 years ago

If both workspaces are vendored into the same directory, that would tie it to a specific version of rustc. Switching versions of rustc would need different dependencies that don't appear in the vendor directory, which would lead to an error.

Ericson2314 commented 3 years ago

@ehuss isn't the decision here about when/where to use the vendor, not when to create it? When doing -Z build-std it uses whatever replacement source the crate provides, globally but conditionally. The fact that this vendor isn't always the same I thought was OK, because source replacement is transparent, so the project workspace won't need to care. The std workspace of course will care what is and isn't there since it has no crates.io to fall back on, but it varies with the vendor so I think it would be OK.

ehuss commented 3 years ago

Sorry, I'm having a hard time understanding what you are saying.

isn't the decision here about when/where to use the vendor, not when to create it?

The intent here is that the std dependencies are always vendored (located in $SYROOT/.../lib/rustlib/src/rust/vendor/...). The issue is that it is not easy to have Cargo conditionally use vendoring for one workspace versus not in another (as described above in https://github.com/rust-lang/wg-cargo-std-aware/issues/23#issuecomment-765474842), because it is a global setting (global in the sense of within the Cargo process, since Config is essentially a global).

When doing -Z build-std it uses whatever replacement source the crate provides, globally but conditionally.

Crates don't provide source replacement. Source replacement is driven by the configuration files, which can be local or global relative to the workspace. If you mean conditionally within Cargo, that's what I explained above that the global Config is not designed to be conditional or easily cloned.

The fact that this vendor isn't always the same I thought was OK, because source replacement is transparent

I don't know what is meant by "transparent" here. It is transparent in the sense that vendoring shouldn't behave any differently from using the registry sources. It is not transparent in the sense that all dependencies must be vendored, otherwise it is an error (it does not fall back to anything).

Ericson2314 commented 3 years ago

It is not transparent in the sense that all dependencies must be vendored, otherwise it is an error (it does not fall back to anything).

Ah OK so this is what is tripping us up. I saw https://doc.rust-lang.org/cargo/reference/source-replacement.html#local-registry-sources which says it's OK to have this subset mirror.

Now, supposing that either the book is correct, or the book can be made correct:

It is transparent in the sense that vendoring shouldn't behave any differently from using the registry sources.

Whew :)

Crates don't provide source replacement. Source replacement is driven by the configuration files, which can be local or global relative to the workspace.

I think thats fine. By conditionally, I mean -Z build-std and information from the current compiler / rust-src component would be mixed in at configuration loading time, just like information from multiple config files is already. Everything together would create the single global Config, as today.

Independently from that, -Z build-std would also influence the std workspace as today. So the std workspace would go looking some compiler-specific crate source code, and independently by the config-time mechnaism, the otherwise-unonbtainable source containing those crate source code would be replaced in.

ehuss commented 3 years ago

Ah OK so this is what is tripping us up. I saw https://doc.rust-lang.org/cargo/reference/source-replacement.html#local-registry-sources which says it's OK to have this subset mirror.

By "subset" it is saying that you don't have to mirror all of crates.io locally. It doesn't mean that it will allow something to be missing from the subset. Source replacement works as a "full replacement", not as an overlay.

Also, a local-registry is not what we are talking about here. The vendoring mechanism we are talking about is a directory-source (ala cargo vendor).

Ericson2314 commented 3 years ago

@ehuss OK thanks for the clarifications. Two more questions:

  1. @Gankra's PR both changed the rust-src component format along with trying not make cargo vendor, work right?
  2. The regression just when using cargo vendor, or even without using it? I thought it was the latter, and due to the way the new rust-src was patched in.

My previous comments frankly forgot about the original issue --- cargo vendor --- and were all about what went wrong in the regression case, which as mentioned I thought happened regardless of whether cargo vendor was actually attempted.

ehuss commented 3 years ago

The original changes made Cargo treat the standard library dependencies as always vendored (with -Z build-std), whether or not there is a local source replacement. So it is independent if you run cargo vendor locally.

There were multiple regressions as detailed in https://github.com/rust-lang/wg-cargo-std-aware/issues/23#issuecomment-748253097. These happened with all -Z build-std builds, and also affected distribution builds of rustc itself.

Gankra commented 3 years ago

Hmm ok so that was all a bunch of confusion and didn't actually help us figure out the "right" solution, eh?

@ehuss this stuff is really out of my league -- have you had a chance to mull over the possible solutions?

As far as the partial vendoring thing goes, I'm starting to lean towards just giving up on partial vendoring and just grabbing the whole rustc graph. AIUI that should be fairly unproblematic and also not need to pull in the really heavy stuff like llvm?

Making patching in the vendor totally seamless is the nasty bit I really can't wrap my head around. Is it potentially as simple as combining the WIP patches you and Alex posted?

If we don't have any clear solutions, firefox does have the nuclear option of maintaining patches for cargo for our builds, since my solution did work fine for us. But obviously that's a gross mess we'd rather not maintain.

cuviper commented 3 years ago

As far as the partial vendoring thing goes, I'm starting to lean towards just giving up on partial vendoring and just grabbing the whole rustc graph. AIUI that should be fairly unproblematic and also not need to pull in the really heavy stuff like llvm?

There's still some heavy stuff in there -- you can see that in the full rustc-1.50.0-src.tar.xz in vendor/:

$ dua | tail
   7.69 MB jemalloc-sys
   7.88 MB winapi
   9.75 MB vte
  11.63 MB libgit2-sys
  12.16 MB libnghttp2-sys
  25.01 MB curl-sys
  27.65 MB openssl-src
  53.83 MB winapi-i686-pc-windows-gnu
  56.12 MB winapi-x86_64-pc-windows-gnu
 382.91 MB total
raphaelcohn commented 3 years ago

This is clearly hard to fix - and after my error report in #65 I discovered how hard. @ehuss is completely right about the collision of dependencies in a workspace when vendored - my workspace was using hashbrown 0.9.1, yet the std dependencies needed 0.9.0 and bang! conflict. Which surprised me, until I recalled the point of a workspace is to provide consistent versions.

That means in practice, either I have to track the exact same matching package versions as std (not a particularly easy thing to simply know without a bit of digging) or give up building std. Or provide a sensible mechanism for 'overriding' package version choices in std. Also not nice. Or providing the ideal of cascading workspaces, which works, but is complex. Since std does not expose its private dependencies (eg hashbrown), this is viable and defensible - but tough.

Along with several common packages not playing nicely with cross compilation (libc and cc, I'm looking at you) without wrapper scripts setting up env vars et al, this is just a bit too hard. Giving up for now, personally. All I wanted to do was find a better way to build linux x86_64 musl targets with a few more target features (eg avx512 stuff)...

Ericson2314 commented 3 years ago

@Gankra

Hmm ok so that was all a bunch of confusion and didn't actually help us figure out the "right" solution, eh?

Yes, haha.

But, still believe https://github.com/rust-lang/wg-cargo-std-aware/issues/64 is a solution here

Which surprised me, until I recalled the point of a workspace is to provide consistent versions.

Heh, well !64 would at least make us have to face down this problem regardless of vendoring. Without knowing all the details, this seems wrong. Per https://github.com/rust-lang/rfcs/blob/master/text/1977-public-private-dependencies.md std should have private dependencies and that is the basis by which this is allowed. Any vendoring, what's in/not-in the workspace, etc. should be completely transparent and independent of the "platonic" build plan.

baloo commented 2 years ago

For nix users where cargo-vendor is the only option available, I found a "workaround" in downloading rust's compiler dependencies manually and merging them. The result can be found here: https://github.com/baloo/nixos-firmware/commit/e108c501fecdd0b2bde367e8e26347574105b454

kgrech commented 1 year ago

The easiest workaround is probably something like this:

RUST_DIR="$(rustc --print=sysroot)"

# Copy the Cargo.lock for Rust to place the `cargo vendor` will see
cp "$RUST_DIR"/lib/rustlib/src/rust/Cargo.lock "$RUST_DIR"/lib/rustlib/src/rust/library/test/

cargo vendor \
    --manifest-path Cargo.toml \
    --sync "$RUST_DIR"/lib/rustlib/src/rust/library/test/Cargo.toml \
    cargo_vendor

# Remove copied file
rm "$RUST_DIR"/lib/rustlib/src/rust/library/test/Cargo.lock
Dev380 commented 10 months ago

@kgrech sorry if this is a dumb question, but how would I make cargo use the vendored stdlib when building? cargo vendor says

[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"

but obviously it's not from crates.io and I don't know the registry to tell it to replace

kgrech commented 10 months ago

@Dev380 as far as I remember, the std library sources need to be installed using rustup. I think they never fetched from crates.io. it is only for dependencies of std lib, but not stdlib itself.

Not sure I fully understand you question, but you just put what cargo vendor prints to your configuration. vendor is just a folder created by cargo vendor