rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.75k stars 2.42k forks source link

Reconsider RUSTFLAGS artifact caching. #8716

Open ehuss opened 4 years ago

ehuss commented 4 years ago

When RUSTFLAGS is changed between invocations, this can trigger a complete rebuild. This can be annoying, and the only workaround is to use a separate target directory for each RUSTFLAGS setting. I think we should revisit how this works, and see if there is a way to address the problems.

Motivation:

History:

Solutions? I am uncertain what a good approach would be. Some rough ideas:

jyn514 commented 4 years ago

Provide first-class support for some of the things that people use RUSTFLAGS for today.

My main pain point with RUSTFLAGS is https://github.com/xd009642/tarpaulin/issues/416#issuecomment-623150687, which has some niche flags and wouldn't be helped much by this. AFAIK all of those flags are codegen/linking related though, and don't affect .rlibs.

Add a mechanism for specifying RUSTFLAGS so that the user can decide whether or not it gets hashed.

Maybe there can be a way for rustc itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS". I agree this shouldn't be left up to the user, my main complaint about make other build systems is they require you to specify build dependencies yourself.

I don't know enough about -C metadata and -C extra-filename to have an opinion.

alexcrichton commented 4 years ago

However, this could still cause problems with reproducible builds

I believe this does because the filenames of object files show up as the codegen unit names in dwarf debug info (and probably PDB?). I suspect that everything else in the binary will be the same, but builds differing only in extra-filename are likely to still produce different binaries.

Add a mechanism for specifying RUSTFLAGS so that the user can decide whether or not it gets hashed

I think this will work but I don't think it'll be too feasible if the default is to not hash RUSTFLAGS. The pain of rebuilding too much you only discover after-the-fact that you should have been hashing rustflags all along. I don't think we can hash-by-default though since it breaks the use cases you've mentioned.

Provide first-class support for some of the things that people use RUSTFLAGS for today

I do think a --reproducible flag or --pgo is something we should add to Cargo no matter what, regardless of how this RUSTFLAGS issue is settled.


Another possible solution could be adding a new variable like RUSTFLAGS_HASHED? That way if tooling sets it the weird env name is fine (maybe the case for tarpaulin?)

jyn514 commented 1 year ago

Maybe there can be a way for rustc itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS".

Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags. I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.

stephanemagnenat commented 1 year ago

A note from a user: this issue also appears when one wants to build multiple target flavors of WASM, for example to have two versions, one basic and one with -C target-feature=+simd128 to be imported on runtime depending on the browser support.

Tokarak commented 1 year ago

Maybe there can be a way for rustc itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS".

Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags. I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.

I agree that something like this should be done. As https://github.com/rust-lang/rust/issues/110654 points out, current behaviour can be annoying.

First of all, one of the main focuses on the PR — changes in rustflags should not invalidate the cache. Bullet point 1 seems like the only way to solve it.

Go back to hashing RUSTFLAGS, but only in the -C extra-filename, and not -C metadata. This should (probably) avoid the PGO problems, since the symbols won't change. However, this could still cause problems with reproducible builds if the rlib filename somehow leaks into the resulting binary (I don't have a reason to think it does, but it seems risky).

As pointed out:

However, this could still cause problems with reproducible builds

I believe this does because the filenames of object files show up as the codegen unit names in dwarf debug info (and probably PDB?). I suspect that everything else in the binary will be the same, but builds differing only in extra-filename are likely to still produce different binaries.

There seems to be no road forward without refactoring something or adding something new - but what? For over 99% of users (it seems like just those who need reproducible builds AND care about the filenames of the units) the solution is one small tweak away. Can we expect the user to know if they care about filenames? Maybe this flag can propagate from the dependencies to the leaves?

What if units were simlinked (or hardlinked) to something not including the extra-filename? The links can be overwriten at neglible cost, while debuggers get a constant filename to work with. Maybe I'm missing the point of extra-filename.

Secondly, not all RUSTFLAGS values need to be fingerprinted.

Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags.

The great thing about this is that building a hashing blacklist is incremental and non-breaking. Even if some values are forgotten, it would still be better than now. It does not solve the problem for when something rebuilds, but it does make rebuilds happen less often. I see no downsides.

epage commented 1 year ago

I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.

CLI parsing is complex and having cargo parse another program's CLI means cargo needs to duplicate the knowledge of that other programs CLI. Trying to parse a subset of known flags has its own caveats that for when it falls down flat. Compared to rustup, we at least have the advantage that these CLIs ship together.

epage commented 1 year ago

Another angle to this is that for #5931, I'd like us to avoid dealing with UX problems of the cache being poisoned, so I'd favor accidentally including too much in the fingerprint than not enough.

Tokarak commented 1 year ago

I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.

CLI parsing is complex and having cargo parse another program's CLI means cargo needs to duplicate the knowledge of that other programs CLI. Trying to parse a subset of known flags has its own caveats that for when it falls down flat. Compared to rustup, we at least have the advantage that these CLIs ship together.

Does this mean that currently changing the order of rustflags, or changing -C to --codegen, etc., will change the fingerprint?

epage commented 1 year ago

Does this mean that currently changing the order of rustflags, or changing -C to --codegen, etc., will change the fingerprint?

From my reading of the code, yes, superficial changes like that will cause rebuilds.

One of the potential solutions in the original issue was to provide first-class support for some of the features. #12115 is an example of doing that and came from the pain I had from trying to use RUSTFLAGS for lint levels. Granted, that protects against order changes and -A vs --allow changes but we still fingerprint lints. For some, they would rather not. For others, they would expect that adding a new lint would fail the build if tit should trigger so we need to force a rebuild. One nice aspect is that it isn't recursive so for #5931 it won't cause extra dependencies to rebuild if you change your lint levels.

RobJellinghaus commented 1 year ago

Given that the current semantics of RUSTFLAGS is "this affects every crate you ever build anywhere" down to the leafmost level, it's hard to see how to change that without breaking everyone.

But my question is, isn't the current behavior the biggest possible hammer? Could a solution involve providing a limited-scope version of RUSTFLAGS? For example, if you wanted your RUSTFLAGS setting to affect only your mybinary crate, perhaps something like RUSTFLAGS_CRATE_MYBINARY to set flags for that crate only? All other crates would ignore that environment variable.

I don't know whether this scales to many of the RUSTFLAGS scenarios, and I'm handwaving about things like multiple registries, but certainly for anything affecting rootmost binary linking, it seems viable?

Edit: Thinking about it more, this is architecturally orthogonal to the issue of not wanting some particular flags to affect the symbol hashes of the crate, because that's an issue for the root binary. So this idea maybe helps with reducing the "blast zone" of RUSTFLAGS, but not the hash stability issue.

RobJellinghaus commented 1 year ago

Another variant of the above idea would be RUSTFLAGS_BINARY which would be RUSTFLAGS that apply to binary crates only. This would cover cases like PGO where you're just altering the final link.

We could potentially even say that ordinary RUSTFLAGS are included in the metadata/hash again, but that RUSTFLAGS_BINARY are not, and are treated like PGO flags today. This would let us get back to side-by-side cacheability for the relatively few cases where you really need RUSTFLAGS that apply to every crate you're building, while preserving the current property that PGO flags don't affect the metadata/hash.

I know it's another knob and we hate those, but really we need something less pervasive than RUSTFLAGS don't we?

weihanglo commented 1 year ago

Another variant of the above idea would be RUSTFLAGS_BINARY which would be RUSTFLAGS that apply to binary crates only. This would cover cases like PGO where you're just altering the final link.

cargo rustc can already achieve that, no?

Edit: perhaps this as well https://github.com/rust-lang/rfcs/pull/3310? But if you only concern binaries not final artifacts, my point is not relevant.

RobJellinghaus commented 1 year ago

You're absolutely right, that RFC is very much related to this idea. I'm not sure which one would work better for more scenarios, but I'll consider it.

cargo rustc can already achieve that, no?

It seems that it can, but I didn't know about it and it seems little used in the community, or else why would the "PGO options in RUSTFLAGS" be such a problem? Is there some reason why cargo rustc hasn't already been used for PGO options more commonly?

Your point's well taken though, and I wonder whether cargo rustc is actually a better substitute for the "RUSTFLAGS for the root crate" RFC.