Open ehuss opened 4 years ago
Provide first-class support for some of the things that people use RUSTFLAGS for today.
My main pain point with RUSTFLAGS is https://github.com/xd009642/tarpaulin/issues/416#issuecomment-623150687, which has some niche flags and wouldn't be helped much by this. AFAIK all of those flags are codegen/linking related though, and don't affect .rlib
s.
Add a mechanism for specifying RUSTFLAGS so that the user can decide whether or not it gets hashed.
Maybe there can be a way for rustc
itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS"
. I agree this shouldn't be left up to the user, my main complaint about make other build systems is they require you to specify build dependencies yourself.
I don't know enough about -C metadata
and -C extra-filename
to have an opinion.
However, this could still cause problems with reproducible builds
I believe this does because the filenames of object files show up as the codegen unit names in dwarf debug info (and probably PDB?). I suspect that everything else in the binary will be the same, but builds differing only in extra-filename
are likely to still produce different binaries.
Add a mechanism for specifying RUSTFLAGS so that the user can decide whether or not it gets hashed
I think this will work but I don't think it'll be too feasible if the default is to not hash RUSTFLAGS. The pain of rebuilding too much you only discover after-the-fact that you should have been hashing rustflags all along. I don't think we can hash-by-default though since it breaks the use cases you've mentioned.
Provide first-class support for some of the things that people use RUSTFLAGS for today
I do think a --reproducible
flag or --pgo
is something we should add to Cargo no matter what, regardless of how this RUSTFLAGS
issue is settled.
Another possible solution could be adding a new variable like RUSTFLAGS_HASHED
? That way if tooling sets it the weird env name is fine (maybe the case for tarpaulin?)
Maybe there can be a way for rustc itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS".
Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags. I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.
A note from a user: this issue also appears when one wants to build multiple target flavors of WASM, for example to have two versions, one basic and one with -C target-feature=+simd128
to be imported on runtime depending on the browser support.
Maybe there can be a way for rustc itself to tell cargo whether an option requires a re-compile? Something like rustc -Z needs-recompile "$RUSTFLAGS".
Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags. I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.
I agree that something like this should be done. As https://github.com/rust-lang/rust/issues/110654 points out, current behaviour can be annoying.
First of all, one of the main focuses on the PR — changes in rustflags should not invalidate the cache. Bullet point 1 seems like the only way to solve it.
Go back to hashing RUSTFLAGS, but only in the -C extra-filename, and not -C metadata. This should (probably) avoid the PGO problems, since the symbols won't change. However, this could still cause problems with reproducible builds if the rlib filename somehow leaks into the resulting binary (I don't have a reason to think it does, but it seems risky).
As pointed out:
However, this could still cause problems with reproducible builds
I believe this does because the filenames of object files show up as the codegen unit names in dwarf debug info (and probably PDB?). I suspect that everything else in the binary will be the same, but builds differing only in
extra-filename
are likely to still produce different binaries.
There seems to be no road forward without refactoring something or adding something new - but what? For over 99% of users (it seems like just those who need reproducible builds AND care about the filenames of the units) the solution is one small tweak away. Can we expect the user to know if they care about filenames? Maybe this flag can propagate from the dependencies to the leaves?
What if units were simlinked (or hardlinked) to something not including the extra-filename? The links can be overwriten at neglible cost, while debuggers get a constant filename to work with. Maybe I'm missing the point of extra-filename
.
Secondly, not all RUSTFLAGS values need to be fingerprinted.
Another option is for cargo to inspect RUSTFLAGS and avoid rebuilding for flags it knows can only affect leaf crates or diagnostics, like -C link-args or -W lint flags.
The great thing about this is that building a hashing blacklist is incremental and non-breaking. Even if some values are forgotten, it would still be better than now. It does not solve the problem for when something rebuilds, but it does make rebuilds happen less often. I see no downsides.
I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.
CLI parsing is complex and having cargo parse another program's CLI means cargo needs to duplicate the knowledge of that other programs CLI. Trying to parse a subset of known flags has its own caveats that for when it falls down flat. Compared to rustup, we at least have the advantage that these CLIs ship together.
Another angle to this is that for #5931, I'd like us to avoid dealing with UX problems of the cache being poisoned, so I'd favor accidentally including too much in the fingerprint than not enough.
I know cargo has a policy of not inspecting RUSTFLAGS but I don't know the history or rationale for that decision.
CLI parsing is complex and having cargo parse another program's CLI means cargo needs to duplicate the knowledge of that other programs CLI. Trying to parse a subset of known flags has its own caveats that for when it falls down flat. Compared to rustup, we at least have the advantage that these CLIs ship together.
Does this mean that currently changing the order of rustflags, or changing -C
to --codegen
, etc., will change the fingerprint?
Does this mean that currently changing the order of rustflags, or changing -C to --codegen, etc., will change the fingerprint?
From my reading of the code, yes, superficial changes like that will cause rebuilds.
One of the potential solutions in the original issue was to provide first-class support for some of the features. #12115 is an example of doing that and came from the pain I had from trying to use RUSTFLAGS for lint levels. Granted, that protects against order changes and -A
vs --allow
changes but we still fingerprint lints. For some, they would rather not. For others, they would expect that adding a new lint would fail the build if tit should trigger so we need to force a rebuild. One nice aspect is that it isn't recursive so for #5931 it won't cause extra dependencies to rebuild if you change your lint levels.
Given that the current semantics of RUSTFLAGS
is "this affects every crate you ever build anywhere" down to the leafmost level, it's hard to see how to change that without breaking everyone.
But my question is, isn't the current behavior the biggest possible hammer? Could a solution involve providing a limited-scope version of RUSTFLAGS
? For example, if you wanted your RUSTFLAGS
setting to affect only your mybinary
crate, perhaps something like RUSTFLAGS_CRATE_MYBINARY
to set flags for that crate only? All other crates would ignore that environment variable.
I don't know whether this scales to many of the RUSTFLAGS
scenarios, and I'm handwaving about things like multiple registries, but certainly for anything affecting rootmost binary linking, it seems viable?
Edit: Thinking about it more, this is architecturally orthogonal to the issue of not wanting some particular flags to affect the symbol hashes of the crate, because that's an issue for the root binary. So this idea maybe helps with reducing the "blast zone" of RUSTFLAGS
, but not the hash stability issue.
Another variant of the above idea would be RUSTFLAGS_BINARY
which would be RUSTFLAGS
that apply to binary crates only. This would cover cases like PGO where you're just altering the final link.
We could potentially even say that ordinary RUSTFLAGS
are included in the metadata/hash again, but that RUSTFLAGS_BINARY
are not, and are treated like PGO flags today. This would let us get back to side-by-side cacheability for the relatively few cases where you really need RUSTFLAGS
that apply to every crate you're building, while preserving the current property that PGO flags don't affect the metadata/hash.
I know it's another knob and we hate those, but really we need something less pervasive than RUSTFLAGS
don't we?
Another variant of the above idea would be
RUSTFLAGS_BINARY
which would beRUSTFLAGS
that apply to binary crates only. This would cover cases like PGO where you're just altering the final link.
cargo rustc
can already achieve that, no?
Edit: perhaps this as well https://github.com/rust-lang/rfcs/pull/3310? But if you only concern binaries not final artifacts, my point is not relevant.
You're absolutely right, that RFC is very much related to this idea. I'm not sure which one would work better for more scenarios, but I'll consider it.
cargo rustc
can already achieve that, no?
It seems that it can, but I didn't know about it and it seems little used in the community, or else why would the "PGO options in RUSTFLAGS" be such a problem? Is there some reason why cargo rustc
hasn't already been used for PGO options more commonly?
Your point's well taken though, and I wonder whether cargo rustc
is actually a better substitute for the "RUSTFLAGS for the root crate" RFC.
When RUSTFLAGS is changed between invocations, this can trigger a complete rebuild. This can be annoying, and the only workaround is to use a separate target directory for each RUSTFLAGS setting. I think we should revisit how this works, and see if there is a way to address the problems.
Motivation:
History:
6503 added RUSTFLAGS to the metadata hash, allowing independent caching. 1.33 (2019-02-28)
6966 Removed
--remap-path-prefix
from the hash, because it broke reproducible builds. 1.37 (2019-08-15)7417 Reverted #6503 because it broke profile guided optimization. 1.39 (2019-11-07)
Solutions? I am uncertain what a good approach would be. Some rough ideas:
-C extra-filename
, and not-C metadata
. This should (probably) avoid the PGO problems, since the symbols won't change. However, this could still cause problems with reproducible builds if the rlib filename somehow leaks into the resulting binary (I don't have a reason to think it does, but it seems risky).