Closed kanavin closed 3 months ago
Made a slight tweak to the title, as this does not prevent build reproducibility in general, only reproducibility between host platforms.
As builds are still reproducible within a host platform and it has been this way for a while, this falls below other items in my priority list. If someone wants to drive this forward, it would be good to research why we use the verbose version in the hash.
Possibly relevant discussions:
This is the original commit: https://github.com/rust-lang/cargo/commit/fee7c68e59b452cd637ba91532d6b77baa95ef10
As far as I understand, there was never any explicit decision to use the verbose version rather than the concise version; it was considered simply 'the version' at the time of the commit. Maybe it gained all that additional metadata, including the host, later?
I still think that if --target is explicitly passed in, then the 'host:' in that output is safe to be ignored.
This seems like a duplicate of #8140? Is there something different here, or can it be closed?
I think the core issue is the same, although the title of #8140 makes it look like ' -C metadata=hash' is the non-reproducible bit, and discussion went on all sorts of tangents around that. The issue is specifically the host in verbose_version when cross-compiling, not the computation of the hash as a whole.
So I would not want to close this, as the issue of whether to pass ' -C metadata=hash' at all is actually different than the issue of putting the 'host:' into it.
Hm, I'm not quite following how it is different. We can reword the title if that helps. That issue is specifically about having host: in the hash, which affects reproducibility.
I don't think there was any proposal to change whether or not to pass -C metadata. There is a discussion about how to remove host: from that hash, and still keep everything working. That is the tricky part.
If you reword the title of #8140 so it's explicitly about host: in the hash, then I'm fine with closing this.
This is fixed via https://github.com/rust-lang/cargo/pull/14107. Closing.
Problem
hash_rustc_version() writes rustc().verbose_version into the hash. That data has one problematic field:
host: x86_64-unknown-linux-gnu or host: aarch64-unknown-linux-gnu
Because of this, when a mix of x86_64 and aarch64 build hosts is used with exactly the same Rust compiler version to build for the same cross-target, the output is not reproducible: it differs between the two hosts, and even the file names are different.
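To illustrate the effect, here is a minimal sketch, not cargo's actual code. It assumes the verbose version is hashed as one string; the helper names and the sample version text are made up for illustration. Two otherwise identical strings that differ only in the host: line hash to different values:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for feeding the verbose version string into a hash.
// Cargo's real hasher and surrounding inputs differ; this only shows the effect.
fn hash_verbose_version(verbose_version: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    verbose_version.hash(&mut hasher);
    hasher.finish()
}

// Illustrative `rustc -vV`-style text; the version values are invented.
fn sample_verbose_version(host: &str) -> String {
    format!("rustc 1.0.0 (example)\nbinary: rustc\nhost: {host}\nrelease: 1.0.0\n")
}

fn main() {
    let x86 = hash_verbose_version(&sample_verbose_version("x86_64-unknown-linux-gnu"));
    let arm = hash_verbose_version(&sample_verbose_version("aarch64-unknown-linux-gnu"));
    // Same compiler version, same cross-target, different build host:
    // the hashes (and anything named after them) differ.
    println!("x86_64 host:  {x86:016x}");
    println!("aarch64 host: {arm:016x}");
    assert_ne!(x86, arm);
}
```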
Steps
This requires two build hosts with different architectures, e.g. x86_64 and aarch64, running an identical cross-compile with exactly the same host Rust compiler version. The output will differ (even in the file names, if those include hashes), even though it should be the same.
Possible Solution(s)
The situation occurred in the context of Yocto Project builds, where we have a cluster of build machines (a mix of x86_64 and arm64), the Rust version in use is tightly controlled, and we expect the output to be the same (and check for it). The quick and hacky solution was to patch out the problematic lines in cargo, as shown in the attached patch, but perhaps a better option would be to not write the host architecture into the hash when cross-compiling.
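The "better option" could look roughly like the sketch below. This is an assumption-laden illustration, not the attached patch and not necessarily what cargo ended up doing: version_hash_sketch and its explicit_target parameter are hypothetical names, and the sketch assumes the host appears as a line starting with host: in the verbose version text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: hash the verbose version line by line, skipping the
// `host:` line only when a target was passed explicitly (a cross build).
fn version_hash_sketch(verbose_version: &str, explicit_target: Option<&str>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for line in verbose_version.lines() {
        if explicit_target.is_some() && line.starts_with("host:") {
            continue; // the build host is ignored for cross builds
        }
        line.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Invented version text; only the `host:` line differs between the two.
    let on_x86 = "rustc 1.0.0 (example)\nhost: x86_64-unknown-linux-gnu\nrelease: 1.0.0";
    let on_arm = "rustc 1.0.0 (example)\nhost: aarch64-unknown-linux-gnu\nrelease: 1.0.0";
    let target = Some("riscv64gc-unknown-linux-gnu");

    // Cross build: both hosts now agree.
    assert_eq!(
        version_hash_sketch(on_x86, target),
        version_hash_sketch(on_arm, target)
    );
    // No explicit target: the host still contributes to the hash.
    assert_ne!(
        version_hash_sketch(on_x86, None),
        version_hash_sketch(on_arm, None)
    );
    println!("ok");
}
```

Keeping the host in the hash for non-cross builds preserves the current behaviour there; whether that is sufficient to keep everything else working is exactly the open question.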
Notes
No response
Version
See this piece of code:
https://github.com/rust-lang/cargo/blob/fc13634f78023381b55452fa7f4d7a974449a5e8/src/cargo/core/compiler/build_runner/compilation_files.rs#L664