rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.63k stars 2.4k forks source link

Build host architecture is written into metadata hash for compilation units, breaking reproducibility of output between host platforms #13922

Closed kanavin closed 3 months ago

kanavin commented 4 months ago

Problem

hash_rustc_version() writes rustc().verbose_version into the hash. That data has one problematic field:

host: x86_64-unknown-linux-gnu or host: aarch64-unknown-linux-gnu

Due to this, when one is using a mix of x86 and aarch build hosts with exactly same rust compiler and building for the same cross-target, the output becomes non-reproducible, and differs between the two; even the file names become different.

Steps

This requries two build hosts that have a different architecture, e.g. x86 and aarch, and running an identical cross-compile on them, using exact same host rust compiler. The output is going to be different (even in the filenames if those include hashes), even though it should be the same.

Possible Solution(s)

The situation occurred in the context of Yocto project builds, where we have a cluster of build machines (a mix of x86 and arm64), the rust version in use is tightly controlled, and we expect the output to be the same (and check for it). The quick and hacky solution was to patch out the problematic lines in cargo, as shown in the attached patch, but perhaps a better option would be to not write the host architecture into the hash when the build is a cross one.

From 065d7c263091118437465d714d8a29dbb6296921 Mon Sep 17 00:00:00 2001
From: Alexander Kanavin <alex@linutronix.de>
Date: Mon, 13 May 2024 14:57:54 +0200
Subject: [PATCH] cargo: do not write host information into compilation unit
 hashes

This breaks reproducibility in cross-builds where the cross-target
can be the same, but build hosts are different, as seen with
"rustc --version -v":
...
host: x86_64-unknown-linux-gnu

vs.

host: aarch64-unknown-linux-gnu

This can possibly be improved by only hashing host info if the build
is a native one (e.g. there's no --target option passed to cargo
invocation) but I'm not sure how.

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 .../src/cargo/core/compiler/context/compilation_files.rs      | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs b/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
index d83dbf10c..b2ad8d9f3 100644
--- a/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
+++ b/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
@@ -652,7 +652,7 @@ fn hash_rustc_version(bcx: &BuildContext<'_, '_>, hasher: &mut StableHasher) {
     if vers.pre.is_empty() || bcx.config.cli_unstable().separate_nightlies {
         // For stable, keep the artifacts separate. This helps if someone is
         // testing multiple versions, to avoid recompiles.
-        bcx.rustc().verbose_version.hash(hasher);
+        //bcx.rustc().verbose_version.hash(hasher);
         return;
     }
     // On "nightly"/"beta"/"dev"/etc, keep each "channel" separate. Don't hash
@@ -665,7 +665,7 @@ fn hash_rustc_version(bcx: &BuildContext<'_, '_>, hasher: &mut StableHasher) {
     // Keep "host" since some people switch hosts to implicitly change
     // targets, (like gnu vs musl or gnu vs msvc). In the future, we may want
     // to consider hashing `unit.kind.short_name()` instead.
-    bcx.rustc().host.hash(hasher);
+    //bcx.rustc().host.hash(hasher);
     // None of the other lines are important. Currently they are:
     // binary: rustc  <-- or "rustdoc"
     // commit-hash: 38114ff16e7856f98b2b4be7ab4cd29b38bed59a
-- 
2.39.2

Notes

No response

Version

See this piece of code:

https://github.com/rust-lang/cargo/blob/fc13634f78023381b55452fa7f4d7a974449a5e8/src/cargo/core/compiler/build_runner/compilation_files.rs#L664

epage commented 4 months ago

Made a slight tweak to the title as this does not prevent any build reproducibility but more specifically reproducibility between host platforms.

As there is still reproducibility within a host platform and this has been this way for a while, this falls below other items in my priority list. If someone would want to drive moving this forward, it would be good to research why we use the verbose version in the hash.

weihanglo commented 4 months ago

Possibly relevant discussions:

kanavin commented 4 months ago

This is the original commit: https://github.com/rust-lang/cargo/commit/fee7c68e59b452cd637ba91532d6b77baa95ef10

As far as I understand there was never any explicit decision to use the verbose version vs concise version, it was considered simply 'the version' at the time of the commit. Maybe it had gained all that additional metadata including the host later?

I still think that if --target is explicitly passed in, then the 'host:' in that output is safe to be ignored.

ehuss commented 4 months ago

This seems like a duplicate of #8140? Is there something different here, or can it be closed?

kanavin commented 4 months ago

I think the core issue is the same, although the title of #8140 makes it look like ' -C metadata=hash' is the non-reproducible bit, and discussion went on all sorts of tangents around that. The issue is specifically the host in verbose_version when cross-compiling, not the computation of the hash as a whole.

kanavin commented 4 months ago

So I would not want to close this, as the issue of whether to pass ' -C metadata=hash' at all is actually different than the issue of putting the 'host:' into it.

ehuss commented 4 months ago

Hm, I'm not quite following how it is different. We can reword the title if that helps. That issue is specifically about having host: in the hash, which affects reproducibility.

I don't think there was any proposal to change whether or not to pass -C metadata. There is a discussion about how to remove host: from that hash, and still keep everything working. That is the tricky part.

kanavin commented 4 months ago

If you reword the title of #8140 so it's explicitly about host: in the hash, then I'm fine with closing this.

weihanglo commented 3 months ago

This is fixed via https://github.com/rust-lang/cargo/pull/14107. Closing.