rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.63k stars 12.74k forks source link

reproducible builds broken in rustc 1.56.0 due to LLVM 13 update #90301

Closed Fuuzetsu closed 3 years ago

Fuuzetsu commented 3 years ago

I attach a tarball with a small reproducer (few dependencies from crates.io, lock file, empty lib.rs).

repro.tar.gz

Now run this using rustc 1.56.0:

for i in $(seq 1 10); do cargo clean &&  cargo build --release --quiet && sha256sum target/release/deps/libpalette_derive-*.so; done

I expect the sha256sum to be the same for the libpalette so file but it's often different.

This breaks any binary caches, similarly to what #89904 did but that ticket turned out to be a problem with the crate.

This time it seems problem with rustc itself (or rather LLVM: stay tuned): 1.55.0 produces same binary/crate hash on every build.

Note that in Cargo.toml there's:

[profile.release]
debug = true

Without this, the issue does not present itself. This perhaps makes it look similar to #89911 or #45397 but I think it's different. As far as I saw, it didn't produce a difference in ordering of symbols in the resulting .so though I may be mistaken. Please feel free to close as duplicate if you deem it to be the same issue.

I have ran with RUSTC_LOG=debug and after careful sifting through few GiB of output, I found the differences seem to start in rustc_codegen_ssa.

I have bisected rustc itself and ran it on our original code which exhibited the problem. I started the bisect at common merge point with 1.55.0 though the problem turned out well into the 1.56.0 release:

$ git bisect good
db002a06ae9154a35d410550bc5132df883d7baa is the first bad commit
commit db002a06ae9154a35d410550bc5132df883d7baa
Merge: e7f7fe462a5 306259c6459
Author: bors <bors@rust-lang.org>
Date:   Sat Aug 21 09:25:28 2021 +0000

    Auto merge of #87570 - nikic:llvm-13, r=nagisa

    Upgrade to LLVM 13

    Work in progress update to LLVM 13. Main changes:

     * InlineAsm diagnostics reported using SrcMgr diagnostic kind are now handled. Previously these used a separate diag handler.
     * Codegen tests are updated for additional attributes.
     * Some data layouts have changed.
     * Switch `#[used]` attribute from `llvm.used` to `llvm.compiler.used` to avoid SHF_GNU_RETAIN flag introduced in https://reviews.llvm.org/D97448, which appears to trigger a bug in older versions of gold.
     * Set `LLVM_INCLUDE_TESTS=OFF` to avoid Python 3.6 requirement.

    Upstream issues:

     * ~~https://bugs.llvm.org/show_bug.cgi?id=51210 (InlineAsm diagnostic reporting for module asm)~~ Fixed by https://github.com/llvm/llvm-project/commit/1558bb80c01b695ce12642527cbfccf16cf54ece.
     * ~~https://bugs.llvm.org/show_bug.cgi?id=51476 (Miscompile on AArch64 due to incorrect comparison elimination)~~ Fixed by https://github.com/llvm/llvm-project/commit/81b106584f2baf33e09be2362c35c1bf2f6bfe94.
     * https://bugs.llvm.org/show_bug.cgi?id=51207 (Can't set custom section flags anymore). Problematic change reverted in our fork, https://reviews.llvm.org/D107216 posted for upstream revert.
     * https://bugs.llvm.org/show_bug.cgi?id=51211 (Regression in codegen for #83623). This is an optimization regression that we may likely have to eat for this release. The fix for #83623 was based on an incorrect premise, and this needs to be properly addressed in the MergeICmps pass.

    The [compile-time impact](https://perf.rust-lang.org/compare.html?start=ef9549b6c0efb7525c9b012148689c8d070f9bc0&end=0983094463497eec22d550dad25576a894687002) is mixed, but quite positive as LLVM upgrades go.

    The LLVM 13 final release is scheduled for Sep 21st. The current nightly is scheduled for stable release on Oct 21st.

    r? `@ghost`

 .gitmodules                                        |  2 +-
 compiler/rustc_codegen_llvm/src/back/write.rs      | 43 +---------
 compiler/rustc_codegen_llvm/src/base.rs            |  8 +-
 compiler/rustc_codegen_llvm/src/consts.rs          | 15 +++-
 compiler/rustc_codegen_llvm/src/context.rs         | 55 +++++++++----
 compiler/rustc_codegen_llvm/src/lib.rs             |  2 +-
 compiler/rustc_codegen_llvm/src/llvm/diagnostic.rs | 94 +++++++++++++++++-----
 compiler/rustc_codegen_llvm/src/llvm/ffi.rs        |  7 +-
 compiler/rustc_codegen_ssa/src/traits/misc.rs      |  2 +
 compiler/rustc_codegen_ssa/src/traits/statics.rs   | 18 ++---
 compiler/rustc_feature/src/accepted.rs             |  2 +-
 compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp   | 20 ++++-
 .../src/spec/powerpc64_unknown_linux_gnu.rs        |  2 +-
 .../src/spec/powerpc64_unknown_linux_musl.rs       |  2 +-
 .../rustc_target/src/spec/powerpc64_wrs_vxworks.rs |  2 +-
 .../src/spec/powerpc64le_unknown_linux_gnu.rs      |  2 +-
 .../src/spec/powerpc64le_unknown_linux_musl.rs     |  2 +-
 .../src/spec/wasm32_unknown_emscripten.rs          |  2 +-
 .../src/spec/wasm32_unknown_unknown.rs             |  2 +-
 compiler/rustc_target/src/spec/wasm32_wasi.rs      |  2 +-
 .../src/spec/wasm64_unknown_unknown.rs             |  2 +-
 src/bootstrap/native.rs                            |  1 +
 src/llvm-project                                   |  2 +-
 src/test/codegen/array-equality.rs                 |  4 +-
 src/test/codegen/issue-83623-SIMD-PartialEq.rs     | 46 -----------
 src/test/codegen/repeat-trusted-len.rs             |  2 +-
 .../run-make-fulldeps/coverage-llvmir/Makefile     |  2 -
 .../coverage-llvmir/filecheck.testprog.txt         | 18 ++---
 src/test/ui/llvm-asm/issue-69092.rs                |  4 +-
 src/test/ui/llvm-asm/issue-69092.stderr            |  4 +-
 30 files changed, 201 insertions(+), 168 deletions(-)
 delete mode 100644 src/test/codegen/issue-83623-SIMD-PartialEq.rs

cc @nikic who did the update

Meta

rustc --version --verbose:

$ rustc --version --verbose
rustc 1.56.0 (09c42c458 2021-10-18)
binary: rustc
commit-hash: 09c42c45858d5f3aedfa670698275303a3d19afa
commit-date: 2021-10-18
host: x86_64-unknown-linux-gnu
release: 1.56.0
LLVM version: 13.0.0
yanok commented 3 years ago

One thing I found: if I run it with RUSTFLAGS='--emit=llvm-ir' I can't reproduce it anymore.

@rustbot claim

yanok commented 3 years ago

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

fangism commented 3 years ago

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

Consistent with my findings with --emit=llvm-ir as well.

yanok commented 3 years ago

Update: this IS a LLVM bug, I'm able to reproduce it with just ll input file using mainline LLVM. Filed https://bugs.llvm.org/show_bug.cgi?id=52441 to LLVM.

yanok commented 3 years ago

I have a fix: https://reviews.llvm.org/D113468

Do we want to apply it to llvm-rust or are we going to wait for LLVM review?

Fuuzetsu commented 3 years ago

@yanok Good job, thank you!

glandium commented 3 years ago

I wonder if rust needs https://reviews.llvm.org/D108968 too.

glandium commented 3 years ago

I wonder if rust needs https://reviews.llvm.org/D108968 too.

Answering my own question: it does.

glandium commented 3 years ago

@yanok Could you also do a PR for https://reviews.llvm.org/D108968?

yanok commented 3 years ago

Sure, will do.

yanok commented 3 years ago

Created #90978

yanok commented 3 years ago

Actually I was too slow, there is already #90954

glandium commented 2 years ago

FWIW, while taking the 1.56.0 source and applying the 2 LLVM reproducibility patches fixes the issues we have with Firefox, the latest nightly doesn't: there's a remaining reproducibility issue on i686 linux. I'll test a patched beta next.

cuviper commented 2 years ago

The next beta release should be fixed, per #90938. (Should go out near 00:00 UTC -- I guess it will be 1.57.0-beta.4.)

glandium commented 2 years ago

the beta branch + the patches seems to fix it, but I've found something new: depending how llvm is built, the result can vary. Specifically, llvm built with a sysroot exhibits reproducibility issues that are not present when not building with a sysroot. I'll wait for the official build for 1.57.0-beta.4 for a definite answer whether the issue is fixed. One thing is sure, though, there's an additional regression in nightly.

glandium commented 2 years ago

So, as far as Firefox is concerned, everything is fixed in the latest betas. I did find a source of non-determinism that is not fixed, but a) it seems to also affects clang (as in, I also have found one in clang, and I think it's the same root cause) b) it goes away with PGO, so it doesn't actually affect Firefox.

Fuuzetsu commented 2 years ago

@glandium is there a ticket about this one somewhere? It's great that it doesn't impact Firefox but it may be impacting other rust software

glandium commented 2 years ago

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

Fuuzetsu commented 2 years ago

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

I see, thanks. It sounds like binary-reproducability thing and ABI should(?) be the same at least.

And this is a regression in new LLVM version (rustc 1.56 and beyond)?

glandium commented 2 years ago

I don't know if this affects rustc, but I narrowed down yet another source of non-determinism in debuginfo, and upstream came up with a patch: https://reviews.llvm.org/D115054

Fuuzetsu commented 2 years ago

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

Fuuzetsu commented 2 years ago

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

Actually, seems it comes from a derive crate in our dependency tree, similar to what we saw in https://github.com/rust-lang/rust/issues/89904 ; So maybe it's fine.

EDIT: https://github.com/JelteF/derive_more/pull/178