Closed infinity0 closed 4 years ago
In general, being reproducible is something we're interested in; we try to tackle it bug by bug.
This probably has some obvious cause:
│ │ │ │ -<h4 id='method.unzip' class='method'><code>fn <a href='../../core/iter/trait.Iterator.html#method.unzip' class='fnname'>unzip</a><A, B, FromA, FromB>(self) -> (FromA, FromB) <span class='where'>where FromB: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a><B>, FromA: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a><A>, Self: <a class='trait' href='../../core/iter/trait.Iterator.html' title='core::iter::Iterator'>Iterator</a><Item=(A, B)></span></code></h4>
│ │ │ │ +<h4 id='method.unzip' class='method'><code>fn <a href='../../core/iter/trait.Iterator.html#method.unzip' class='fnname'>unzip</a><A, B, FromA, FromB>(self) -> (FromA, FromB) <span class='where'>where FromB: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a><B>, Self: <a class='trait' href='../../core/iter/trait.Iterator.html' title='core::iter::Iterator'>Iterator</a><Item=(A, B)>, FromA: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a><A></span></code></h4>
If someone wants to look at the code generation differences, it's probably best to start with libcore. The reproducible-builds.org diff isn't that useful for that because it doesn't recognize .rlib files as archives.
Thanks for the heads up, I just added AR support in diffoscope and hopefully the diff output will see that in the next few weeks, when the website updates.
Here's a diff of rust.metadata.bin from libcore.rlib, from 1.10.0. (2.0 MB, ~1700 lines, might make your browser slow.) Mostly ordering differences. @eddyb has suggested HashMap ordering as a culprit, and that would be replaced with FnvHashMap instead.
If it's a HashMap, I don't know which one. cc @rust-lang/compiler
I tried replacing HashMap with FnvHashMap in src/librustc_metadata/loader.rs but it didn't seem to help. Here is the Makefile I'm using to automate my rebuilds:
TRIPLET = x86_64-unknown-linux-gnu

all: libcore.diff

libcore.diff: rust.metadata.bin.1 rust.metadata.bin.2
	diffoscope --html-dir "$@" rust.metadata.bin.1 rust.metadata.bin.2 \
	  --max-diff-block-lines 5000 \
	  --max-diff-input-lines 10000000 \
	  --max-report-size 204800000; true # diffoscope returns 1 for "there were diffs"

rust.metadata.bin.1 rust.metadata.bin.2: Makefile
	rm -f $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/libcore-*.rlib
	rm -f $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/stamp.core
	$(MAKE) $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/stamp.core
	ar x $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/libcore-*.rlib rust.metadata.bin
	mv rust.metadata.bin "$@"

.PHONY: all rust.metadata.bin.1 rust.metadata.bin.2
perhaps i'm Doing It Wrong.
As background, HashMap in rust is non-deterministic to protect against certain types of DoS attack. You can switch it to the deterministic FnvHashMap if you're sure your code will always be called in a safe manner. This ought to be true for rustc itself. (I notice some online "try-it-yourself" rust web services let me run "ls /" and other shell commands, so I could also exploit this HashMap ordering issue, but I also assume that they're clever enough to set a ulimit and/or containerise the thing.)
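To make the HashMap point concrete, here is a minimal sketch of a map with a fixed (unseeded) hasher. The `Fnv1a` type and the `DetHashMap` alias are hypothetical names written out for illustration; they stand in for what the `fnv` crate's `FnvHashMap` provides and are not rustc's own code. The iteration order is still arbitrary, but it is the same on every run for the same sequence of insertions, which is what matters for reproducible output.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher, written out so the sketch has no external
/// dependencies. It has no random seed, so hashes are stable across runs.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias playing the role of `FnvHashMap`.
pub type DetHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Build a map from `names` and return its keys in iteration order.
pub fn iteration_order(names: &[&str]) -> Vec<String> {
    let mut map: DetHashMap<String, usize> = DetHashMap::default();
    for (i, name) in names.iter().enumerate() {
        map.insert(name.to_string(), i);
    }
    map.keys().cloned().collect()
}

fn main() {
    let names = ["unzip", "map", "filter", "fold"];
    // Same insertions and same hasher give the same order on every call.
    println!("{:?}", iteration_order(&names));
    println!("{:?}", iteration_order(&names));
}
```

With the default `HashMap` (which uses a randomly seeded hasher), the two printed lines could differ between runs of the program; with the fixed hasher they should not.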
I wrote a small script to diff metadata (I couldn't really get the makefile to work).
It looks like even more is changing between two compiles using the current master (or nightly): The crate hash and/or disambiguator, which are stored right after the target triple. @infinity0 can you confirm?
EDIT: That might get fixed by #35854
Okay, apparently replacing Hash{Map,Set} in resolve with FnvHash{Map,Set} got rid of the first (smaller) blocks of the diff. The big block at the bottom is still there. Will continue to randomly sed the compiler and report back ;)
The remaining issue is that (some?) predicates are encoded in non-deterministic order. Maybe we need to do this for all bounds? Not sure what would be the correct way to do that (or why these are nondeterministic in the first place).
Just for reference, there's this: https://github.com/rust-lang/rust/pull/34805
@michaelwoerister Thanks! Too tired to think about it currently. Will have a look tomorrow :)
Potentially relevant, from IRC:
<eddyb> nmatsakis, infinity0: fwiw, I just found another hash table. not random, but may cause issues around incremental recompilation. see rustc_metadata::encoder::encode_reachable (the reachable NodeSet)
though I think that it's not a big issue unless the hashtable is truly random.
Fix suggestion is changing that to be a BTreeSet<DefIndex>. Other things could also use btrees.
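A rough sketch of why a BTreeSet helps here: it iterates in key order regardless of how it was filled, so anything serialized from it comes out the same. The `DefIndex` below is a stand-in newtype, and the byte layout in `encode_reachable` is made up for illustration; neither reflects the real metadata encoder.

```rust
use std::collections::BTreeSet;

/// Stand-in for rustc's `DefIndex`; the real type lives inside the compiler.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DefIndex(pub u32);

/// Serialize the set as little-endian u32 values, in sorted order.
/// The format is hypothetical and only serves to show stable output.
pub fn encode_reachable(reachable: &BTreeSet<DefIndex>) -> Vec<u8> {
    let mut out = Vec::with_capacity(reachable.len() * 4);
    for idx in reachable {
        out.extend_from_slice(&idx.0.to_le_bytes());
    }
    out
}

fn main() {
    // Two sets filled in different orders encode to identical bytes.
    let a: BTreeSet<DefIndex> = [7, 3, 42].iter().map(|&i| DefIndex(i)).collect();
    let b: BTreeSet<DefIndex> = [42, 7, 3].iter().map(|&i| DefIndex(i)).collect();
    assert_eq!(encode_reachable(&a), encode_reachable(&b));
    println!("{:?}", encode_reachable(&a));
}
```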
In case this helps to motivate anyone, we got a successful reproduction on i386 on Debian testing!
https://tests.reproducible-builds.org/debian/rb-pkg/testing/i386/rustc.html
This may or may not be "an accident", we'll have to see what future tests show. Also on this arch/platform we are fixing the build path (just to see how it does). For other arch/platforms we vary the build path, and haven't seen rustc reproduce there yet.
At least when building with debuginfo, the build path will show up in the resulting binaries as the DW_AT_comp_dir attribute of the DW_TAG_compile_unit.
@michaelwoerister
In gcc (and clang), the -fdebug-prefix-map argument allows one to avoid this. Is a similar option for rustc available?
@jmesmon No, we don't have an option like that. If you open an issue, I'd be happy to discuss it with the @rust-lang/tools team.
Assuming that #41508 works properly, there are only two issues left, judging from this diff obtained when not varying the build path across builds.
├── ./usr/share/doc/rust-doc/html/implementors/core/cmp/trait.Ord.js
├── js-beautify {}
@@ -1,10 +1,9 @@
(function() {
var implementors = {};
implementors["alloc"] = ["impl<T: ? [.. snip ..] implementors["rustc_unicode"] = []; implementors["collections"] = ["impl<T> [.. snip ..] implementors["core"] = [];
if (window.register_implementors) { window.register_implementors(implementors); } else {
<2ba6> DW_AT_GNU_dwo_name: (indirect string, offset: 0x33a0): x86_64-unknown-linux-gnu/rustllvm/PassWrapper.dwo
<2baa> DW_AT_comp_dir : .
<2bac> DW_AT_GNU_pubnames: 1
<2bac> DW_AT_GNU_addr_base: 0x4d90
The doc issue seems to be gone, the dwo_id issue remains.
@@ -4968,40 +4968,40 @@
<2ef2> DW_AT_ranges : 0x51c0
<2ef6> DW_AT_low_pc : 0x0
<2efe> DW_AT_stmt_list : 0x1110
<2f02> DW_AT_GNU_dwo_name: (indirect string, offset: 0x3221): x86_64-unknown-linux-gnu/rustllvm/RustWrapper.dwo
<2f06> DW_AT_comp_dir : .
<2f08> DW_AT_GNU_pubnames: 1
<2f08> DW_AT_GNU_addr_base: 0x0
- <2f0c> DW_AT_GNU_dwo_id : 0x82dcd4a4a86511a
+ <2f0c> DW_AT_GNU_dwo_id : 0x94cc2c8c89838e78
<2f14> DW_AT_GNU_ranges_base: 0x1290
Compilation Unit @ offset 0x2f18:
Length: 0x2e (32-bit)
Version: 4
Abbrev Offset: 0x206
Pointer Size: 8
<0><2f23>: Abbrev Number: 1 (DW_TAG_compile_unit)
<2f24> DW_AT_ranges : 0x8250
<2f28> DW_AT_low_pc : 0x0
<2f30> DW_AT_stmt_list : 0x4366
<2f34> DW_AT_GNU_dwo_name: (indirect string, offset: 0x3253): x86_64-unknown-linux-gnu/rustllvm/PassWrapper.dwo
<2f38> DW_AT_comp_dir : .
<2f3a> DW_AT_GNU_pubnames: 1
<2f3a> DW_AT_GNU_addr_base: 0x4f08
- <2f3e> DW_AT_GNU_dwo_id : 0x6564472d68267dc6
+ <2f3e> DW_AT_GNU_dwo_id : 0xa4b7031387acdfba
<2f46> DW_AT_GNU_ranges_base: 0x58a0
Compilation Unit @ offset 0x2f4a:
Length: 0x2e (32-bit)
Version: 4
Abbrev Offset: 0x223
Pointer Size: 8
<0><2f55>: Abbrev Number: 1 (DW_TAG_compile_unit)
<2f56> DW_AT_ranges : 0xaf70
<2f5a> DW_AT_low_pc : 0x0
<2f62> DW_AT_stmt_list : 0x6382
<2f66> DW_AT_GNU_dwo_name: (indirect string, offset: 0x3285): x86_64-unknown-linux-gnu/rustllvm/ArchiveWrapper.dwo
<2f6a> DW_AT_comp_dir : .
<2f6c> DW_AT_GNU_pubnames: 1
<2f6c> DW_AT_GNU_addr_base: 0x7528
- <2f70> DW_AT_GNU_dwo_id : 0xc1d2b489290c9d87
+ <2f70> DW_AT_GNU_dwo_id : 0xcd75544460bc0e9c
<2f78> DW_AT_GNU_ranges_base: 0x8480
This DW_AT_GNU_dwo_id seems to be introduced by the C++ compiler when compiling rustllvm. @alexcrichton, can we disable split debuginfo here somehow?
cc @brson
Also FYI on Debian (where these tests are done) rustllvm is dynamically linked against the LLVM shared library, in case this affects the situation.
@michaelwoerister sure yeah we can just pass whatever flags are necessary to the C compiler
It looks like llvm-config --cxxflags on Debian at least outputs -gsplit-dwarf so I'm applying this patch for the time being:
--- a/src/librustc_llvm/build.rs
+++ b/src/librustc_llvm/build.rs
@@ -137,6 +137,11 @@
let cxxflags = output(&mut cmd);
let mut cfg = gcc::Config::new();
for flag in cxxflags.split_whitespace() {
+ // Split-dwarf gives unreproducible DW_AT_GNU_dwo_id so don't do it
+ if flag == "-gsplit-dwarf" {
+ continue;
+ }
+
// Ignore flags like `-m64` when we're doing a cross build
if is_crossed && flag.starts_with("-m") {
continue;
Not sure if rustc wants to include it, since technically it is an issue with either llvm-config or the C++ compiler.
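For anyone who wants to try the same filtering outside the build script, here is a standalone sketch of the logic in the patch. The function name `filter_cxxflags` is made up for this example; in the real build.rs the flags are handed on to the gcc crate's config rather than collected into a vector.

```rust
/// Keep only the C++ flags that should be forwarded to the compiler.
/// Drops `-gsplit-dwarf` always, and `-m*` flags when cross-compiling.
pub fn filter_cxxflags(cxxflags: &str, is_crossed: bool) -> Vec<String> {
    cxxflags
        .split_whitespace()
        // Split-dwarf gives an unreproducible DW_AT_GNU_dwo_id, so skip it.
        .filter(|flag| *flag != "-gsplit-dwarf")
        // Ignore flags like `-m64` when doing a cross build.
        .filter(|flag| !(is_crossed && flag.starts_with("-m")))
        .map(str::to_owned)
        .collect()
}

fn main() {
    // Example input; real output of llvm-config will differ per system.
    let flags = "-std=c++11 -gsplit-dwarf -O2 -m64";
    println!("{:?}", filter_cxxflags(flags, false));
    println!("{:?}", filter_cxxflags(flags, true));
}
```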
I was hoping to be able to declare rustc the first reproducible compiler, but unfortunately it seems even with the above patch there was a regression between 1.18 and 1.19, see libcore.diff which was produced using the same build-path. Probably another case of non-deterministic hash tables being used; I will see if I can write a test case to avoid this happening again in the future.
For other diffs as well as the actual rustc binaries see the parent directory.
Oh, I just had a thought, I added codegen-units = 0 recently to our config.toml which enables parallel codegen. This might be interfering with reproducibility; I will try disabling that and re-running the builds.
OK! After disabling parallel codegen we get much much further, but there is still a regression, the build of liballoc_jemalloc is no longer deterministic.
tests.r-b.org has also caught up with us and is agreeing with this result, though a bug in diffoscope that got fixed recently causes it not to display any output at the time of writing (it should be fixed automatically soon).
This is the jemalloc diff between 1.17 and 1.19. Is it worth me trying to chase this down or were you guys planning to switch the default to the system allocator soon? I seem to remember an ongoing discussion about that.
Thanks so much for the ongoing work here @infinity0!
@michaelwoerister do you know if codegen-units > 1 would cause the build to be non-reproducible? I figured that w/ incremental work that wouldn't be the case!
@infinity0 we're unfortunately still a ways away from jettisoning jemalloc. We need the #[global_allocator] attribute to stabilize first before doing so, which would still be at least 12 weeks away (ish) if we stabilized it today. We don't currently have a timeline for the stabilization of that yet, but it should be relatively close in the sense that the design is much closer to "stabilizable" than the previous.
Thanks so much for the ongoing work here @infinity0!
+1, this is awesome :)
@michaelwoerister do you know if codegen-units > 1 would cause the build to be non-reproducible? I figured that w/ incremental work that wouldn't be the case!
I would have thought so too. There are two things that come to mind here:
hidden and internal. I only discovered this yesterday.
Cargo seems to build things in random order -- that could affect the linker flags, right?
Hm perhaps! I think we did a change to Cargo a while ago to make any one rustc invocation deterministic, e.g. https://github.com/rust-lang/cargo/pull/3937
With 1.21.0+dfsg1-3 Debian, rustc the compiler is built deterministically (and build-path-independently) but libstd has some things remaining (112MB diff):
./usr/lib/x86_64-linux-gnu/librustc-2d0e4ac759a96c13.so
objdump --line-numbers --disassemble --demangle --section=.text {}
@@ -39978,25 +39978,25 @@
26491: 48 89 e5 mov %rsp,%rbp
/usr/src/rustc-1.21.0/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/float/sub.rs:11
26494: f2 0f 5c c1 subsd %xmm1,%xmm0
/usr/src/rustc-1.21.0/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/macros.rs:255
26498: 5d pop %rbp
26499: c3 retq
2649a: 66 90 xchg %ax,%ax
-/build/1st/rustc-1.21.0+dfsg1/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:39
+/build/rustc-1.21.0+dfsg1/2nd/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:39
2649c: 66 0f 6e c7 movd %edi,%xmm0
-/build/1st/rustc-1.21.0+dfsg1/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:40
+/build/rustc-1.21.0+dfsg1/2nd/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:40
264a0: 48 c1 ef 20 shr $0x20,%rdi
-/build/1st/rustc-1.21.0+dfsg1/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:41
+/build/rustc-1.21.0+dfsg1/2nd/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:41
264a4: 48 0b 3d 25 19 00 00 or 0x1925(%rip),%rdi
-/build/1st/rustc-1.21.0+dfsg1/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:42
+/build/rustc-1.21.0+dfsg1/2nd/src/rustc/compiler_builtins_shim/../../libcompiler_builtins/compiler-rt/lib/builtins/x86_64/floatundidf.S:42
264ab: 66 0f 56 05 fd 18 00 orpd 0x18fd(%rip),%xmm0
264b2: 00
What's strange is that some of the paths are already remapped to /usr/src as I intended with the -Zremap-path-prefix flags I'm passing. However, I'm only passing -Zremap-path-prefix-from=$PWD, but some other experiments with @kpcyrd on other rust crates suggest that we need to also pass -Zremap-path-prefix-from=$CARGO_HOME in the general case. I'll try that next time and report back, hopefully everything will be reproducible then.
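As a rough illustration of what prefix remapping does to the paths in the diff above, here is a small sketch. `remap_path` is a hypothetical helper, not rustc's implementation; it applies the first matching mapping, which is a choice made for this sketch and not a claim about how rustc orders multiple mappings. The `/usr/share/cargo` target is likewise just a placeholder.

```rust
use std::path::{Path, PathBuf};

/// Rewrite `path` using the first (from, to) pair whose `from` is a prefix.
/// Matching is per path component, so `/build/1st` does not match `/build/1stX`.
pub fn remap_path(path: &Path, mappings: &[(PathBuf, PathBuf)]) -> PathBuf {
    for (from, to) in mappings {
        if let Ok(rest) = path.strip_prefix(from) {
            return to.join(rest);
        }
    }
    // No mapping applies: the build path leaks through unchanged.
    path.to_path_buf()
}

fn main() {
    let mappings = vec![
        // Plays the role of $PWD for the first build.
        (
            PathBuf::from("/build/1st/rustc-1.21.0+dfsg1"),
            PathBuf::from("/usr/src/rustc-1.21.0"),
        ),
        // Plays the role of $CARGO_HOME; both sides are placeholders.
        (
            PathBuf::from("/build/1st/cargo-home"),
            PathBuf::from("/usr/share/cargo"),
        ),
    ];
    let p = Path::new("/build/1st/rustc-1.21.0+dfsg1/src/libcore/lib.rs");
    println!("{}", remap_path(p, &mappings).display());
}
```

Any tool in the build that embeds paths without going through such a mapping (for example an assembler invoked from a build script) will still record the original directory.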
I should mention that I had to disable jemalloc in Debian for other reasons, so builds with that enabled might still hit the nondeterminism I mentioned earlier.
@infinity0
Nice to hear we almost got rid of nondeterminism! The last bunch of path dependence looks like it comes from compiler-rt asm files that are a part of compiler-builtins - I think the build-script that compiles them all is here.
I think it's just that we don't pass debuginfo remapping all the way down to gcc.
Rust 1.22.1 has a reproducibility problem in liblibc's rust.metadata.bin. The symbols in there seem to be shuffled around (example: getutxent).
@daym I can't find anything in the metadata encoder that immediately seems to be at fault here. Would be interesting to look into this some more.
@michaelwoerister Thanks for checking!
Further information:
With newer rusts, the situation is much better. We work around the remaining reproducibility problems of rust 1.25, rust 1.26, rust 1.27 by using llvm 3.9.1 instead of llvm 6.0.1, which makes their build reproducible then. Rust 1.28.0 is fine with llvm 6.0.1, although rust 1.29.2's cargo does not build reproducibly (for the latter see #50556, also https://lists.gnu.org/archive/html/guix-patches/2018-10/msg00491.html ).
Our build of rust 1.22.1 above (the one where the problem was observed) used llvm 3.9.1. Interestingly, older versions of rust (1.19.0, 1.20.0, 1.21.0) are apparently reproducible!
So in principle the fix could be bisected.
But we use rust 1.22.1 as part of a bootstrapping effort, bootstrapping the rust compiler from C++ via mrustc.
mrustc so far only supports compiling the rust 1.19.0 compiler, so that's how we ended up (eventually) caring about rust 1.22.1.
That said, maybe just wait until mrustc supports compiling the rust 1.23.0 compiler directly. We'll see which comes first.
What's the status of this? Is it already possible to make a reproducible/deterministic build or not? (I'm interested in both binaries and staticlibs)
My understanding is builds are now (sometimes?) reproducible and that people are using reprotest successfully in CI.
To me it seems like we're not too far away from robust reproducible builds, but the main problem right now, besides the lingering small issues mentioned above, is the lack of easy-to-use tooling for producing and verifying such builds.
I opened an issue about that on the Secure Code WG issue tracker:
Isn't this a significant milestone on reproducibility?
https://www.reddit.com/r/rust/comments/afscgo/ripgrep_0100_is_reproducible_in_debian/
My understanding is builds are now (sometimes?) reproducible and that people are using reprotest successfully in CI.
@tarcieri can you share some links of CI setups for that?
Well, if you're ok with cargo culting a config from a project called sniffglue, here is an example:
https://github.com/kpcyrd/sniffglue/blob/master/ci/reprotest.sh
Just to prevent some confusion: these two issues are separate: 1) reproducible builds of software written in Rust, and 2) reproducible builds of the Rust compiler. Since the Rust compiler is written in Rust, 2 requires 1.
Since this is an issue on the Rust compiler repository, this issue is mostly about 2. 2 is very close but not done. So this issue is open.
1 is largely possible now, although tooling is lacking. I think further discussion of 1 should happen on Secure Code WG.
Thanks for the clarification @sanxiyn !
If anyone is interested in discussing tooling for producing and verifying reproducible builds, particularly for things like CI use cases, I opened an issue on the Secure Code WG repo here:
Continuing off from last time, more than a year ago:
@infinity0
Nice to hear we almost got rid of nondeterminism! The last bunch of path dependence looks like it comes from compiler-rt asm files that are a part of compiler-builtins - I think the build-script that compiles them all is here.
I think it's just that we don't pass debuginfo remapping all the way down to gcc.
The cc crate now honours CFLAGS and CPPFLAGS so this problem has gone away and I can confirm the libcompiler_builtins builds reproducibly independently of build-path including the C code.
However there is a new regression, dylib_metadata is no longer reproducible. (rlib_metadata is fine.) Please see these samples:
These files were extracted from the .rustc ELF section of the respective .so files. I don't know how to parse the binary data into a human-readable format, so I can't continue further by myself. Someone needs to explain it a bit more.
I think this is the last piece so if someone can figure this out in the next month (before Mar 12) we could in theory get a reproducible rustc into the next Debian Stable, if that motivates anyone to work on it.
I don't know how to parse the binary data into a human-readable format, so I can't continue further by myself.
It's not just a binary format, but also compressed on top (e.g. you can't see any strings): https://github.com/rust-lang/rust/blob/f40aaa68278ef0879af5fe7ce077c64c6515ea05/src/librustc_metadata/locator.rs#L876-L889
The metadata header is currently 12 bytes: https://github.com/rust-lang/rust/blob/f40aaa68278ef0879af5fe7ce077c64c6515ea05/src/librustc_metadata/schema.rs#L45-L46
So you can strip those first 12 bytes then inflate the rest.
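A minimal sketch of the first step, splitting off the header. The 12-byte length is taken from the comment above and may differ in other compiler versions; `split_metadata` is a made-up helper name. Inflating the remainder is left out here because it needs a deflate implementation from an external crate or tool.

```rust
/// Header length as quoted above; treat it as an assumption for other versions.
pub const METADATA_HEADER_LEN: usize = 12;

/// Split a raw `.rustc` section into (header, compressed body).
/// Returns `None` if the input is shorter than the header.
pub fn split_metadata(section: &[u8]) -> Option<(&[u8], &[u8])> {
    if section.len() < METADATA_HEADER_LEN {
        return None;
    }
    Some(section.split_at(METADATA_HEADER_LEN))
}

fn main() {
    // Fake section contents, standing in for bytes extracted from a .so file.
    let mut section = vec![0u8; METADATA_HEADER_LEN];
    section.extend_from_slice(b"compressed-bytes");

    let (header, body) = split_metadata(&section).expect("section too short");
    println!("header: {} bytes, body: {} bytes", header.len(), body.len());
    // `body` is what would be passed to a deflate decoder next.
}
```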
(rlib_metadata is fine.)
My mistake, rlib_metadata is also affected, here is another sample:
Note that the other contents of the rlibs and sos are reproducible including build path remappings and the filename hashes. If you look inside the metadata files I just linked, they both contain remapped paths starting with /usr/src/rustc-1.32.0. Something else is different.
As a side note to anyone interested in creating reproducible builds of libraries/apps authored in Rust, I've created a cargo-repro project in the Secure Code WG as a place to focus efforts on creating a Cargo-driven workflow for reproducible builds:
https://github.com/rust-secure-code/cargo-repro
(Apologies if this is off-topic for this particular issue, but I thought there might be some interest overlap in the people subscribed)
It would be good if rustc could generate bit-for-bit reproducible results, even in the presence of minor system environment differences. Currently we have quite a large diff: e.g. see txt diff for 1.9.0 or perhaps txt diff for 1.10.0 a few days after I'm posting this. (You might want to "save link as" instead of displaying it directly in the browser.)
Much of the diff output is due to build-id differences, which can be ignored since they are caused by other deeper issues and will go away once these deeper issues are fixed. One example of a deeper issue is this:
Here are the system variations that might be causing these issues. I myself am not that familiar with ELF, but perhaps someone else here would know why the section header is 4 bytes later in the first vs second builds.