Reproducible build tooling

tarcieri commented 5 years ago

Reproducible builds would be useful for a number of different reasons:

Binary releases of Rust applications which can be independently verified for reproducibility
Binary crates
A community build system / global build cache

I believe the main issue for reproducible Rust builds in general is:

https://github.com/rust-lang/rust/issues/34902

Right now it seems like some people are able to successfully use the following tool to test for reproducibility of Rust builds in CI:

https://pypi.org/project/reprotest/

I'm curious whether it would make sense to build a Rust-specific tool for this purpose, particularly one that integrates with cargo workflows and can both drive reproducible builds and check them, either in CI or as part of an auditing service. Something like cargo repro, maybe with cargo repro build and cargo repro check subcommands?

kpcyrd commented 5 years ago

reprotest is language-agnostic: it takes some files as input, runs a shell command twice with as many variations as possible, and then compares the results.

Right now it seems binary releases are usually reproducible given this command:

RUSTFLAGS="--remap-path-prefix=$HOME=/remap-home --remap-path-prefix=$PWD=/remap-pwd" \
    cargo build --release --verbose

I'm not sure it makes sense to duplicate reprotest into cargo, but it would make sense to remap paths automatically: https://github.com/rust-lang/cargo/issues/5505
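As a rough illustration of what remapping paths automatically could look like from a wrapper tool, here is a minimal sketch that assembles the same RUSTFLAGS value as the command above and passes it to cargo. The function names remap_rustflags and build_release are hypothetical and not part of cargo or any existing crate; only the --remap-path-prefix flag and the cargo arguments come from the command above.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus};

/// Build the RUSTFLAGS value used in the command above: the home directory
/// and the project directory are mapped to fixed placeholder paths.
/// Assumption: neither path contains spaces, since RUSTFLAGS is
/// space-separated.
fn remap_rustflags(home: &Path, pwd: &Path) -> String {
    format!(
        "--remap-path-prefix={}=/remap-home --remap-path-prefix={}=/remap-pwd",
        home.display(),
        pwd.display()
    )
}

/// Run `cargo build --release --verbose` in `pwd` with the remapped paths.
/// (Hypothetical helper; it needs cargo on the PATH.)
fn build_release(home: &Path, pwd: &Path) -> io::Result<ExitStatus> {
    Command::new("cargo")
        .args(["build", "--release", "--verbose"])
        .env("RUSTFLAGS", remap_rustflags(home, pwd))
        .current_dir(pwd)
        .status()
}
```

A caller would pass its own home and project directories, for example build_release(Path::new("/home/alice"), Path::new("/home/alice/project")).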

Once that issue is resolved, it should be possible to verify a binary by just running cargo build --release on the project again and comparing checksums, given a sufficiently similar system (e.g. rustc version, cargo version). Linux distros usually track this information in buildinfo files for their reproducible-builds setups. We could make cargo generate buildinfo files that rustup understands and uses to set up a suitable environment for verification, but I would consider that a very long-term goal (even though it would be very cool).
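The comparison step can be sketched with the standard library alone. The version below compares the two artifacts byte for byte instead of hashing them, because std has no cryptographic hash; a real verifier would more likely publish and compare SHA-256 digests. The function name artifacts_identical is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return true if the two build artifacts are bit-for-bit identical.
/// Sketch only: both files are read fully into memory, which is fine for
/// typical binaries but not for very large artifacts.
fn artifacts_identical(a: &Path, b: &Path) -> io::Result<bool> {
    // Cheap check first: files of different length can never match.
    if fs::metadata(a)?.len() != fs::metadata(b)?.len() {
        return Ok(false);
    }
    Ok(fs::read(a)? == fs::read(b)?)
}
```

The two paths would be the released binary and the output of the local rebuild, for example the file under target/release/.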

Besides rustc, the remaining issues I ran into are usually macros that aren't fully deterministic (because, for example, they list a directory) or build scripts that invoke a C compiler. One such issue, off the top of my head, is in ring: https://github.com/briansmith/ring/issues/715

tarcieri commented 5 years ago

> I'm not sure it makes sense to duplicate reprotest into cargo

My use case is: I don't presently have a Python interpreter installed in one particular environment where I would like to do build verifications (which, as it were, is the one that matters most to me), and I would personally prefer to keep it that way. Adding another language interpreter to this environment would create a lot of additional attack surface I would prefer not to have.

More specifically: I would like to use the successful verification of a reproducible build to authorize the usage of a particular cryptographic signing key for signing a release containing the reproduced binaries.

dongcarl commented 5 years ago

@tarcieri Perhaps this is something that GUIX challenge can help with?

snf commented 5 years ago

A new tool would also open the door to other platforms, whereas reprotest is Linux-oriented.

Even if we contribute support for other platforms to it, non-*nix platforms like Windows (Rust Tier 1) will involve a non-trivial amount of work, which might be better spent on cargo repro.

kpcyrd commented 5 years ago

> More specifically: I would like to use the successful verification of a reproducible build to authorize the usage of a particular cryptographic signing key for signing a release containing the reproduced binaries.

Just to clarify, you want to ensure that the build hasn't been tampered with before signing it with a special release key, correct? I would not recommend reprotest to do that.

reprotest is more of a CI/debug tool that you would use as part of your automated tests. It tries to maximize the diversity of the build environments and checks that the build result is still identical, or helps narrow down what causes certain differences. This involves some rather low-level tricks that you probably don't want in your production build, like LD_PRELOAD hooks and custom FUSE filesystems. If your build is reproducible in that kind of setup, though, you can usually be fairly confident that it can be rebuilt successfully.

Instead, if I understand your use case correctly, you could run your release build on, e.g., 3 different systems, each of them signing its build artifact with its own key, and then pull each artifact along with its signature to a special signing system. This system would verify that every artifact is identical and correctly signed, and only then sign with its special master key.
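The gate on the signing system can be sketched as follows. This is a minimal illustration under stated assumptions: the Submission type and digest_to_sign function are hypothetical, artifacts are represented by a digest string, and signature verification is assumed to have been done elsewhere and is modelled as a boolean.

```rust
use std::collections::HashSet;

/// One builder's result as seen by the signing system (hypothetical type).
struct Submission {
    builder: String,
    artifact_digest: String,
    /// Assumption: the signature was already checked against this
    /// builder's own key by some other component.
    signature_valid: bool,
}

/// Return the digest the master key may sign, or None if the release must
/// not be signed. Requires at least `min_builders` distinct builders, all
/// with valid signatures and all reporting the same digest.
fn digest_to_sign(subs: &[Submission], min_builders: usize) -> Option<&str> {
    let distinct: HashSet<&str> = subs.iter().map(|s| s.builder.as_str()).collect();
    if distinct.len() < min_builders {
        return None;
    }
    let first = subs.first()?;
    let all_agree = subs
        .iter()
        .all(|s| s.signature_valid && s.artifact_digest == first.artifact_digest);
    if all_agree {
        Some(first.artifact_digest.as_str())
    } else {
        None
    }
}
```

Counting distinct builder names keeps a single builder from satisfying the threshold by submitting the same artifact several times.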

The setup that distros like Debian are currently planning doesn't involve reprotest either. A Debian developer/maintainer would upload a binary package along with a buildinfo file that describes the build environment, and rebuilders would take both and try to verify the package in a build environment as close to that system as reasonably possible. If the artifact is identical, the rebuilder would publish a new buildinfo file with its own signature.

A Debian user would consume these by requiring that each package has confirmations from at least X rebuilders they have configured as trusted.
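The "at least X trusted rebuilders" rule amounts to a small threshold check on the user's side. The sketch below is not Debian's actual implementation or data format: the Attestation type and has_enough_confirmations function are hypothetical, and each rebuilder's signed buildinfo is reduced to a (rebuilder, digest) pair.

```rust
use std::collections::HashSet;

/// A rebuilder's claim that it reproduced an artifact with this digest
/// (hypothetical type standing in for a signed buildinfo file).
struct Attestation<'a> {
    rebuilder: &'a str,
    digest: &'a str,
}

/// True if at least `threshold` distinct trusted rebuilders confirmed the
/// digest of the package the user is about to install.
fn has_enough_confirmations(
    package_digest: &str,
    attestations: &[Attestation],
    trusted: &HashSet<&str>,
    threshold: usize,
) -> bool {
    let confirming: HashSet<&str> = attestations
        .iter()
        .filter(|a| a.digest == package_digest && trusted.contains(a.rebuilder))
        .map(|a| a.rebuilder)
        .collect();
    confirming.len() >= threshold
}
```

Attestations from rebuilders outside the trusted set, or for a different digest, are simply ignored rather than treated as errors.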

You may also want to look into https://in-toto.github.io/ for setups like this.

tarcieri commented 5 years ago

I know some people working on in-toto. Will chat with them about this.

infinity0 commented 5 years ago

If you just run cargo build twice naively, or write a wrapper tool to do that, then some parts of the build environment stay the same, for example the build path, local hostname, year, username, etc.

By contrast, reprotest tries very hard to alter the build environment as much as possible, and also calls diffoscope (if it's installed) to display any diffs in the binaries in detail.
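To make the difference concrete, here is a minimal sketch of a wrapper that at least varies a few environment variables between two runs. It covers only a small fraction of what reprotest does: build path, hostname, and clock are not varied here, and HOME is left alone on the assumption that changing it would hide the cargo and rustup installation. The names env_variations and run_varied, and the specific values, are hypothetical.

```rust
use std::io;
use std::process::{Command, Output};

/// Two environments that differ in every variable listed (illustrative
/// values; real tooling would vary far more than this).
fn env_variations() -> [Vec<(&'static str, &'static str)>; 2] {
    [
        vec![("TZ", "UTC"), ("LANG", "C"), ("LC_ALL", "C"), ("USER", "builder1")],
        vec![
            ("TZ", "Pacific/Kiritimati"),
            ("LANG", "fr_FR.UTF-8"),
            ("LC_ALL", "fr_FR.UTF-8"),
            ("USER", "builder2"),
        ],
    ]
}

/// Run a build command with one set of variations layered on top of the
/// inherited environment and capture its output.
fn run_varied(program: &str, args: &[&str], vars: &[(&str, &str)]) -> io::Result<Output> {
    Command::new(program)
        .args(args)
        .envs(vars.iter().copied())
        .output()
}
```

A caller would invoke run_varied("cargo", &["build", "--release"], ...) once per variation, each time in a fresh copy of the source tree, and then compare the resulting artifacts.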

infinity0 commented 5 years ago

BTW I was the main developer on reprotest so ask if you have any questions.

tarcieri commented 5 years ago

In an attempt to move this forward, I've created a cargo-repro GitHub project and associated crate with some initial boilerplate:

I've also opened an initial issue to discuss the project's goals and initial design:

https://github.com/rust-secure-code/cargo-repro/issues/3

I am going to go ahead and close this issue and would suggest that anyone interested in this particular topic head over to that GH issue / repo.

Shnatsel commented 4 years ago

I'm reopening the issue to indicate that this is something we're interested in, and also to better track it. For finer-grained updates see https://github.com/rust-secure-code/cargo-repro/

tarcieri commented 4 years ago

rebuilderd is another interesting tool in this space:

https://github.com/kpcyrd/rebuilderd

kpcyrd commented 4 years ago

We found a few Rust packages in Arch Linux that are not reproducible for one reason or another (including cargo-audit and rebuilderd itself). Do you have any thoughts on collecting these issues somewhere in the org, to allow collaboration on tracking them down across the Rust ecosystem (rustc and crates)?

rust-secure-code / wg

Reproducible build tooling #28