gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

[Question] Reproducible builds #153

Open fnerdman opened 3 years ago

fnerdman commented 3 years ago

One use case for Gramine could be an open source application where - by knowing the MRENCLAVE and having all of the enclave's source and configuration files - a third party would be able to rebuild the enclave and arrive at the same MRENCLAVE. Imagine e.g. a blockchain oracle running as an SGX enclave. Such an application could only be trusted by a third party if the enclave could be built reproducibly.

Releasing prebuilt binaries signed by the project authors is already a good step towards that goal. Still, it would be interesting to know whether there are plans to make building Gramine completely reproducible.

dimakuv commented 3 years ago

@lead4good You are absolutely correct. Ideally, we want a reproducible build of all parts of the SGX enclave, including the Gramine binaries themselves.

@mkow @boryspoplawski @woju have more thoughts on this. Could you remind us what the current state is and whether we plan to have reproducible builds of Gramine soon-ish?

mkow commented 3 years ago

We plan to have reproducible builds, but the priority of this is still uncertain. I'd like to have that for the next release, but we'll see. Also, contributions are welcome, if you have time to help :)

woju commented 3 years ago

What is important in reproducible builds is a definition of the expected build environment (with supported variability) and thorough testing thereof. No one is working on this at the moment.

The best thing you can do at the moment is to use official packages. They won't change a bit.

fnerdman commented 3 years ago

@dimakuv @mkow @woju thanks for your detailed answers. I am 99% sure that the signed Gramine binaries are sufficient for our use case for now. However, this is likely to change in the future, so do you mind keeping this issue open to keep track of reproducible builds for Gramine?

mkow commented 3 years ago

Yup, we'll definitely keep it open as we plan to introduce reproducible builds at some point ;)

fnerdman commented 1 year ago

@dimakuv @mkow @woju As we discussed in the Gramine Contributor Meeting yesterday, reproducible builds of Gramine would be important for our use case.

What further steps would need to be taken to achieve reproducible builds?

mkow commented 1 year ago

First step would be to build Gramine on a few different machines / OSes (but with the same compiler version) and find all moving parts (e.g. timestamps embedded in some binary files). Then we'd need to fix them and also document the exact reproduction steps. We could also use a tool @woju mentioned that works à la CI and can periodically check reproducibility.
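A minimal sketch of the "find all moving parts" step, assuming each machine's build output has been copied into its own directory. The function names `hash_tree` and `moving_parts` are made up for illustration and are not part of Gramine or its tooling. It only reports which files differ; a tool such as diffoscope would still be needed to see why.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each regular file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def moving_parts(build_a, build_b):
    """Return the sorted relative paths that differ or exist in only one tree."""
    a, b = hash_tree(build_a), hash_tree(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `moving_parts("build-machine-1", "build-machine-2")` (hypothetical directory names) returns an empty list when the two trees are bit-for-bit identical.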

fnerdman commented 1 year ago

@woju might be referring to tools listed here: https://reproducible-builds.org/tools/

lonerapier commented 1 year ago

We were able to build Gramine on two machines with different CPU architectures and found no diffs between the builds. The Bash diff script and Dockerfiles can be found here.

As a next step, we are planning to diff our builds against Gramine's officially released binaries. Can I know the details of the machine used to build the official binaries and the process to build them? Are these built using CI scripts or in some other way?

dimakuv commented 1 year ago

@lonerapier Thanks for these interesting experiments!

Can I know the details of the machine used to build the official binaries and the process to build them?

@woju should be able to provide these details.

Are these built using CI scripts or some other ways?

The source code for the builds is found in our Gramine repo:

We run the CI on every commit/PR in Gramine, automatically building these packages (for testing):

woju commented 1 year ago

bullseye base, with bullseye-backports and this intel repo exactly as in CI.

Here's a .buildinfo for you (sorry for .txt suffix, github doesn't accept arbitrary files): gramine_1.4_amd64.buildinfo.txt

Please share yours, I'd be happy to compare :)
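One way to compare two .buildinfo files is to diff their Installed-Build-Depends fields. The sketch below assumes the usual Debian layout of that field, one `package (= version)` entry per continuation line. The helper names `installed_build_depends` and `dependency_diff` are made up for illustration.

```python
import re


def installed_build_depends(buildinfo_text):
    """Parse the Installed-Build-Depends field into {package: version}."""
    packages = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            line = line.split(":", 1)[1]
        elif in_field and not line[:1].isspace():
            break  # the next field (or a blank line) ends the continuation lines
        if in_field:
            match = re.match(r"\s*([^\s(]+)\s*\(=\s*([^)]+)\)", line)
            if match:
                packages[match.group(1)] = match.group(2).strip()
    return packages


def dependency_diff(ours, theirs):
    """Return {package: (our_version, their_version)} for every mismatch."""
    a, b = installed_build_depends(ours), installed_build_depends(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

A package that is missing on one side shows up with `None` in that position, so an empty result means both builds saw the same installed build dependencies.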

lonerapier commented 1 year ago

Thanks for the help.

We built Gramine manually on an amd64 machine with the steps mentioned in the CI build and diffed it using diffoscope against the official Debian package releases. We found differences in the official Gramine LibOS binary, libsysdb.so.

I can't figure out what could have caused the changes in the binary, as all the steps were the same as in the scripts, except for debuild, where I had to use the -uc -us flags because I can't sign the build.

All the diffs along with Dockerfile, buildinfo and other files can be found here.

fnerdman commented 1 year ago

@dimakuv @woju We've been able to reproduce the deb builds except for one single library file, libsysdb.so. We're not sure what is causing the diffs. Any thoughts? You can check out the diff here.

mkow commented 1 year ago

@lead4good: One difference I can see is that one libsysdb.so contains BuildInfo and the other doesn't:

    +  0x00077d80 61656664 35643038 37356334 66646637 aefd5d0875c4fdf7
    +  0x00077d90 36656464 36376635 30343339 35326264 6edd67f5043952bd
    +  0x00077da0 64333262 30663761 00000000 011b033b d32b0f7a.......;

Most of the other diffs may be a result of exactly this difference, because they are (all?) just shifted offsets.

mkow commented 1 year ago

BuildInfo

Or rather NT_GNU_BUILD_ID?
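To check whether a given libsysdb.so actually carries an NT_GNU_BUILD_ID note, the note segments can be read directly. This is a sketch under stated assumptions: the file is a 64-bit little-endian ELF, and notes are padded to 4 bytes (the common layout for build-ID notes). The function name `gnu_build_id` is made up for illustration.

```python
import struct

PT_NOTE = 4          # program header type for note segments
NT_GNU_BUILD_ID = 3  # note type used for the GNU build ID


def gnu_build_id(elf):
    """Return the GNU build ID of a 64-bit little-endian ELF image as hex, or None."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        (p_offset,) = struct.unpack_from("<Q", elf, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", elf, base + 32)
        if p_type != PT_NOTE:
            continue
        pos, end = p_offset, p_offset + p_filesz
        while pos + 12 <= end:
            namesz, descsz, note_type = struct.unpack_from("<III", elf, pos)
            name_start = pos + 12
            desc_start = name_start + (namesz + 3) // 4 * 4  # name is padded to 4 bytes
            name = elf[name_start:name_start + namesz]
            if note_type == NT_GNU_BUILD_ID and name == b"GNU\0":
                return elf[desc_start:desc_start + descsz].hex()
            pos = desc_start + (descsz + 3) // 4 * 4  # descriptor is padded too
    return None
```

Running it on both binaries, e.g. `gnu_build_id(Path("libsysdb.so").read_bytes())` with `Path` from `pathlib`, shows whether one has a build ID and the other returns `None`.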

sbellem commented 1 year ago

Just in case this helps: https://medium.com/nttlabs/bit-for-bit-reproducible-builds-with-dockerfile-7cc2b9faed9f.

My understanding was that bit-for-bit reproducible builds with docker were not even possible. The above link shows that they now are, with docker BuildKit v0.11, but it nevertheless mentions that:

BuildKit v0.11 supports bit-for-bit reproducible image builds, but it still needs very complex Dockerfiles for eliminating non-determinism of the timestamps and the package versions.

BuildKit v0.12 will require less complex Dockerfiles for deterministic timestamps, assuming that https://github.com/moby/buildkit/pull/3560 will be merged in v0.12.

The package versions can be pinned using repro-get: decentralized & reproducible apt/dnf/apk/pacman. It still needs huge improvements though, especially for the user experience of maintaining the hash files.

As an aside, up until now, we have used nix instead of docker for bit-for-bit reproducible builds, e.g.: sgx-sdk

fnerdman commented 1 year ago

Thanks @sbellem for chiming in and providing your insight.

I think it is important to clarify that we're not trying to reproduce the container image itself, but the actual gramine binaries. For this, docker is perfectly fine and we don't need to care about timestamps and other issues that stem from creating reproducible OCI images.

I've thought about how to pin the installed packages to a specific version and I think repro-get is an awesome tool. Thanks for suggesting this.

However, I think it would be a more sensible approach if the Gramine devs could agree to use a nix environment for their binary releases.

sbellem commented 1 year ago

@lead4good Cool, you're welcome!

I think it is important to clarify that we're not trying to reproduce the container image itself, but the actual gramine binaries. For this, docker is perfectly fine and we don't need to care about timestamps and other issues that stem from creating reproducible OCI images.

Ah yes, that was also my assumption. My understanding was that with docker, the management of dependencies was tricky, such that pinning the entire dependency graph was cumbersome if at all possible. But when I was looking into this more closely I concluded that nix was better suited.

If I find the time I will try building gramine with nix.

woju commented 1 year ago

However, I think it would be a more sensible approach if the Gramine devs could agree to use a nix environment for their binary releases.

It might be easier, but it's not sensible and we won't agree. Nix (the pkg manager) is a half-solution to reproducibility, like Docker. That is, we don't want to rely on special environments to achieve some form of reproducibility, because what about people who don't want to or can't use those special environments (for example because they're packaging for another distro)? The problems with reproducibility should be fixed where they are, i.e. in the source and/or in the buildsystem, for the benefit of all people who are building Gramine. IOW it should be possible to use just our buildsystem to achieve reproducibility in any reasonable environment where you can pull specific versions of dependencies.

This is also an issue of trust: you'll need to trust a set of nix packages on top of the packages provided by the distribution (assuming you're not using NixOS, where they're the same). It should be possible to use packages only from a (supported) distro, because we strive to minimise TCB.

That said, nothing prevents you from doing reproducible builds of Gramine in nix. But that won't be the project's strategy.

lonerapier commented 1 year ago

Thanks @mkow, for looking into the diffs. My guess is this is mostly due to dependency version changes.

Here's a .buildinfo for you (sorry for .txt suffix, github doesn't accept arbitrary files): gramine_1.4_amd64.buildinfo.txt

It would be best to pin the dependencies to the older versions listed in the buildinfo that was shared by @woju earlier. Can you help me identify which dependencies from the buildinfo affect the libsysdb.so binary?

I also wanted to compare with a more recent package, can you point me to one?

sbellem commented 1 year ago

IOW it should be possible to use just our buildsystem to achieve reproducibility in any reasonable environment where you can pull specific versions of dependencies.

You mean meson? From https://mesonbuild.com/Reproducible-builds.html:

Meson aims to support reproducible builds out of the box with zero additional work (assuming the rest of the build environment is set up for reproducibility). If you ever find a case where this is not happening, it is a bug. Please file an issue with as much information as possible and we'll get it fixed.

I did not know that reproducible builds could be achieved with meson. If so, why is this a discussion then? Just trying to better understand the situation here.

In any case, I did start working on building gramine with nix, just for the fun of it. If anyone is interested feel free to reach out.

woju commented 1 year ago

I mean meson and everything that we put in meson.build files. It's one of the ingredients that matters, but there are others.

"Reproducible builds" is not something that you can "achieve" by just applying a tool. It's a state of the project that needs to be kept. It's something similar to the statement that "the project can always be compiled" (== whichever commit from master branch you pick, it's buildable).

For example, consider the __FILE__ macro: if you're using it, then it might happen that the full path to the sources gets embedded in the resulting binary. It then follows that such a file can't be built reproducibly if you change the path to the repository checkout. No amount of meson will fix that. Container-oriented solutions like docker or nix just ensure that the build paths are always the same. The fix lies in the project: use __FILE_NAME__ on newer compilers, or there's also a gcc switch that strips some prefix from __FILE__, I think, which you need to put somewhere into the meson.build files. (Or maybe that switch was for debug info, I don't remember exactly.)

Depending on the specific reproducibility subproblem, there can be a simple tool to fix it (like dh_strip_nondeterminism), it might be enough to flip some switch, or you might need to rethink your whole approach. It varies.

lonerapier commented 1 year ago

@mkow @woju, can you help me get a more recent package from CI or your machine, so that I can compare against a build with all the recent dependencies?

Also, how should I identify specific dependencies that are required for libsysdb.so binary? As per my understanding, we need to pin dependencies to the exact version as used in previous build to keep the build reproducible.

Please correct me if I am wrong somewhere.

mkow commented 1 year ago

I'll leave answering this to @woju ;)

woju commented 1 year ago

Also, how should I identify specific dependencies that are required for libsysdb.so binary?

They're listed in meson.build files, specifically here's the list of all files and dependencies: https://github.com/gramineproject/gramine/blob/3c5272f49cc4476c9847616eab38381095f30e04/libos/src/meson.build#L133 You also need to consider the toolchain, which of course also affects the binary result, but it's not explicitly listed there.

As per my understanding, we need to pin dependencies to the exact version as used in previous build to keep the build reproducible.

You generally do not. Compiler and linker probably need an exact version, but for usual dependencies there can be some range of versions that would result in identical ELFs.

WRT recent debs, we can think of something.

sbellem commented 3 months ago

Hi, wondering what the status is of providing reproducible builds for Gramine. Thanks!

dimakuv commented 3 months ago

Hi, wondering what the status is of providing reproducible builds for Gramine. Thanks!

I believe no specific actions have been taken towards verifying that Gramine builds reproducibly. Gramine should build reproducibly, but we don't have e.g. a CI pipeline that would verify this property for every commit of Gramine (at least not yet).