Consider wording and level for Reproducibility requirement

Some ecosystems struggle to produce bitwise-identical artifacts across multiple runs of the exact same toolchain (discussion). We should consider the wording of the reproducibility requirement and the SLSA level at which it sits for SLSA v1.0. Specifically, we should consider whether there are alternatives to reproducibility that achieve the same security goals for SLSA, and whether there are measures of reproducibility other than bitwise-identical output artifacts that could satisfy those goals. Beyond codifying the specification, we should also consider ways to incentivize maintainers of community build toolchains to achieve bitwise-identical output, for which there is currently no other apparent incentive.

Thanks for raising this, I certainly think it's worth some thought – achieving reproducible builds for large pipelines of diverse inputs is non-trivial. The ActiveState case must be close to the extreme example :-)

I don't have anything concrete to suggest yet, but in the interest of stimulating the discussion I believe the first question we should answer is: what security properties are reproducible builds providing?

The key security property that reproducible builds provide is that they remove the need to trust the build system, because we can compare the outputs from multiple independent builders and verify whether they match.
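To make that comparison concrete, here is a minimal sketch in Python. The builder names and the `builds_agree` helper are hypothetical and not part of SLSA or any existing tool; the only assumption is that each independent builder hands back the raw artifact bytes.

```python
import hashlib
from typing import Dict


def digest(artifact: bytes) -> str:
    """SHA-256 digest of an artifact, as a hex string."""
    return hashlib.sha256(artifact).hexdigest()


def builds_agree(artifacts_by_builder: Dict[str, bytes]) -> bool:
    """True only if at least two builders reported and every digest matches.

    A single builder proves nothing here: there must be something
    independent to compare against.
    """
    digests = {digest(a) for a in artifacts_by_builder.values()}
    return len(artifacts_by_builder) >= 2 and len(digests) == 1


# Hypothetical builders; in practice these would be independent services.
print(builds_agree({"builder-a": b"\x7fELF...", "builder-b": b"\x7fELF..."}))
print(builds_agree({"builder-a": b"\x7fELF...", "builder-b": b"\x7fELF.!."}))
```

A mismatch does not say which builder is wrong, only that at least one of them cannot be trusted for this artifact.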

Is this a reasonable requirement at SLSA Level 4, when so much of SLSA is rooted in trusting the build service? Would it make more sense at a higher level? To me, verifiably reproducible builds make sense at a level above L4. So in terms of the SLSA ladder, perhaps reproducible builds could be optional at some level (currently L4), with the requirement becoming non-optional at the level that requires verifiably reproducible builds?

(By the way, on the topic of incentivising build toolchain maintainers: the reproducible-builds project has been creating standards, tools, and toolchain patches to support bit-for-bit reproducible builds for several years: https://reproducible-builds.org/)

Fundamentally, verified reproducible builds eliminate the need to have total trust in any one build system. With verified reproducible builds, an attacker has to subvert all the verifiers for a tampered build to go undetected, and users with the source code can even re-perform the verification themselves. Attacks on build systems are not an idle concern; see SolarWinds' Orion.

@joshuagl - it's not insane, it's even been briefly discussed. One "easy" solution would be to create a "SLSA 5" with reproducible builds. Of course, once you have verified reproducible builds, some of the stronger build requirements aren't nearly as important, because you aren't putting all your eggs in one "impregnable build system" basket.

There's a broader discussion about trusting the build system so much. Google spends an enormous amount of money to create & control their build systems, and it is their own systems, so it makes sense that they're willing to trust their build system so much. But not everyone is in the same boat. Many other organizations simply don't have the enormous amount of money to rebuild their own CPU boards & create their own data centers, and they may be concerned about trusting someone else's system that much. Even end-users may be suspicious of a build; if the build isn't reproducible there isn't really a practical alternative for verification of the build process. For them, reproducible builds are a sensible approach. So you're seeing some tension between smart people from different organizations who have different circumstances.

Historically, reproducibility hasn't been something people worked toward, so achieving it typically requires work. It certainly shouldn't be required at low SLSA levels. There has been progress, though. Here is some data from reproducing Java's Maven Central: https://github.com/jvm-repo-rebuild/reproducible-central

I agree that this needs more clarification. Here's a dump of thoughts.

For the question of whether reproducible builds are possible in various scenarios, my suggestion is to handle the non-deterministic pieces as yet another input to the build, which we have to trust. So:

Attached signing (e.g. Authenticode)
- Model as two separate builds.
- First build is the unsigned binary, which is fully reproducible.
- Second build is the signed binary. The original builder signs and then pulls out sufficient information for others to reproduce the process, given two inputs: the original unsigned binary and the signature itself. In this way, rebuilders can verify that the only change was to attach the signature, not to modify any other bits of the binary.
- (Multiple signatures can be supported as a series of such signing operations.)
Timestamps, machine state, non-determinism, etc.
- Ideally remove all these things, as per https://reproducible-builds.org
- An acceptable compromise would be to pull sufficient information from the original build to allow others to reproduce, given these non-deterministic inputs. We'd still have to blindly trust these inputs, and it's a larger attack surface, but my feeling is that this is still a significant improvement from the status quo.
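
As a rough illustration of the two cases above, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the "build" just packs files into a tar archive, the timestamp stands in for any recorded non-deterministic input, and the "signature" is simply appended to the artifact. Real attached-signature formats such as Authenticode embed the signature inside the file, so the second step would be more involved in practice.

```python
import io
import tarfile
from typing import Dict


def build(source_files: Dict[str, bytes], recorded_mtime: int) -> bytes:
    """Toy first build: pack sources into a tar with deterministic metadata.

    `recorded_mtime` is the non-deterministic input pulled from the
    original build; rebuilders take it as a (trusted) input.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(source_files):  # fixed ordering
            data = source_files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = recorded_mtime  # fixed timestamp
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def attach_signature(unsigned: bytes, signature: bytes) -> bytes:
    """Toy second build: the only change is appending the signature."""
    return unsigned + signature


def verify_rebuild(published: bytes, source_files: Dict[str, bytes],
                   recorded_mtime: int, signature: bytes) -> bool:
    """Rebuild both steps and check the result is byte-for-byte identical."""
    unsigned = build(source_files, recorded_mtime)
    return attach_signature(unsigned, signature) == published


sources = {"main.c": b"int main(void) { return 0; }\n"}
published = attach_signature(build(sources, 1_600_000_000), b"FAKE-SIGNATURE")
print(verify_rebuild(published, sources, 1_600_000_000, b"FAKE-SIGNATURE"))
```

Note that the rebuilder never needs the signing key: it only needs the signature bytes and the recorded inputs, and it confirms that nothing else in the artifact changed.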

For the question of what value reproducible builds buy you, I see several benefits:

It is a means by which we can construct a SLSA 4 build system that everyone agrees on, at least for open-source. It seems unlikely that every organization will agree on a single set of "trusted" builders. One organization might trust Foo but not Bar, while another might trust the opposite. By having a system of independent rebuilders, the consumer can effectively choose whom to trust. Without reproducible builds, the consumer's only option is to trust whatever service the producer chose.
Furthermore, this allows us to build a service that is more trustworthy than any individual service. An individual service is a very complex system with a large attack surface, and insiders often have unilateral access to tamper with the build system (even if unintended). By requiring builds from at least two independent systems, we effectively eliminate all insiders who have unilateral access — an attacker would have to compromise at least two different systems. That said, requiring multiple builders adds more complexity and cost, so I'm hesitant to require it outside of open-source. (I can see an argument for a future SLSA 5 for this benefit.)
Reproducible builds have reliability benefits outside of security, as noted in the FAQ.
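
The "consumer chooses whom to trust" idea from the first two points could look something like the sketch below. This is only an illustration: the rebuilder names (Foo, Bar, Baz), the `accepted_digest` helper, and the default threshold of two are all hypothetical, and nothing here is defined by SLSA.

```python
from typing import Dict, Optional, Set


def accepted_digest(reports: Dict[str, str], trusted: Set[str],
                    threshold: int = 2) -> Optional[str]:
    """Return the artifact digest this consumer accepts, or None.

    `reports` maps rebuilder name -> digest it produced.
    Only rebuilders in the consumer's own `trusted` set count, at least
    `threshold` of them must have reported, and they must all agree.
    """
    digests = [d for builder, d in reports.items() if builder in trusted]
    if len(digests) >= threshold and len(set(digests)) == 1:
        return digests[0]
    return None


reports = {"Foo": "sha256:aaaa", "Bar": "sha256:aaaa", "Baz": "sha256:bbbb"}

# One organization trusts Foo and Bar; another trusts Foo and Baz.
print(accepted_digest(reports, trusted={"Foo", "Bar"}))
print(accepted_digest(reports, trusted={"Foo", "Baz"}))
```

With a threshold of two, no single rebuilder (or insider at one rebuilder) can get an artifact accepted on its own, which is the property described above.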

slsa-framework / slsa

Consider wording and level for Reproducibility requirement #382