Comprehensively describe how verification works

arewm commented 11 months ago

In the gDoc for independently verified reproducible build requirements, @MarkLodato and I started discussing how builds are verified. The comment thread is too hard to follow, so I am creating this issue to track the discussion further.

Since the context thread is long, I'll start by hiding it all in a details block. If certain parts become more relevant, we can unhide all or part of the block.

Context:

> The proposal is to add:
>
> * L4: Add (at least) independently-verified semantically equivalent build
> * L5: Add (at least) independently-verified reproducible build

@MarkLodato started a comment thread (the rest of the quotes are responses in the thread):

> My preference would be to make this part of some recommendations or clarifications on acceptable implementations of SLSA, rather than a level. This doesn't seem like a property of the build itself, but rather of how you verify. Checking for bit-for-bit identity is always OK; allowing "semantically equivalent" depends on context and is usually but not always OK. For example, ignoring a timestamp in an archive is OK if and only if the timestamp has no bearing on how the artifact is used.

@arewm:

> The desire to make this part of a recommendation leads me down the same path: this doesn't fit within the specification of the build track itself. I don't personally see how we can map reproducibility goals to SLSA implementation targets, since you can easily have reproducibility without meeting any of the SLSA build levels.
>
> What is the benefit of making this a notion of acceptable implementations instead of specific levels that can be "achieved" in its own track?
>
> @kpk47, FYI

@MarkLodato:

> Andrew, my thinking is that this is purely a detail of how you verify. You have an artifact A, some metadata (attestation) M, and a policy P. You want to know if A complies with P, using M as evidence. There are three steps:
>
> 1. Authenticate metadata M (e.g. by checking a signature)
> 2. Verify that metadata M applies to artifact A (e.g. by checking the hash)
> 3. Verify that metadata M satisfies the policy P
>
> It is step 2 that we're talking about here. Normally you just check that M contains the hash of A. But what is desired is the case when M contains the hash of A', and you want to say that A and A' are "close enough".

@arewm:

> Thanks for that. I feel like this type of verification works against the first guiding principle: "Trust platforms, verify artifacts."
>
> In order to improve the efficiency of artifact verification, we should be able to trust platforms and trust that those platforms have correctly associated metadata M with an artifact A.
>
> In my opinion, the value of reproducible builds comes into play when a consuming entity does not trust a platform (either by circumstance or choice). If the platform is not or cannot be trusted (i.e. it hasn't been hardened), then having a means to verify the artifact will be highly valuable.

@MarkLodato:

> Hmm, IMO this is yet more evidence that we need to more comprehensively describe how verification works. That seems to be the real source of disconnect. I tried to write up a response here, but it's too complex a topic. I'll work with Kris to put something more comprehensive together.
>
> In the meantime, I don't think there's a conflict in what I described. Let's take a very simple example that has nothing to do with reproducible builds.
>
> Step 1. You build a Chrome extension, which is really just a zip file. This generates provenance containing the hash of the zip file (A).
>
> Step 2. You sign and upload the Chrome extension to the web store, which modifies the zip file and changes the hash (B). The files inside the zip are unchanged.
>
> Step 3. You download the extension from the web store (with hash B) and want to verify that it matches your original provenance (with hash A), ignoring any benign changes due to the signing and uploading process.
>
> This is exactly the "semantically equivalent" case. We're saying that A and B are "semantically equivalent" for some definition. In this case, you have a few options, such as:
>
> 1. In Step 3, deterministically "undo" the transformation from Step 2, so that you can get from B back to A.
> 2. In Step 2, record the changes that were made and propagate this list of changes alongside the artifact. Then in Step 3, first verify that all of these changes are "benign" and then proceed as above, undoing the changes and verifying that you got A.
> 3. In Step 1 and Step 3, use a hash of a normalized version of the artifact that is agnostic to the zip encoding, such as only recording the file contents but not the zip structure itself. This is what Go does, for example.

@david-a-wheeler, FYI since you were the author of the document; @kpk47, FYI since you were pulled into the thread.
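To make the three verification steps and option 3 concrete, here is a minimal Python sketch. It rests on several assumptions that are not part of SLSA: the metadata M is a flat JSON object with hypothetical `subject_sha256`, `normalized_sha256`, and `builder_id` fields (not the real provenance format), an HMAC with a shared key stands in for signature verification in step 1, and the policy P is simply an expected builder ID. The normalized digest hashes only entry names and file contents, loosely modeled on Go's `h1:` module hash but not byte-compatible with it.

```python
import hashlib
import hmac
import io
import json
import zipfile


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def normalized_zip_digest(data: bytes) -> str:
    """Hash the files inside a zip while ignoring the zip encoding itself.

    Only entry names and contents feed the digest, so timestamps, entry
    order, and compression settings do not change the result.
    """
    h = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # directory entries carry no file content
            h.update(f"{sha256_hex(zf.read(name))}  {name}\n".encode())
    return h.hexdigest()


def verify(artifact: bytes, metadata: bytes, tag: bytes, key: bytes,
           expected_builder: str) -> bool:
    # Step 1: authenticate M. An HMAC stands in for a real signature check.
    expected_tag = hmac.new(key, metadata, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, tag):
        return False
    m = json.loads(metadata)

    # Step 2: check that M applies to A. An exact digest match is always OK;
    # otherwise fall back to the normalized digest if M records one.
    applies = m["subject_sha256"] == sha256_hex(artifact)
    if not applies and "normalized_sha256" in m:
        try:
            applies = m["normalized_sha256"] == normalized_zip_digest(artifact)
        except zipfile.BadZipFile:
            applies = False
    if not applies:
        return False

    # Step 3: check that M satisfies policy P (here, just the expected builder).
    return m["builder_id"] == expected_builder


def make_zip(files: dict, year: int) -> bytes:
    """Build a zip whose entries all carry the given year as their timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(zipfile.ZipInfo(name, date_time=(year, 1, 1, 0, 0, 0)), content)
    return buf.getvalue()


# Usage: A is the zip as built; B has the same files but different timestamps
# and entry order, as if an upload step had re-encoded it.
KEY = b"demo-key"
BUILDER = "https://builder.example/hypothetical"
files = {"manifest.json": b"{}", "background.js": b"console.log('hi');"}
artifact_a = make_zip(files, 2020)
artifact_b = make_zip(dict(reversed(list(files.items()))), 2021)
metadata = json.dumps({
    "subject_sha256": sha256_hex(artifact_a),
    "normalized_sha256": normalized_zip_digest(artifact_a),
    "builder_id": BUILDER,
}).encode()
tag = hmac.new(KEY, metadata, hashlib.sha256).digest()

print(verify(artifact_b, metadata, tag, KEY, BUILDER))  # prints True
```

In this sketch an exact digest match is tried first because it is always acceptable; the normalized comparison is only a fallback when the metadata opts into it, which keeps the "semantically equivalent" path explicit rather than the default.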

arewm commented 11 months ago

The situation with the Chrome extension, where the hash changes after uploading, falls into an area that currently has no applicable build track requirements: the package ecosystem. Verification based on semantic equivalence might very well be useful for package ecosystems that are not compatible with these types of attestations (i.e., attestations tied to the subject's digest).

Should there be some requirements imposed on package ecosystems to make verification easier? For example, they could either leave digests unmodified or provide a VSA (verification summary attestation) that might also summarize how to re-verify the original digest.
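One way such an ecosystem-provided summary could help is sketched below, combining this idea with option 2 from the context above: alongside the artifact, the registry publishes a record that maps the served digest back to the original one and lists what it changed. Everything here is hypothetical, including the `RegistrySummary` type, its fields, and the change names; a real VSA has its own schema. The sketch also assumes the record has already been authenticated, and it only checks that the record describes the downloaded artifact and that every listed change is on the consumer's accepted list.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical policy: transformations the consumer accepts as benign.
BENIGN_CHANGES = frozenset({"add-store-signature", "recompress"})


@dataclass(frozen=True)
class RegistrySummary:
    """Hypothetical record a registry could publish; NOT the SLSA VSA schema."""
    original_sha256: str      # digest the builder's provenance refers to
    published_sha256: str     # digest of the artifact the registry serves
    changes: Tuple[str, ...]  # transformations the registry applied


def original_digest_for(downloaded: bytes, summary: RegistrySummary) -> Optional[str]:
    """Return the digest to look up in the provenance, or None if not accepted.

    Assumes the summary itself has already been authenticated.
    """
    if hashlib.sha256(downloaded).hexdigest() != summary.published_sha256:
        return None  # the summary describes some other artifact
    if not set(summary.changes) <= BENIGN_CHANGES:
        return None  # at least one change falls outside the accepted set
    return summary.original_sha256


built = b"extension as built"
served = b"extension as served by the store"
summary = RegistrySummary(
    original_sha256=hashlib.sha256(built).hexdigest(),
    published_sha256=hashlib.sha256(served).hexdigest(),
    changes=("add-store-signature",),
)
print(original_digest_for(served, summary) == hashlib.sha256(built).hexdigest())  # prints True
```

The returned digest can then be matched against the provenance exactly as in the unmodified case, so the consumer relies on the registry's account of its changes instead of reconstructing A from B.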

Semantically equivalent and reproducible builds are beneficial when some part of the supply chain does not conform to the build track's specification. My comments in a different thread in the gDoc were trying to highlight the benefit of a dependency/reproducibility track when conformance to a build track is not an option. Is there a way to handle situations where only part of the supply chain conforms to some specification? That seems like an anti-pattern to me, which is why I suggested a new track.

For clarity, the original comment was tied only to the first line:

> L4: Add (at least) independently-verified semantically equivalent build

Do you envision the recommendations on verification being associated only with these semantically equivalent builds, or also with reproducible builds?

MarkLodato commented 11 months ago

Could you come up with an example to help the discussion? I'm having a hard time picturing it. If the Chrome extension example works, could you phrase things in terms of that? Or if it's a bad example, could you come up with something else?

In particular, you're talking about "some part" of the supply chain, but I can't envision what you mean. If instead you said, "Suppose PyPI package X was built from dependency Y, and Y [...]", that would help me.

Thanks!

slsa-framework / slsa

Comprehensively describe how verification works #1011