sigstore / rekor

Software Supply Chain Transparency Log
https://sigstore.dev
Apache License 2.0
858 stars 162 forks source link

Question: should Rekor canonicalize the Rekor entries it takes in? #1162

Open znewman01 opened 1 year ago

znewman01 commented 1 year ago

Right now, Rekor signs the representation of the Rekor entry as-provided. Most Rekor entries are JSON, so there's no canonical encoding.

The current implementation of the Bundle format relies on reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp. However, this is a huge pain—one client might produce different JSON from another, which means that verifying is a pain.

Questions:

CC @codysoyland @bobcallaway

bobcallaway commented 1 year ago

Right now, Rekor signs the representation of the Rekor entry as-provided. Most Rekor entries are JSON, so there's no canonical encoding.

That is incorrect, rekor canonicalizes the entry before writing it to the log and after verifying the validity of the signature.

https://github.com/sigstore/rekor/blob/05d92d3c8e306a6032a56742a42127a82a1d0773/pkg/types/entries.go#L146

The current implementation of the Bundle format relies on reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp. However, this is a huge pain—one client might produce different JSON from another, which means that verifying is a pain.

Questions:

  • Should we be relying on the bytes of the Rekor entry getting passed in for verification? (that is, it would be in the Sigstore bundle; stay tuned for a link to a protobuf-specs issue about this)

    • This would solve the immediate problem but still feels like a hack.
  • Should we consider having Rekor canonicalize the JSON entry types? This is a can of worms because JSON canonicalization is a nightmare—there are many competing JSON canonicalization formats and some aren't actually JSON.

    • Would this be a breaking change? It's very possible that a client would assume that Rekor signs the bytes it's given.
  • Should we try to move later iterations of Rekor entry types to a different format that's more canonicalization-friendly (e.g. COSE)?

CC @codysoyland @bobcallaway

znewman01 commented 1 year ago

Oh huh, I didn't realize that.

It seems like there is a canonicalization step, but it often involves using Go's .json.marshall. which isn't actually canonicalization. So I think my overall point stands—should we be canonicalizing the JSON as part of the Canonicalize step?

bobcallaway commented 1 year ago

The type specific code does call json.Marshall, but is wrapped by this call:

https://github.com/sigstore/rekor/blob/05d92d3c8e306a6032a56742a42127a82a1d0773/pkg/types/entries.go#L151

which uses github.com/cyberphone/json-canonicalization/go/src/webpki.org/jsoncanonicalizer

bobcallaway commented 1 year ago

There are many other canonicalization topics to consider, including PKI artifacts (DER vs PEM encoding), PGP binary VS armored keys, etc.

bobcallaway commented 1 year ago

Simplifying the input and output formats has been on my mind (triggered by much of the bundle dialog), so would love to collaborate on a doc / proto on this to help simplify (as well as to document this)

asraa commented 1 year ago

(that is, it would be in the Sigstore bundle; stay tuned for a link to a protobuf-specs issue about this)

Does this mean that Sigstore bundle would contain the canonicalized Rekor entry (and that way it can compute the correct entry body when validating the SET)? Honestly, I think that's the only way. Otherwise clients would need to recreate the canonicalization logic individually. Is that a really big issue? If you tamper with the bundle, expect it to be invalid.

reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp

Why not just store the Rekor entry type as is?

In https://github.com/kommendorkapten/cosign/blob/bundle_verification/specs/dotsigstore_bundle/verify.md#input, I do see:

Recreate the Rekor entry from the bundle and blob (if provided).

But I'd expect a sub-field of the bundle to be the Rekor entry payload itself. Honestly, that's a good reason for Rekor to directly serve you the bundle-stuff.

EDIT: Note that there are some signature-validation properties required. Validate the SET of the rekor entry in the bundle AND also validate that the signature matches the one in the bundle. The signature is what we care about for timestamp validation.

codysoyland commented 1 year ago

Does this mean that Sigstore bundle would contain the canonicalized Rekor entry (and that way it can compute the correct entry body when validating the SET)? Honestly, I think that's the only way. Otherwise clients would need to recreate the canonicalization logic individually. Is that a really big issue? If you tamper with the bundle, expect it to be invalid.

I agree 100%. This week, I have been taking a stab at writing a bundle verifier following the spec in https://github.com/sigstore/protobuf-specs/ and I got SET verification working for a subset of my test bundles, but it's very error-prone. For example, the intoto v0.0.2 entry expects a user-provided hash over the DSSE (the hash field). This hash is not provided in the bundle so must be computed on the DSSE. This means that the Rekor entry must be created with a hash over a canonicalized DSSE, which is not enforced, so clients must agree to always hash the canonicalized DSSE. I will open an issue related to this in protobuf-specs.

bobcallaway commented 1 year ago

I realized that the requirement for the user to compute a hash was present in intoto:0.0.2 when researching #1164 and this is not desirable behavior, as the server should be computing this.

bobcallaway commented 1 year ago

Also, would appreciate @codysoyland @kommendorkapten @asraa @bdehamer feedback on this (as an example for a broader doc):

In trying to resolve #1164 and #1139, we'll need to add another version of the intoto type. I've started work on this based on the following draft schema, which I think is an improvement on both v001 and v002 in that it tries to more clearly segment what should be sent to Rekor VS what is written to the log and returned to users when fetching the entry:

bdehamer commented 1 year ago

envelope is specified as a string (to contain JSON) since depending on the server to marshal it in the exact same order isn't safe, and this allows Rekor (server-side) to compute a SHA256 digest over the string

This would need to be done in conjunction w/ some sort of change in the bundle format too, right? At the moment, the envelope is not stored as a string in the bundle so it would be difficult to match an offline-computed hash to the server-computed one.

znewman01 commented 1 year ago

This would need to be done in conjunction w/ some sort of change in the bundle format too, right? At the moment, the envelope is not stored as a string in the bundle so it would be difficult to match an offline-computed hash to the server-computed one.

Yup, see https://github.com/sigstore/protobuf-specs/issues/9

kommendorkapten commented 1 year ago

To answer the question if Rekor should canonicalize the entries first. I think yes, but in a much simpler and robust form than what it does today, as this has been proven hard to verify unless the exact Rekor bundle is kept, which I think is not the desired end-state.

I would suggest something like this: The signature (SET) is performed over the PASETO PAE (Pre Auth Encoding) over the following fields:

As PASETO PAE only accepts string the signature has to be encoded. Just for reference, DSSE originally relied on PASETO PAE, but as there was always only two components (payload type and the payload) they rolled to their own, simpler PAE which is pure ASCII, but pretty much otherwise similar to PASETO. For a bit of analysis of that we can read this good blog post by Soatok.

When dealing with PAE like this, the order is significant, but I don't see this as an issue. This can be documented. The output of PAE is a byte stream. As the PAE encoding is very easy to reconstruct, I don't see a reason that we need to either return it or store it (we of course need to store and return the calculated signature (SET)).

I don't think we need to include the digest of the real payload, nor the key or certificate during PAE construction. I would argue that they are implicit via the signature over the artifact.

edit: clarified that we need to store and return SET. edit2: clarified more.

kommendorkapten commented 1 year ago

In trying to resolve https://github.com/sigstore/rekor/issues/1164 and https://github.com/sigstore/rekor/issues/1139, we'll need to add another version of the intoto type.

I think the proposed new schema makes sense, transfering the envelope as a string should simplify things.

This mapping between the signature within the DSSE envelope and the public key provided by the client might be awkward for a client to manually recreate when trying to verify the bundle

Maybe, and consider a scenario where the envelope has multiple signatures, but the client only have access to one key. In this scenario, the client would not be able to recreate the Rekor entry. Unless the keys are part of the response from Rekor, then they could be added to the sigstore bundle, similar to how the certificate chain are stored. If this is done, we must be very careful to *not" use this key material during verification of the artifact's signature, only for reconstructing the Rekor entry.

Until we have a better idea on how to improve the general canonicalization model for Rekor, I'm inclined to agree with @znewman01 and @codysoyland on including the unmodified Rekor entry into the bundle. I also believe this is a hack though :) This would require a robust validation test, as I believe there was a bug found by @codysoyland where the Rekor entry was verified but it was not verified that the entry matched the provided artifact. Requiring the client to reconstruct the entry would close this class of bugs.

bobcallaway commented 1 year ago

Maybe, and consider a scenario where the envelope has multiple signatures, but the client only have access to one key. In this scenario, the client would not be able to recreate the Rekor entry. Unless the keys are part of the response from Rekor, then they could be added to the sigstore bundle, similar to how the certificate chain are stored. If this is done, we must be very careful to *not" use this key material during verification of the artifact's signature, only for reconstructing the Rekor entry.

The keys are part of the canonicalized entry (see https://github.com/bobcallaway/rekor/blob/10d63e565bf035791d6794ef8ba6a41ea013bff9/pkg/types/intoto/v0.0.3/intoto_v0_0_3_schema.json#L43)

kommendorkapten commented 1 year ago

The keys are part of the canonicalized entry

Yes, but they need to be delivered to the client too, right? So the client can reconstruct the Rekor entry (I'm assuming the client does not have access to the Rekor entry). The sigstore bundle can not carry this information as it's written today. Or am I mixing things up here?

asraa commented 1 year ago

). The sigstore bundle can not carry this information as it's written today. Or am I mixing things up here?

Right: I think so. I think it seems like:

  1. The Sigstore bundle right now does NOT contain the rekor entry payload. So clients may not be able to reconstruct the total entry if multiple sigs because the rekor entry contains all of them and maybe they only have one that's relevant. They also have difficulty canonicalizing.
  2. The Sigstore bundle COULD contain the whole Rekor entry canonicalized. Then clients will need to do the additional check to make sure the Rekor entry applies to the artifact -> this could be hard.
  3. The Sigstore bundle COULD be simplified to only have an easy to form "signature" payload like mentioned above. We still need some logic around what counts as a "signature". Just the signature payload inside a DSSE envelope signatures[x].sig? For a JAR signed object, the whole signature file? I think we could do this. It's still complicated, but less than (1) EDIT: AND if this happens in conjunction with ripping out types or simplifying the number of types allowed, then this would be easier.