sigstore / sigstore-python

A Sigstore client written in Python
https://pypi.org/p/sigstore
Other
222 stars 47 forks source link

Support for DSSE signatures over binary payloads #982

Open laurentsimon opened 5 months ago

laurentsimon commented 5 months ago

The current API supports only intoto statement, and we'd like support for arbitrary DSSE blobs.

We discussed this in the past but the original DSSE issue has been closed, so creating this issue for tracking.

woodruffw commented 5 months ago

Thanks for filing @laurentsimon! I tweaked the issue title slightly so that I don't forget that this is about arbitrary blobs wrapped in a DSSE envelope 🙂

laurentsimon commented 5 months ago

Is there anything I can do to help with this feature?

woodruffw commented 5 months ago

If you want to take a stab at implementing it, that would be great!

Otherwise, I don't have any immediate bandwidth to try my hand at it 😅

laurentsimon commented 5 months ago

Would the change consist in updating https://github.com/sigstore/sigstore-python/blob/main/sigstore/sign.py#L192 by allowing input_: dsse.Statement | bytes, dsse_type: str (where bytes would be the payload and dsse_type is the type of the payload) + update https://github.com/sigstore/sigstore-python/blob/main/sigstore/dsse.py#L207 accordingly?

For verification, we'd add an optional dsse_type as argument to _verify.

Or do you have another idea?

woodruffw commented 4 months ago

Would the change consist in updating https://github.com/sigstore/sigstore-python/blob/main/sigstore/sign.py#L192 by allowing input_: dsse.Statement | bytes, dsse_type: str (where bytes would be the payload and dsse_type is the type of the payload) + update https://github.com/sigstore/sigstore-python/blob/main/sigstore/dsse.py#L207 accordingly?

That sounds pretty close to what I was thinking, although I have a (minor) concern that exposing the DSSE payload type like this will be misuse prone (e.g. users marking JSON as binary or vice versa). Do you have a concrete need for a payload type other than application/octet-stream or similar?

(Or as a related question: is there a fixed DSSE payload type for binary data already that we should use? I couldn't find one with some searching, but I might have missed it 😅)

For verification, we'd add an optional dsse_type as argument to _verify.

IIUC, this shouldn't be necessary: _verify takes an Envelope, which will have the payload_type you specify.


To take a step back a second: I realized that I'm not actually sure the rest of the Sigstore ecosystem even supports non-JSON DSSE bundles yet.

CC @haydentherapper: do you know if Rekor would accept these? I seem to recall there being a requirement that DSSE entries contain in-toto statements within Rekor itself.

Edit: this is the check I was thinking of: https://github.com/sigstore/rekor/blob/92cf4d5d3682ffcbf4138341f1bc5f7a66766549/pkg/types/dsse/v0.0.1/entry.go#L150-L156

laurentsimon commented 4 months ago

application/octet-stream

Our use case is model signing for multiple files in folder, so would be something like application/vnd.ai-model-manifest+json or maybe more generic application/vnd.path-manifest+json @mihaimaruseac to comment

mihaimaruseac commented 4 months ago

Oh, I haven't thought at all about this :(. Yes, I think something like application/...+json is what we'd need to have, but I'll have to think a little bit more about this

haydentherapper commented 4 months ago

Yep, json is all that Rekor supports currently, though we could support more than that. It means that other clients have to understand how to canonicalize non-JSON DSSEs, which I'm not sure if any would currently.

laurentsimon commented 4 months ago

Thanks all. For model signing, it will be json format.

CC @haydentherapper: do you know if Rekor would accept these? I seem to recall there being a requirement that DSSE entries contain in-toto statements within Rekor itself.

@haydentherapper can you confirm non-intoto DSSE payloads work with rekor?

haydentherapper commented 4 months ago

No, they don't currently. Rekor only understands intoto payloads - https://github.com/sigstore/rekor/blob/main/pkg/types/dsse/v0.0.1/entry.go#L113 and https://github.com/sigstore/rekor/blob/main/pkg/types/dsse/v0.0.1/entry.go#L159

laurentsimon commented 4 months ago

We had an offline chat with @haydentherapper and @mihaimaruseac to brainstorm. What came out is that Sigstore would like to not be in the business of litigating addition of new hashes that users may want to use. This mean (Hayden, Mihai please correct me if wrong) that https://github.com/sigstore/sigstore-python/issues/1018 can be closed in favor of this issue?

@haydentherapper mentioned that hashrekord (already supported in this library) could be a way forward, and we'd update https://github.com/sigstore/rekor/blob/92cf4d5d3682ffcbf4138341f1bc5f7a66766549/pkg/types/dsse/v0.0.1/entry.go#L150-L156 to accept a new DSSE payload type.

laurentsimon commented 4 months ago

/cc @susperius

laurentsimon commented 4 months ago

@haydentherapper can you explain why rekor needs to be aware of the hash type used in intoto subject? Why is the hash type used for DSSE signing not enough? I chatted with @susperius and this question came up. Thanks

haydentherapper commented 4 months ago

(catching up from being out) It's part of the canonicalized structure. The service computes the payload hash based on the provided payload - https://github.com/sigstore/rekor/blob/main/pkg/types/dsse/v0.0.1/entry.go#L247-L251

susperius commented 4 months ago

Sorry if I've missed the point.

So the service computes the hash based on the provided payload that's fine but does not explain the need to restrict the in-toto Subject's DigestSet to the predefined hashes. AIU rekor only sees the already encoded payload so the digest inside of the subject can be anything and wouldn't have an adverse effect on rekor.

PLMK what I'm missing here

woodruffw commented 4 months ago

I think Rekor's restriction might be better suited for discussion on their issue tracker, but for the sigstore-python side: my justification for restricting the DigestSet to a fixed set of (well-known, strong) digests is that a signature is only as strong as the image it's been bound to. So, for sigstore-python's part, restricting the digests to things that are at least as strong as SHA2/256 is a way to ensure that users don't ascribe strength to a signature over a weaker hash function.

(That being said, I think the hash function discussion is unrelated to the top-level issue here, which is DSSE over a binary payload, rather than an in-toto statement. DigestSet is only relevant for the latter I believe, not the former.)

haydentherapper commented 4 months ago

For Rekor, I don't believe we have restrictions on the DigestSet. I've been looking over the code and don't see where this restriction is present. Do you have an error message from calling the API with an intoto statement with non-sha256 digests?

susperius commented 4 months ago

It was just something that came up while talking to @laurentsimon about using in-toto attestation for model signing. The imposed checks on the sigstore side make it difficult to add "custom" hashing information (chunked sha256 for example) as the digest. Overall, that is the reason for DSSE signatures over binary payloads. Perhaps we could add functionality to disable the digest check?

laurentsimon commented 4 months ago

Was discussed in https://github.com/sigstore/sigstore-python/issues/1018