sigstore / model-transparency

Supply chain security for ML
Apache License 2.0

Add in-toto format for items with hash of hashes #264

Closed mihaimaruseac closed 1 month ago

mihaimaruseac commented 2 months ago

Summary

Note: This is an experimental serialization, one of four in a series of PRs (#264, #265, #266, #267). Before a stable release of the library, we would standardize on an ergonomic format, with as few corner cases and dangerous edges as possible.

This converts model serialization manifests, which record the hash of every model file, into an in-toto payload. The payload can then be passed to Sigstore's sign_intoto to produce a Sigstore Bundle (if using Sigstore).

To identify the model, we compute a hash over the hashes of all its files and use that as the subject. The individual file hashes go into the payload, and the verifier is expected to check them as part of the verification process.
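
As a rough illustration of the idea (not the code in this PR), the sketch below builds an in-toto Statement v1 whose subject digest is a hash over the per-file digests, and whose predicate lists those digests for the verifier to re-check. Several things here are assumptions made for the example: the predicate type URI, the `files` layout inside the predicate, the use of SHA-256, hashing the digests in sorted path order, and the helper names `digest_of_digests`, `build_statement`, and `verify_statement`.

```python
import hashlib
import json

# Hypothetical predicate type URI, used only for this illustration.
PREDICATE_TYPE = "https://example.com/model_signing/hash-of-hashes/v0"


def digest_of_digests(file_digests: dict[str, str]) -> str:
    """Hash the per-file SHA-256 hex digests, visiting files in sorted path order.

    The ordering rule is an assumption; any deterministic order would work
    as long as signer and verifier agree on it.
    """
    hasher = hashlib.sha256()
    for path in sorted(file_digests):
        hasher.update(bytes.fromhex(file_digests[path]))
    return hasher.hexdigest()


def build_statement(model_name: str, file_digests: dict[str, str]) -> dict:
    """Build an in-toto Statement v1 with the hash of hashes as the subject."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": model_name, "digest": {"sha256": digest_of_digests(file_digests)}}
        ],
        "predicateType": PREDICATE_TYPE,
        # Assumed predicate layout: one entry per file, in sorted path order.
        "predicate": {
            "files": [
                {"name": path, "digest": file_digests[path]}
                for path in sorted(file_digests)
            ]
        },
    }


def verify_statement(statement: dict, actual_digests: dict[str, str]) -> bool:
    """Check recorded file digests against recomputed ones, then the subject."""
    recorded = {f["name"]: f["digest"] for f in statement["predicate"]["files"]}
    subject_digest = statement["subject"][0]["digest"]["sha256"]
    return recorded == actual_digests and subject_digest == digest_of_digests(recorded)


if __name__ == "__main__":
    digests = {
        "config.json": hashlib.sha256(b"config bytes").hexdigest(),
        "weights.bin": hashlib.sha256(b"weight bytes").hexdigest(),
    }
    statement = build_statement("my-model", digests)
    print(json.dumps(statement, indent=2))
    print(verify_statement(statement, digests))
```

In this sketch the subject digest covers only the file digests, so file names are protected by the predicate comparison rather than by the subject itself. The resulting dictionary is what would be serialized and handed to the signing step.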

CC @susperius for converting manifest to in-toto. This should cover #111, #224, and #248 (first part of the machinery). CC @laurentsimon and (optionally) @TomHennen to make sure I did not mishandle in-toto.

Note: This builds on #263. I decided to split every feature into its own PR to make it easier to review what changes (should be only the last commit) and to be able to merge partial work and continue from there.

Release Note

NONE

Documentation

NONE

TomHennen commented 1 month ago

I think my biggest concern with this proposal (and #265, #266, #267) is that it doesn't allow for any other predicates to be made for models in the future. It can only sign things, but cannot convey any additional information with that signature (such as what types of data the model was trained on).

It sounds like the primary reason for this approach is that Sigstore cannot support other hash types in Rekor. To me that seems like the more important problem to solve (but I don't understand the technical problems with doing so).

mihaimaruseac commented 1 month ago

Right, these are just for signing, though I think we can extend the payload to contain more information.