sigstore / model-transparency

Supply chain security for ML
Apache License 2.0

Support for converting manifests to in-toto statements #248

Closed: susperius closed this issue 1 month ago

susperius commented 1 month ago

Description

Tracking bug for manifest to in-toto statement conversion.

To sign model signatures and store them in a Sigstore bundle, the manifest must first be converted to an in-toto statement.

mihaimaruseac commented 1 month ago

This is included in my draft work; I will send a draft PR.

laurentsimon commented 1 month ago

I've had some thoughts about the in-toto format. In https://github.com/sigstore/model-transparency/issues/111, we expressed concern about listing each file as an in-toto subject, because tooling like cosign can be used to (wrongly) verify only a subset of the subjects. Here's an idea that may address this concern:

  1. We list the (file, (custom?) hash) pairs inside the predicate rather than as in-toto subjects.
  2. We output a single subject in the in-toto subject list, whose digest is computed over a serialization of the list of (file, hash) pairs, as our existing PoC already does. We need not serialize to JSON (which requires canonicalization and is a frequent source of breakage across implementations); a simple format such as base64(path).hash-value, as shown here, is enough. Other per-file metadata need not be serialized, since it is still protected by the manifest signature. Serialization schemes may vary, so we also need a way to convey which one was used. There are two ways to achieve this:
     a. In the subject, by changing the hash algorithm name (e.g. sha256 becomes scheme-sha256). This is harder to support with sigstore-python unless we build the protobuf ourselves.
     b. By adding the scheme type under the predicate. I lean towards this solution.
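
A minimal Python sketch of the serialization in point 2. It assumes that entries are sorted by path and joined with newlines, which the proposal above does not specify. The names serialize_entries and subject_digest are hypothetical and are not taken from the PoC. The standard base64 alphabet contains no ".", and a hex digest contains neither "." nor a newline, so both separators are unambiguous.

```python
import base64
import hashlib


def serialize_entries(entries: dict[str, str]) -> bytes:
    """Serialize {path: hex digest} as one "base64(path).hash-value" line per file.

    Assumption: lines are sorted by path (so hashing order does not matter)
    and joined with "\n"; the real scheme would be named in the predicate.
    """
    lines = []
    for path in sorted(entries):
        encoded_path = base64.b64encode(path.encode("utf-8")).decode("ascii")
        lines.append(f"{encoded_path}.{entries[path]}")
    return "\n".join(lines).encode("utf-8")


def subject_digest(entries: dict[str, str]) -> str:
    """SHA-256 over the serialized list: the single in-toto subject digest."""
    return hashlib.sha256(serialize_entries(entries)).hexdigest()


if __name__ == "__main__":
    # Placeholder digests, mirroring the "bla" values in the example below.
    files = {"to/file1": "ab12", "to/file2": "cd34"}
    print(serialize_entries(files).decode())
    print(subject_digest(files))
```

Because only this one digest appears in the subject list, a tool that checks subjects can no longer accept a statement after matching just a subset of the files.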

So overall it would look like the following:

```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "predicateType": "https://something/model-signing/v1",
  "subject": [
    {
      "digest": {
        "sha256": "blabla"
      }
    }
  ],
  "predicate": {
    "serialization": "some-name/v1",
    "files": [
      {
        "path": "to/file1",
        "digest": {
          "sha256": "bla"
        }
      },
      {
        "path": "to/file2",
        "digest": {
          "custom": "bla"
        }
      }
    ]
  }
}
```

susperius commented 1 month ago

I agree with you. It does feel like "abusing" the format, though that is mostly because Sigstore hard-codes the accepted hash algorithms in its code base.

Perhaps it would be best to go with our own format to avoid additional headaches. We can serialize the manifest to an on-disk format and use Sigstore's hashed record. We then store the bundle and the manifest together and are done with it.
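
Here is a minimal Python sketch of the on-disk approach. It assumes that the "hashed record" is a signature over the SHA-256 of the manifest file's exact bytes, presumably Rekor's hashedrekord entry type. The sigstore-python signing call itself is deliberately left out. The file name, the JSON encoding, and the helper names write_manifest, manifest_digest and check_manifest are all hypothetical. The key property is that the verifier hashes the stored bytes as they are, so it never needs to re-serialize the manifest and cross-implementation canonicalization does not come up.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_manifest(manifest: dict[str, str], path: Path) -> bytes:
    """Write {path: digest} to disk and return the exact bytes written."""
    # Any stable encoding works; sorted-key JSON is just one possible choice.
    data = json.dumps(manifest, sort_keys=True, indent=2).encode("utf-8")
    path.write_bytes(data)
    return data


def manifest_digest(path: Path) -> str:
    """SHA-256 over the raw file bytes: the value the hashed record would cover."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_manifest(path: Path, signed_digest: str) -> bool:
    # Hash the stored bytes as-is; no parsing or re-serialization involved.
    return manifest_digest(path) == signed_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest_path = Path(tmp) / "model.manifest"  # hypothetical file name
        write_manifest({"to/file1": "aa", "to/file2": "bb"}, manifest_path)
        digest = manifest_digest(manifest_path)
        print(digest, check_manifest(manifest_path, digest))
```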