sigstore / model-transparency

Supply chain security for ML
Apache License 2.0

Manifest file format #224

Open susperius opened 3 weeks ago

susperius commented 3 weeks ago

I'd like to revive the discussion around a manifest file format, so I've sketched two potential solutions below. The first is a simple solution that supports either a Merkle tree root hash or per-file hashes, but not both in the same manifest. If the metadata section can't capture all the necessary information, we could also extend the root_hash field to a custom type. The second supports a mixed approach. I'm not sure it's necessary, but I wanted to highlight it.
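To make the two approaches concrete, here is a minimal Python sketch, not part of the proposal itself. It assumes SHA-256 as the hash method and a very simple Merkle construction (pairwise hashing, odd node promoted unchanged, no leaf/interior domain separation). The function names `file_digests` and `merkle_root` and the sample file names are hypothetical.

```python
import hashlib


def file_digests(files: dict[str, bytes]) -> dict[str, bytes]:
    """Per-file approach: one SHA-256 digest per path, in sorted path order."""
    return {path: hashlib.sha256(data).digest() for path, data in sorted(files.items())}


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root-hash approach: fold leaf digests pairwise into a single hash.

    Assumption: an unpaired node at the end of a level is promoted unchanged.
    A real scheme would likely also domain-separate leaves from interior nodes.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Hypothetical model contents, kept in memory so the example is self-contained.
files = {"weights.bin": b"\x00\x01", "config.json": b"{}", "tokenizer.txt": b"abc"}
digests = file_digests(files)              # what a per-file manifest would carry
root = merkle_root(list(digests.values())) # what a root-hash manifest would carry
```

The per-file form lets a verifier pinpoint which file changed, while the root-hash form keeps the manifest small; that trade-off is the reason for offering both.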

In both cases, metadata is a simple string-to-string map (dict[str, str]) that lets users attach arbitrary metadata to the Manifest. This could also be extended to strongly typed key-value pairs if necessary.

The Manifest could be packaged in a DSSE envelope, following the standard DSSE signing procedure. Alternatively, if sigstore is going to allow arbitrary data, we could add a signature field to the Manifest file itself. I'd prefer the DSSE envelope since it's already well defined for this use case.
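For reference, wrapping a serialized manifest in a DSSE envelope could look roughly like the Python sketch below. The pre-authentication encoding (`pae`) and the envelope field names (`payload`, `payloadType`, `signatures`, `keyid`, `sig`) follow the DSSE specification. Everything else is an assumption for illustration: the payload type string is made up, the manifest is treated as opaque bytes, and HMAC-SHA256 is only a stand-in for a real signature scheme so the example runs with the standard library alone.

```python
import base64
import hashlib
import hmac

# Hypothetical payload type; the project would need to pick a real one.
PAYLOAD_TYPE = "application/vnd.example.model-manifest+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body'."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def make_envelope(payload: bytes, key: bytes, keyid: str) -> dict:
    """Build a DSSE-shaped envelope. HMAC stands in for a real signer here."""
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bytes:
    """Recompute the PAE over the decoded payload and check the stand-in MAC."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    sig = base64.b64decode(envelope["signatures"][0]["sig"])
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch")
    return payload
```

Because the payload type is covered by the PAE, a verifier cannot be tricked into interpreting a signed manifest as some other kind of document, which is one reason the envelope is attractive over an ad hoc signature field.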

WDYT?

// Manifest supporting either digests or a root hash
message Manifest {
    map<string, string> metadata = 1;
    oneof model {
        ModelDigests model_digests = 2;
        bytes root_hash = 3;
    }
}

message ModelDigests {
    repeated Digest digests = 1;
}

message Digest {
    string method = 1;
    string path = 2;
    bytes hash = 3;
}
// ================================================
// Manifest supporting mixed data
message Manifest {
    map<string, string> metadata = 1;
    ModelInformation model_information = 2;
}

message ModelInformation {
    oneof data {
        Digest digest = 1;
        RootHash root_hash = 2;
    }
}

message RootHash {
    string archive_information = 1;
    bytes hash = 2;
}

message Digest {
    string method = 1;
    string path = 2;
    bytes hash = 3;
}
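
As an illustration of the either/or semantics of the first layout, the `oneof model` field could be mirrored in plain Python as sketched below. This is not generated protobuf code: the class and method names are hypothetical, and the check is stricter than a real protobuf `oneof`, which also permits leaving the field unset.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Digest:
    method: str  # e.g. "sha256"
    path: str    # path of the file the digest covers
    hash: bytes  # raw digest bytes


@dataclass
class Manifest:
    metadata: dict[str, str] = field(default_factory=dict)
    model_digests: Optional[list[Digest]] = None
    root_hash: Optional[bytes] = None

    def __post_init__(self) -> None:
        # Assumption: exactly one of the two variants must be present.
        if (self.model_digests is None) == (self.root_hash is None):
            raise ValueError("set exactly one of model_digests or root_hash")

    def which_model(self) -> str:
        """Name of the populated variant, similar in spirit to protobuf's WhichOneof."""
        return "model_digests" if self.model_digests is not None else "root_hash"
```

A manifest built as `Manifest(root_hash=b"...")` then reports `root_hash` from `which_model()`, while passing both variants or neither raises a `ValueError`.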

susperius commented 2 weeks ago

I'm going to prepare a manifest PR that builds on PR193 and adds a last-modified field to the Digest message.

mihaimaruseac commented 2 weeks ago

I'm currently working on one, to handle internal BCID needs.