laurentsimon opened this issue 2 weeks ago.
@susperius thoughts?
I think we also need some lower-level API for incremental hashing.
I think the serializer API takes care of it: it takes `recompute_paths` as input or, more generally, whatever we decide in https://github.com/sigstore/model-transparency/issues/160.
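As an illustration of incremental hashing driven by `recompute_paths`, here is a minimal sketch in which the serializer reuses digests from a previous manifest and rehashes only the listed files (plus any new ones). The function names, the manifest-as-dict format, and the use of plain per-file SHA-256 are assumptions for illustration, not the library's API or the outcome of #160.

```python
import hashlib
import pathlib
from typing import Dict, Iterable, Optional


def hash_file(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Streams a file through SHA-256 and returns the hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def serialize_incremental(
    model_dir: pathlib.Path,
    recompute_paths: Iterable[str],
    previous: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical serializer returning {relative path: hex digest}.

    Files named in recompute_paths, or missing from the previous manifest,
    are rehashed; every other digest is copied from the previous manifest.
    """
    previous = previous or {}
    recompute = set(recompute_paths)
    manifest: Dict[str, str] = {}
    for path in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(model_dir).as_posix()
        if rel in recompute or rel not in previous:
            manifest[rel] = hash_file(path)
        else:
            manifest[rel] = previous[rel]
    return manifest
```

Because the loop walks the current directory, files deleted since the previous run simply drop out of the new manifest. The trade-off is that the caller is trusted to list every modified file in `recompute_paths`.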
For the hash engine: does it take a file to hash or an in-memory string as input? Having `shard` and `chunk` signals a file-based API, but `update` and `final` signal an in-memory string read from somewhere else.
In #188, I made `shard` only informative here. I'm working on the next level of the API, where `shard` is actually relevant.
If we had kept `shard` at the hash-an-object level, where we would have implemented the multi-process queue, we could have ended up in a state where too many threads are started.
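To make the concern concrete: if every per-object hash call started its own worker pool, hashing many files at once would multiply the number of workers. A sketch of the alternative, assuming the serializer owns a single bounded pool and submits one task per (file, shard) pair. `hash_shard` and `hash_files_sharded` are hypothetical names, and threads are used because CPython's `hashlib` releases the GIL on large buffers; a process pool would follow the same shape. Each shard is read fully into memory, which is fine for a sketch but not for very large shards.

```python
import concurrent.futures
import hashlib
import os
from typing import Dict, List, Tuple


def hash_shard(path: str, start: int, end: int) -> bytes:
    """Hashes bytes [start, end) of a file with SHA-256."""
    with open(path, "rb") as f:
        f.seek(start)
        return hashlib.sha256(f.read(end - start)).digest()


def hash_files_sharded(
    paths: List[str], shard_size: int, max_workers: int = 4
) -> Dict[str, List[bytes]]:
    """Returns, per file, the SHA-256 digest of each shard in order.

    All files share one pool, so at most max_workers threads run no matter
    how many files or shards there are.
    """
    tasks: List[Tuple[str, int, int]] = []
    for path in paths:
        size = os.path.getsize(path)
        # An empty file still yields a single (empty) shard.
        for start in range(0, max(size, 1), shard_size):
            tasks.append((path, start, min(start + shard_size, size)))
    results: Dict[str, List[bytes]] = {path: [] for path in paths}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(hash_shard, *task) for task in tasks]
        # Futures are read in submission order: file order, then shard order.
        for (path, _, _), future in zip(tasks, futures):
            results[path].append(future.result())
    return results
```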
How is the hash_engine expected to be used? Will the serializer call the update() and final() methods on the hash_engine? If that is the case, why does the hash_engine need to know what the shard size and chunk size are? I am sure I am missing something.
Good observation; I think @mihaimaruseac had a related comment in https://github.com/sigstore/model-transparency/issues/172#issuecomment-2122898110
I think it depends on what level of abstraction we're talking about: the underlying (traditional) hash engine (sha256, etc.), the shard-aware hash, or something else. The shard-aware hash engine would use a lower-level hash engine under the hood (e.g., sha256). It may also depend on whether we're talking about a streaming engine or not.
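A sketch of what "a shard-aware hash using a lower-level hash under the hood" could look like as a streaming engine: input is cut into fixed-size shards, each shard is hashed with the inner hash (SHA-256 by default), and the final digest is the inner hash of the concatenated shard digests. `ShardedHash` is a hypothetical name, and this shard-then-combine construction is an assumption for illustration; the actual `sha256p` definition may differ.

```python
import hashlib
from typing import Any, Callable, List


class ShardedHash:
    """Hypothetical shard-aware streaming hash layered on a lower-level hash."""

    def __init__(self, shard_size: int, inner: Callable[[], Any] = hashlib.sha256) -> None:
        if shard_size <= 0:
            raise ValueError("shard_size must be positive")
        self._shard_size = shard_size
        self._inner = inner
        self._current = inner()
        self._filled = 0  # bytes already fed into the current shard
        self._shard_digests: List[bytes] = []

    def update(self, data: bytes) -> None:
        view = memoryview(data)
        while view:
            take = min(len(view), self._shard_size - self._filled)
            self._current.update(view[:take])
            self._filled += take
            view = view[take:]
            if self._filled == self._shard_size:
                # Shard complete: record its digest and start a new one.
                self._shard_digests.append(self._current.digest())
                self._current = self._inner()
                self._filled = 0

    def final(self) -> bytes:
        """Returns the combined digest; meant to be called once."""
        # Flush a trailing partial shard, or the single empty shard for empty input.
        if self._filled or not self._shard_digests:
            self._shard_digests.append(self._current.digest())
        outer = self._inner()
        for digest in self._shard_digests:
            outer.update(digest)
        return outer.digest()
```

The engine is agnostic to how callers slice their `update()` calls; only `shard_size` and the inner hash affect the result.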
https://github.com/sigstore/model-transparency/pull/188 has moved the chunk size and shard out of the engine for now, so we'll see whether the original API definition is needed. PTAL, we'd love your feedback too.
Will the serializer call update() and final() methods on the hash_engine?
That is correct. I'll implement this in #190 (WIP at the moment, but should push a new change tomorrow-ish).
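A minimal sketch of the serializer driving a streaming engine through `update()` and `final()`. The `StreamingHashEngine` protocol, the `SHA256Engine` adapter over `hashlib.sha256`, and the `hash_stream` helper are hypothetical names, not what #190 implements. Note that the chunk size lives entirely on the serializer side: it only controls how many bytes are read per call, so the engine never needs to know it.

```python
import hashlib
from typing import BinaryIO, Callable, Protocol


class StreamingHashEngine(Protocol):
    """Hypothetical engine interface: feed bytes, then ask for the digest."""

    def update(self, data: bytes) -> None: ...

    def final(self) -> bytes: ...


class SHA256Engine:
    """Adapter exposing hashlib.sha256 through update()/final()."""

    def __init__(self) -> None:
        self._hasher = hashlib.sha256()

    def update(self, data: bytes) -> None:
        self._hasher.update(data)

    def final(self) -> bytes:
        return self._hasher.digest()


def hash_stream(
    stream: BinaryIO,
    engine_factory: Callable[[], StreamingHashEngine],
    chunk_size: int = 1 << 20,
) -> bytes:
    """Serializer-side loop: reads chunk_size bytes at a time into a fresh engine."""
    engine = engine_factory()
    while chunk := stream.read(chunk_size):
        engine.update(chunk)
    return engine.final()
```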
If that is the case, why does the hash_engine need to know what the shard size and chunk size are? I am sure I am missing something.
Chunk size doesn't matter (see #188). Shard size matters, but mostly just for verification.
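A small illustration of why chunk size can stay out of the engine while shard size cannot. The `shard_then_combine` function (SHA-256 over the concatenated per-shard SHA-256 digests) is a hypothetical construction used only as an example: feeding the same bytes with different chunk sizes gives the same digest, while changing the shard size changes the digest, so a verifier has to know which shard size was used.

```python
import hashlib


def stream_sha256(data: bytes, chunk_size: int) -> bytes:
    """Feeds data to SHA-256 chunk_size bytes at a time."""
    hasher = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        hasher.update(data[i:i + chunk_size])
    return hasher.digest()


def shard_then_combine(data: bytes, shard_size: int) -> bytes:
    """Hypothetical sharded digest: SHA-256 over the concatenated shard digests."""
    shard_digests = [
        hashlib.sha256(data[i:i + shard_size]).digest()
        for i in range(0, len(data), shard_size)
    ]
    return hashlib.sha256(b"".join(shard_digests)).digest()


data = bytes(range(256)) * 64  # 16 KiB of sample data

# Chunk size only changes how the bytes are fed in, not the result.
print(stream_sha256(data, 100) == stream_sha256(data, 4096))  # True
# Shard size changes the result, so a verifier needs to know it.
print(shard_then_combine(data, 4096) == shard_then_combine(data, 1024))  # False
```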
Based on the `PKISigner` and `PKIVerifier` examples, it looks like we're looking to add support for both keyed and keyless signing flows, and I think we should. If so, the `PKISigner` and `PKIVerifier` classes need to account for supplying the necessary keys. Not sure how official this API spec is, but I just wanted to call it out JIC.
The keyless flow will be handled by `SigstoreSigner` / `SigstoreVerifier` (with optional Fulcio / Rekor parameters for private deployments). `PKISigner` / `PKIVerifier` is exclusively for "private" PKI deployments, i.e., existing PKI using existing off-the-shelf CAs or custom CAs (possibly using TUF-managed keys). See the PKI PoC in https://github.com/sigstore/model-transparency/pull/177. We'll need to support TUF-managed and HSM keys: the API in the issue description takes a simple private-key bytes input, but I think we'll need something like a `KeySigner` / `KeyVerifier` interface instead (providing raw sign / verify). /cc @udaysavagaonkar @susperius
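A sketch of the `KeySigner` / `KeyVerifier` idea: the PKI signer depends only on a raw sign/verify interface, so a TUF-managed key, an HSM-backed key, or an in-memory key could be plugged in without changing it. Everything beyond the `KeySigner` / `KeyVerifier` names from the comment is hypothetical. `HmacKey` exists only to keep the example stdlib-only and runnable: HMAC is a symmetric MAC, not a public-key signature, and a real PKI backend would use an asymmetric key and attach its certificate chain.

```python
import abc
import hashlib
import hmac


class KeySigner(abc.ABC):
    """Raw signing interface; implementations may wrap files, HSMs, or TUF-managed keys."""

    @abc.abstractmethod
    def sign(self, data: bytes) -> bytes: ...


class KeyVerifier(abc.ABC):
    """Raw verification interface matching KeySigner."""

    @abc.abstractmethod
    def verify(self, data: bytes, signature: bytes) -> bool: ...


class HmacKey(KeySigner, KeyVerifier):
    """TEST-ONLY stand-in: HMAC-SHA256 with a shared secret, not a PKI signature."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def sign(self, data: bytes) -> bytes:
        return hmac.new(self._secret, data, hashlib.sha256).digest()

    def verify(self, data: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(data), signature)


class PKISigner:
    """Hypothetical PKI signer that delegates the raw operation to a KeySigner."""

    def __init__(self, key: KeySigner) -> None:
        self._key = key

    def sign(self, payload: bytes) -> bytes:
        return self._key.sign(payload)


class PKIVerifier:
    """Hypothetical PKI verifier that delegates the raw check to a KeyVerifier."""

    def __init__(self, key: KeyVerifier) -> None:
        self._key = key

    def verify(self, payload: bytes, signature: bytes) -> bool:
        return self._key.verify(payload, signature)
```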
There may be a case for supporting the Sigstore keyed flow, but I don't think we're prioritizing it atm.
Not sure how official this API spec is, but just wanted to call it out JIC.
Nothing official, just the overall direction :) Feel free to comment.
+1, the proposal in the first message here is mostly informative; as you can see in #188/#190, things are changing. For #190, I still need to find a way to incorporate the current serialization format, but I have some ideas, which I'll test by next week.
Any comments and suggestions for improvement, as well as PR reviews, are very very welcome
This issue proposes what the (long-term) APIs will look like. Looking for comments; nothing is set in stone.
1. Hash engine
Tracked in https://github.com/sigstore/model-transparency/issues/140.
Why: we will provide a default hash engine that we use in this library, with the possibility to customize its parameters:
A hash name is parameterized. I suggest something simple like `<name>$param1$param2...`. For the existing sha256p, it could be `sha256pv1$1000000` for a shard of 1 GB.
2. Serializer
This will serialize a model (folder or file). Why: some callers may want to serialize models using our library but not sign them with our library.
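A minimal sketch of a serializer that accepts either a single file or a folder and returns a digest with no signing step, so callers can sign it with their own tooling. The `serialize_model` name and the manifest encoding (sorted `<relative path>:<file digest>` lines, hashed once more with SHA-256) are assumptions for illustration, not the library's serialization format.

```python
import hashlib
import pathlib
from typing import List


def _file_digest(path: pathlib.Path) -> str:
    """Streams a file through SHA-256 in 1 MiB reads."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def serialize_model(model_path: pathlib.Path) -> bytes:
    """Hypothetical serializer for a model file or folder.

    Returns SHA-256 over a canonical manifest of "<relative path>:<file digest>"
    lines sorted by path. No signing happens here.
    """
    if model_path.is_file():
        files: List[pathlib.Path] = [model_path]
        base = model_path.parent
    else:
        files = sorted(p for p in model_path.rglob("*") if p.is_file())
        base = model_path
    lines = [f"{p.relative_to(base).as_posix()}:{_file_digest(p)}" for p in files]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).digest()
```

A real format would also need an unambiguous encoding for file names containing `:` or newlines; the simple line format here is only for illustration.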
3. Signer / Verifier
This will create a generic signer class that can be instantiated for Sigstore, PKI, etc.
Example for Sigstore:
Example for PKI:
4. model sign / verify
This is the main entry point for callers to use.
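A sketch of how the top-level entry point could tie the pieces together: serialize the model to a digest, then hand that digest to whichever signer or verifier backend the caller instantiated. The `sign_model` / `verify_model` names, the `Signer` / `Verifier` protocols, and the single-file SHA-256 placeholder serializer are assumptions. `DemoKey` is a test-only HMAC stand-in, not a Sigstore or PKI backend.

```python
import hashlib
import hmac
import pathlib
from typing import Protocol


class Signer(Protocol):
    def sign(self, payload: bytes) -> bytes: ...


class Verifier(Protocol):
    def verify(self, payload: bytes, signature: bytes) -> bool: ...


def _serialize(model_path: pathlib.Path) -> bytes:
    """Placeholder serializer: SHA-256 of a single model file."""
    return hashlib.sha256(model_path.read_bytes()).digest()


def sign_model(model_path: pathlib.Path, signer: Signer) -> bytes:
    """Serializes the model and signs the resulting digest."""
    return signer.sign(_serialize(model_path))


def verify_model(model_path: pathlib.Path, signature: bytes, verifier: Verifier) -> bool:
    """Re-serializes the model and checks the signature over the fresh digest."""
    return verifier.verify(_serialize(model_path), signature)


class DemoKey:
    """TEST-ONLY backend: HMAC-SHA256 with a shared secret."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def sign(self, payload: bytes) -> bytes:
        return hmac.new(self._secret, payload, hashlib.sha256).digest()

    def verify(self, payload: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(payload), signature)
```

Keeping the entry point backend-agnostic means swapping Sigstore for PKI (or a caller's own signer) changes only the object passed in, not the serialize-then-sign flow.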