Open andaaron opened 7 months ago
I assume the "MUST be equal to the client provided digest" is the value of the reference. The clarifications needed are:
- In case the reference is a digest, and it is using a non-canonical digest algorithm, how can it be equal to the Docker-Content-Digest header value which is always using the canonical digest algorithm?
I'd consider removing "canonical" from the Docker-Content-Digest references. That would allow it to match the client provided value.
- In case the reference is a tag, is there a way for the server to know what algorithm it should use to track this manifest?
I don't think there is, clients get the registry server preferred value.
Is the registry supposed to only use the canonical algorithm in this case? That could cause issues if the client pushes another manifest referencing the initial manifest by a digest computed using a non-canonical algorithm.
Is the registry supposed to compute all possible hashes, using all registered algorithms, of a given manifest (or blob in general) in order to make sure any possible future references can be successfully resolved?
I think this hints at some of the many waiting issues we'll face whenever someone tries to use anything other than sha256. I expect a lot of things will break the day someone tries to switch to sha512. In addition to parts of the spec not covering all the scenarios, I expect registries, runtimes, and other client tooling, are all lacking full support.
cc: @ktarplee
I am going to assume that we want to be able to copy an image from one registry to another without the manifest ID changing. That is, the digest of the manifest does not change during a copy, and thus none of the digest algorithms for the blobs may change either. We have that property now and I am proposing we keep it. This implies a few things:
Docker-Content-Digest references and replacing it with the digest algorithm provided by the client. So Docker-Content-Digest always uses the client's digest algorithm, and fails otherwise.
In regards to the question "In case the reference is a tag, is there a way for the server to know what algorithm it should use to track this manifest?". I think the solution has to allow the client to provide the digest algorithm but since this is a manifest I think it should be the entire digest. Given the above argument, I would say that both of the suggested options are not desirable:
- Is the registry supposed to only use the canonical algorithm in this case? That could cause issues if the client pushes another manifest referencing the initial manifest by a digest computed using a non-canonical algorithm.
- Is the registry supposed to compute all possible hashes, using all registered algorithms, of a given manifest (or blob in general) in order to make sure any possible future references can be successfully resolved?
I am proposing that the client provide the expected digest (while uploading a manifest by tag) by either adding a header to the request (Docker-Content-Digest: sha256:deedbeef...) or to the query parameters (?digest=sha256:deedbeef...). The registry can validate that the manifest matches the digest and return that same digest provided by the client (not necessarily the canonical digest) in the response header Docker-Content-Digest. If the client does not provide a digest when uploading a tag then the canonical digest is used (so everything is backwards compatible).
The key here is that the client must always provide the digest algorithm. The registry can never pick one it wants unless the client gives up its right to specify the digest algorithm for the manifest or blob.
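The proposal above could be sketched roughly as follows. This is only an illustration of the validation logic, not registry code: the function name `content_digest_for_push` and the `SUPPORTED` table are hypothetical, and the sketch assumes the digest has already been extracted from the header or query parameter.

```python
import hashlib
from typing import Optional

# Assumption: the registry supports only these two algorithms.
SUPPORTED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}
CANONICAL = "sha256"


def content_digest_for_push(body: bytes, client_digest: Optional[str] = None) -> str:
    """Return the Docker-Content-Digest value for a manifest pushed by tag (hypothetical helper)."""
    if client_digest is None:
        # No digest supplied: fall back to the canonical algorithm (backwards compatible).
        return f"{CANONICAL}:{SUPPORTED[CANONICAL](body).hexdigest()}"

    algorithm, _, encoded = client_digest.partition(":")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")

    # Validate the body with the client's algorithm, never one the registry picks.
    actual = SUPPORTED[algorithm](body).hexdigest()
    if actual != encoded:
        raise ValueError("manifest does not match the client-provided digest")

    # Echo back exactly what the client sent, even if it is not canonical.
    return client_digest
```

With this shape, a client pushing by tag with a sha512 digest gets that sha512 digest back, while a client that sends nothing keeps seeing sha256.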
Another aspect of this to think through is the referrers API. When the list of referrer descriptors is returned, what digest algorithm should be used? Does it return duplicate content (i.e. two descriptors to the same content but with different algorithms) or just one? Is the registry expected to de-duplicate those references? Do you limit it to just the descriptors using the same digest algorithm (I don't think so)?
flowchart TD
A[Manifest A\nsha256:3...] -->|subject\nsha256:0...| C[Manifest C\nsha256:0...\nsha384:1...\nsha512:2...]
B[Manifest B\nsha256:4...\nsha512:5...] --> |subject\nsha512:2...| C
In the above diagram the digests in the manifests are the digests that the client used to upload that manifest in that registry.
Imagine a client makes a request to /v2/<name>/referrers/sha256:0... What should be returned by the registry?
Imagine a client makes a request to /v2/<name>/referrers/sha384:1... What should be returned by the registry?
One solution/rule is that the referrers API should return all known references to objects referred to by the provided descriptor. So in the above example, both manifest A and B should be returned when querying for sha256:0..., sha384:1..., or sha512:2... This is harder for registries to implement because it requires them to realize that sha256:0..., sha384:1..., and sha512:2... are actually the same manifest.
Alternatively, the rule can be that only the manifests with the subject matching exactly are returned. So in the example above, the referrer of sha256:0... would be manifest A, and the referrer of sha512:2... would be manifest B. And there are no referrers for sha384:1... even though they are the same actual manifest. In this case, registries can effectively treat the manifests as unrelated entities. We do lose some functionality in this case by not picking up all referrers to a manifest. I slightly prefer this approach.
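To make the two candidate rules concrete, here is a small sketch over the manifests from the diagram. The `MANIFESTS` table and both function names are made up for illustration, and the elided digests are kept exactly as they appear above.

```python
# Each manifest lists the digests it was uploaded under and its subject, as in the diagram.
MANIFESTS = {
    "A": {"digests": ["sha256:3..."], "subject": "sha256:0..."},
    "B": {"digests": ["sha256:4...", "sha512:5..."], "subject": "sha512:2..."},
    "C": {"digests": ["sha256:0...", "sha384:1...", "sha512:2..."], "subject": None},
}


def referrers_exact(query: str) -> list:
    """Rule 2: return only manifests whose subject digest matches the query exactly."""
    return sorted(name for name, m in MANIFESTS.items() if m["subject"] == query)


def referrers_all_known(query: str) -> list:
    """Rule 1: return every manifest whose subject is any known digest of the queried content."""
    aliases = {query}
    for m in MANIFESTS.values():
        if query in m["digests"]:
            # The registry must know these digests all name the same manifest.
            aliases.update(m["digests"])
    return sorted(name for name, m in MANIFESTS.items() if m["subject"] in aliases)
```

Under the exact-match rule the registry never has to relate sha256:0..., sha384:1..., and sha512:2... to each other; the all-known rule needs the alias lookup, which is where the implementation cost comes from.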
I am proposing that the client provide the expected digest (while uploading a manifest by tag) by either adding a header to the request (Docker-Content-Digest: sha256:deedbeef...) or to the query parameters (?digest=sha256:deedbeef...).
The query parameter would align nicely with the blob put. That would be my preference.
Alternatively, the rule can be that only the manifests with the subject matching exactly are returned. So in the example above, the referrer of sha256:0... would be manifest A, and the referrer of sha512:2... would be manifest B. And there are no referrers for sha384:1... even though they are the same actual manifest. In this case, registries can effectively treat the manifests as unrelated entities. We do lose some functionality in this case by not picking up all referrers to a manifest. I slightly prefer this approach.
I tend to prefer that as well. Registries could treat each digest algorithm as a separate list of entries in the blob store, so pushing the same manifest with two different digest algorithms would be two separate CAS entries. That also aligns with the storage model of the OCI Layout. Without that, registries would need to compute multiple hashes for every item received, and make the content available by multiple CAS names, which I expect is problematic and creates a significant overhead on large registries to support new algorithms. I doubt registries want to recompute the digest on all of their content given the understandable pushback we saw for including referrers responses that were previously pushed by the fallback tag when registries enable the referrers API.
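A rough sketch of that storage model, assuming an in-memory store keyed by an OCI Layout style path (blobs/<algorithm>/<encoded>). The class `PerAlgorithmStore`, the helper `blob_path`, and the `/registry` root are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(root: str, digest: str) -> PurePosixPath:
    """Map a digest to an OCI Layout style location: <root>/blobs/<algorithm>/<encoded>."""
    algorithm, _, encoded = digest.partition(":")
    if not algorithm or not encoded:
        raise ValueError(f"malformed digest: {digest!r}")
    return PurePosixPath(root) / "blobs" / algorithm / encoded


class PerAlgorithmStore:
    """Toy CAS where each (algorithm, content) pair is an independent entry."""

    def __init__(self, root: str = "/registry") -> None:
        self.root = root
        self.entries = {}

    def put(self, content: bytes, algorithm: str) -> str:
        # Only the algorithm chosen by the client is computed; no other hashes are derived.
        digest = f"{algorithm}:{hashlib.new(algorithm, content).hexdigest()}"
        self.entries[str(blob_path(self.root, digest))] = content
        return digest
```

Pushing the same bytes once with sha256 and once with sha512 yields two entries under different paths, and the store never needs to learn that they hold identical content.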
In addition to supporting the push of manifests by tag with a non-canonical digest algorithm, I think we need similar support when a blob is pushed with the digest only being provided after the content is pushed (in the POST, PATCH, PUT). For that scenario, only the client would know the algorithm while the POST and PATCH requests are being run. Perhaps a ?digest-algorithm=sha512 URL parameter should be used in those scenarios?
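The reason the algorithm is needed up front is that a registry hashing the upload incrementally has to pick a hash function before the first chunk arrives. A minimal sketch, assuming the suggested digest-algorithm parameter is passed when the session is opened; the `UploadSession` class and its method names are hypothetical and only loosely mirror the POST, PATCH, PUT flow.

```python
import hashlib


class UploadSession:
    """Toy model of a chunked blob upload that hashes content as it streams in."""

    def __init__(self, algorithm: str = "sha256") -> None:
        # Corresponds to the POST: the algorithm must be fixed here, before any data is seen.
        self.algorithm = algorithm
        self._hasher = hashlib.new(algorithm)

    def patch(self, chunk: bytes) -> None:
        # Corresponds to each PATCH: update the running hash without buffering the blob.
        self._hasher.update(chunk)

    def put(self, client_digest: str) -> str:
        # Corresponds to the final PUT: compare against the digest the client finally provides.
        actual = f"{self.algorithm}:{self._hasher.hexdigest()}"
        if actual != client_digest:
            raise ValueError("uploaded content does not match the provided digest")
        return actual
```

Without the parameter, a registry that defaulted to sha256 would have to buffer or re-read the whole blob once a sha512 digest showed up in the PUT.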
The image spec mentions multiple registered digest algorithms (https://github.com/opencontainers/image-spec/blob/v1.1.0-rc5/descriptor.md#digests), out of which SHA256 is the canonical one.
The distribution spec mentions these registered digest algorithms can be used to reference manifests when:
A) pulling - https://github.com/opencontainers/distribution-spec/blob/v1.1.0-rc3/spec.md#pulling-manifests
B) pushing - https://github.com/opencontainers/distribution-spec/blob/v1.1.0-rc3/spec.md#pushing-manifests
Case A) seems relatively clear to me, but in case B) the expected behavior of the registry is not entirely clear to me, as the digest (and implicitly the algorithm) may or may not be provided by the client.
I assume the "MUST be equal to the client provided digest" is the value of the reference. The clarifications needed are:
1) In case the reference is a digest, and it is using a non-canonical digest algorithm, how can it be equal to the Docker-Content-Digest header value which is always using the canonical digest algorithm?
2) In case the reference is a tag, is there a way for the server to know what algorithm it should use to track this manifest?
If this topic has already been discussed or documented, please provide a link, I could not find relevant issues here.
Thank you