opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0

Can we assume all manifests and indexes are always pushed using the /manifests API in case of multiarch images? #550

Open andaaron opened 2 weeks ago

andaaron commented 2 weeks ago

Assume we have a "root" index containing multiple manifest references (each of media type manifest; in practice, a multi-arch image).

What is the proper way to upload the image metadata?

A) All the files containing manifests are pushed using the /manifests API, and the index is pushed afterwards using the same API.
B) Only the "root" index is pushed using the /manifests API, after all the other files have been pushed using the /blobs API.

In both cases the "root" index would be valid, because all references have previously been pushed and can be identified server-side by their respective digests, regardless of which API was used to push them.
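To make the two options concrete, here is a minimal sketch of the request sequence a client might issue in each case. The helper names (`digest`, `build_index`, `plan_push`), the repository name, and the placeholder manifest bodies are assumptions made for illustration. The endpoint shapes follow the distribution spec (PUT to /manifests, and the single-POST form of a blob upload). The function only plans the requests; it sends nothing.

```python
import hashlib
import json

MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
INDEX_TYPE = "application/vnd.oci.image.index.v1+json"


def digest(data: bytes) -> str:
    """Content digest in the usual sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_index(manifests: list[bytes]) -> bytes:
    """Build a "root" index whose entries reference the given manifests."""
    entries = [
        {"mediaType": MANIFEST_TYPE, "digest": digest(m), "size": len(m)}
        for m in manifests
    ]
    doc = {"schemaVersion": 2, "mediaType": INDEX_TYPE, "manifests": entries}
    return json.dumps(doc).encode()


def plan_push(repo: str, manifests: list[bytes], tag: str, children_via: str):
    """Return the (method, path, content_type) steps for option A or B.

    children_via="manifests" models option A; anything else models option B.
    """
    steps = []
    for m in manifests:
        d = digest(m)
        if children_via == "manifests":
            # Option A: each child goes through the manifests API.
            steps.append(("PUT", f"/v2/{repo}/manifests/{d}", MANIFEST_TYPE))
        else:
            # Option B: each child is uploaded as an opaque blob.
            steps.append(
                ("POST", f"/v2/{repo}/blobs/uploads/?digest={d}",
                 "application/octet-stream")
            )
    # In both options the index itself is pushed last, via the manifests API.
    steps.append(("PUT", f"/v2/{repo}/manifests/{tag}", INDEX_TYPE))
    return steps
```

The final step is identical in both plans, and the index body from `build_index` is byte-for-byte the same, which is why the index alone cannot tell the server how its children were pushed.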

This has implications for subsequent calls to pull or delete manifests. If, for example, a manifest was pushed using the /blobs API rather than the /manifests API, the server is not aware that the file is a manifest, so it cannot expose it for download or deletion through the /manifests API. It would only be available through the /blobs API.
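The consequence described above can be shown with a toy in-memory model. `ToyRegistry` and its methods are hypothetical and do not describe any real registry's internals. The only assumption is that the server records "this digest is a manifest" when content arrives through the manifests API and at no other time.

```python
import hashlib


class ToyRegistry:
    """Hypothetical model: one content store plus a manifest lookup table."""

    def __init__(self):
        self.store = {}           # digest -> bytes (content-addressable)
        self.manifest_types = {}  # digest -> media type, set only by push_manifest

    @staticmethod
    def digest(data: bytes) -> str:
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def push_blob(self, data: bytes) -> str:
        d = self.digest(data)
        self.store[d] = data
        return d

    def push_manifest(self, data: bytes, media_type: str) -> str:
        d = self.push_blob(data)
        self.manifest_types[d] = media_type
        return d

    def get_blob(self, d: str):
        return self.store.get(d)

    def get_manifest(self, d: str):
        # Content pushed only as a blob is not known to be a manifest,
        # so this model returns None (a real server would answer 404).
        if d not in self.manifest_types:
            return None
        return self.manifest_types[d], self.store[d]
```

In this model, manifest bytes pushed with `push_blob` can be fetched with `get_blob` but not with `get_manifest`, until the same bytes are pushed again with `push_manifest`.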

For a server-side implementation, can we assume all clients use either A) or B)?

Silvanoc commented 2 weeks ago

I am used to seeing A), but apparently B) also exists. Therefore I would tend to consider A) the proper way.


I wonder if there is a reason why manifests cannot be handled as blobs at will. That way A), B) and even a combination of both would be possible.

Is it the root of the artifact? Then handle it as a manifest and manage it with the /manifests endpoints.

It isn't? Then handle it as a blob and manage it with the /blobs endpoints.

Manifests uploaded over the /blobs endpoints can be promoted anytime to /manifests.

That way you would not need to differentiate between A and B.

tianon commented 2 weeks ago

I think this might be one of those cases where the spec talks about/implies garbage collection slash the DAG, but doesn't explicitly say it. Without having an unambiguous way to identify "an object that might have children" it's hard to implement validation or garbage collection in a sane way that doesn't involve heuristics.
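As a rough illustration of that point, here is a sketch of the mark phase of a mark-and-sweep collector. It is an assumption-laden toy and not how any particular registry works: `children`, `mark`, and the `known_manifests` set are hypothetical names, and the collector only parses objects it has been told are manifests or indexes.

```python
import json


def children(body: bytes) -> list[str]:
    """Digests referenced by a manifest or index body."""
    doc = json.loads(body)
    descriptors = list(doc.get("manifests", [])) + list(doc.get("layers", []))
    if "config" in doc:
        descriptors.append(doc["config"])
    return [d["digest"] for d in descriptors]


def mark(roots, store, known_manifests):
    """Return the set of digests reachable from roots.

    store maps digest -> bytes. Only digests in known_manifests are parsed
    for children; everything else is treated as an opaque leaf.
    """
    live = set()
    stack = list(roots)
    while stack:
        d = stack.pop()
        if d in live or d not in store:
            continue
        live.add(d)
        if d in known_manifests:
            stack.extend(children(store[d]))
    return live
```

If a child manifest is not in `known_manifests` (for example because it arrived as a blob), it is still marked live through the index, but its own layers and config are never visited and would be swept. Avoiding that requires guessing which blobs are manifests, which is the heuristic mentioned above.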

sudo-bmitch commented 2 weeks ago

I suspect, given the way content addressable stores work, that some registries may allow a manifest to be pulled as a blob. However, the reverse would be less likely since registries track manifests for GC and media type headers.

My individual opinion is that manifests and blobs may be treated separately by registries, and every entry in an index should be a manifest that is only accessed via the manifest API. While some tools could be written that understand their content is accessed using the blob API, that would break more generic tooling that inspects, ingests, scans, and transports the content. Also a manifest listed in an index may be its own independent object, pulled directly, without going through the index, or it may be listed in multiple indexes, so I have a hard time calling any one index or image manifest the "root".

I am aware that exceptions to this have occurred in the past, but I consider those bugs. Support for a buggy implementation is a decision for the individual registries and tooling, and not something OCI should be pushing. E.g. I wouldn't want an OCI conformance test to check whether a registry allows an index containing references to content pushed to the blob API.