opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0
828 stars 205 forks source link

Allow for rejection of image indexes with missing references #310

Closed Jamstah closed 2 years ago

Jamstah commented 2 years ago

Be more specific for missing references in the case of image indexes and not just image manifests.

The intent here is to clarify that a registry is allowed to reject image indexes for which the references do not yet exist in the registry, but does not have to. Currently, all implementations I have seen do validate, but it is not mandated by the spec, nor checked with the compliance tests.

Overall, we have a use case for not performing this check around mirroring of content to local environments where the local environment only has a subset of architectures they need images for. To make this possible, I'd like the spec to say that the validation that the references exist is allowed, but not required.

For more discussion see the dev list: https://groups.google.com/a/opencontainers.org/g/dev/c/Uw8xdBOr444

sudo-bmitch commented 2 years ago

Bringing my comments over here because they aren't focused on a single registry implementation.

How should tooling handle an index that points to missing manifests? Example issues I can think of include:

Jamstah commented 2 years ago

Great questions, thanks!

I'll give my answers here, but I'm also going to paste this onto the opencontainers dev list post I made for further discussion: https://groups.google.com/a/opencontainers.org/g/dev/c/Uw8xdBOr444. This PR is really to accommodate behaviour that already exists in registries, which I'm hoping is not controversial.

I feel like either making it a MUST or MUST NOT would be backwards incompatible, but making it a MAY just reflects that it's something that is happening and that consumers should be aware of.

Onto the questions:

vulnerability scanners: should they pass an image that all manifests are unavailable with the assumption those other manifests will never be uploaded, or fail those scans since a vulnerable manifest for another platform could be uploaded after the scan result reports a success?

I would suggest an index that has unscannable content be treated the same as an image that has missing blobs. I guess that would be an "error: incomplete" or something along those lines. The individual images could (and should) be scanned. I'm not sure what scanners are doing with indexes at the moment anyway, do they give them a result? Is it based on a combination of all the platforms?

image copy/mirroring tools: what if we attempt to copy a spare index to a registry that doesn't allow them? Will this break mirrors and local caches?

From a high level, I think not. Official sources for images should never be sparse - what would be the point? I would go as far as to say that its probably not a good idea to allow sparse images on repos like docker.io, quay.io, etc. The use case for this is in the mirrors themselves, where if you want to set up a mirror for your network disconnected servers that are a mix of x86 and arm, you don't want the other platforms for disk space/bandwidth/scanning/other reasons.

From a low level, yes, this may make tools break in funny ways, and they will have to learn to cope. If a tool has been well written, it should already cope, because there is no existing guarantee that indexes won't be sparse. I personally plan to make skopeo first class, and potentially raise bugs/prs for other tools to help move ux along generally.

image metadata: if tooling extends the index with other metadata (e.g. SBoMs and signing) rather than other platform manifest, will this metadata be lost in a sparse copy?

Part of the point here is that a manifest cannot change on copy if it is to retain signatures and the ability to be content-addressable. Therefore, yes, any metadata in the manifest or based on the manifest digest should be kept in a sparse copy.

UX when pulling a missing manifest: if a user attempts to pull an image from a sparse index that doesn't exist, they will get a 404 for a tag that clearly exists from the tag API and likely from any web UI's on top of the registry. Do registries or client tooling need to be updated to give a more useful error?

Potentially, depending on what it does at the moment. Along with skopeo I plan to at least check what docker, podman, cri-o, and maybe some others do, and either raise PRs for better messages or bugs.

SteveLasker commented 2 years ago
sudo-bmitch commented 2 years ago

image metadata: if tooling extends the index with other metadata (e.g. SBoMs and signing) rather than other platform manifest, will this metadata be lost in a sparse copy?

Part of the point here is that a manifest cannot change on copy if it is to retain signatures and the ability to be content-addressable. Therefore, yes, any metadata in the manifest or based on the manifest digest should be kept in a sparse copy.

I was going a different direction with this one. If a tool that copies images only copies the platform images for the platform it wants, then that copy may skip the copy of other index entries that might be an OCI artifact of metadata for the copied image. This is a bit theoretical since I don't think anyone is shipping artifacts that way, yet.

SteveLasker commented 2 years ago

it's interesting. The index referencing other artifacts is another one to ponder. Updating an index with multiple entries suffers the same race conditions, and the inability to subset the promoted content. As with all of these, some concrete use examples help think through the tradeoffs and which goals it meets.

Jamstah commented 2 years ago

image metadata: if tooling extends the index with other metadata (e.g. SBoMs and signing) rather than other platform manifest, will this metadata be lost in a sparse copy?

Part of the point here is that a manifest cannot change on copy if it is to retain signatures and the ability to be content-addressable. Therefore, yes, any metadata in the manifest or based on the manifest digest should be kept in a sparse copy.

I was going a different direction with this one. If a tool that copies images only copies the platform images for the platform it wants, then that copy may skip the copy of other index entries that might be an OCI artifact of metadata for the copied image. This is a bit theoretical since I don't think anyone is shipping artifacts that way, yet.

Ah right, I see what you mean. I think long term this would need the concept of weak/strong references where the strength of the reference could be used to say "this one is always needed", which isn't covered here.

I agree that it's theoretical, and my instinct, for now, is to say that's the job of whoever is doing the mirroring to ensure that the right component parts of any index are mirrored correctly.

Jamstah commented 2 years ago

Updating an index with multiple entries suffers the same race conditions, and the inability to subset the promoted content.

Not sure I follow this one, are you saying that if mirroring a subset meant we would remove the entry from the manifest that there would be a race condition as individual parts were mirrored and added? I feel like I'm missing the point :)

sudo-bmitch commented 2 years ago

Updating an index with multiple entries suffers the same race conditions, and the inability to subset the promoted content.

Not sure I follow this one, are you saying that if mirroring a subset meant we would remove the entry from the manifest that there would be a race condition as individual parts were mirrored and added? I feel like I'm missing the point :)

I suspect Steve is thinking about what this would look like for signing images see this issue and the attached hackmd. Signatures may be added after the image is pushed to the registry, meaning the index would get recreated with a new manifest list. And if you have multiple systems signing (multiple approvers, reproducible builds, etc) there are race conditions where two updates to the manifest list result in one of them being lost.

Jamstah commented 2 years ago

Updates based on @stevvooe comments here: https://groups.google.com/a/opencontainers.org/g/dev/c/Uw8xdBOr444/m/oj5oLroDBAAJ

Jamstah commented 2 years ago

Can you format this as one sentence per line (see https://github.com/opencontainers/distribution-spec/blob/524a53fd55a76b1b035567c445326c0be0d55676/README.md#markdown-style)?

I like it. It feels like a very sensible and coder thing to do, but also, a bit rebellious like we're taking the rules of English on the written page and laughing as we make it feel a bit pythonic.

I went all in.

Jamstah commented 2 years ago

The other option is to push for an invalid index to return a MANIFEST_UNKNOWN or MANIFEST_INVALID error instead, but wouldn't match the current behavior of registry:2.

Yes, I feel that would be backwards incompatible and require a major version change for the spec.

stevvooe commented 2 years ago

Reading through this, it isn't clear to me that an unknown manifest in an index should return the MANIFEST_BLOB_UNKNOWN but that's the error I see when trying this today with registry:2. I'd recommend we expand the description on that error to include manifests

Looking back at when we wrote the original spec, the MANIFEST_BLOB_UNKNOWN error was meant for validating blob presence. MANIFEST_INVALID is more for format or "pure" validation errors. MANIFEST_UNKNOWN is more for when a fetch for a manifest results in a 404.

Jamstah commented 2 years ago

Is there anything else for me to do here? I'm guessing it's just waiting for merge, but don't know if there's a meeting/another process it would be useful for me to be part of.

Jamstah commented 2 years ago

Thank you to everyone who indulged me in this :)