opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0
823 stars 202 forks source link

[Maintainer vote] Is requiring registries to accept non-existent subjects a breaking change? #490

Closed jdolitsky closed 10 months ago

jdolitsky commented 10 months ago

Note: Please do not comment on this issue. Please contain all discussion on this topic to https://github.com/opencontainers/distribution-spec/issues/459.

This topic had been discussed in depth in https://github.com/opencontainers/distribution-spec/issues/459, https://github.com/opencontainers/distribution-spec/issues/483, in-person at KubeCon NA `23, and in various OCI meetings over the last few months. This issue is to get a final vote from the acting distribution-spec maintainers on the following question:

Is requiring registries to accept non-existent subjects a breaking change?

Maintainer Vote
@sudo-bmitch no
@dmcgowan yes
@jzelinskie no
@jonjohnsonjr no
@jdolitsky no
@mikebrow yes
@stevvooe yes
@vbatts no

Maintainers: please edit this issue description to add your vote. Please use one of the following answers:

dmcgowan commented 10 months ago

Note: Please do not comment on this issue. Please contain all discussion on this topic to https://github.com/opencontainers/distribution-spec/issues/459.

I stated before I don't want the political discussion there but to keep it about proposed solutions.

This does not need to be a winners and losers issues, every use case can be supported. I think requesting a vote pointing to a long and hard to follow discussion is not fair to the maintainers.

jdolitsky commented 10 months ago

@dmcgowan The intention was to keep this issue as a completely non-biased vote amongst maintainers. I understand if we want to discuss someplace outside of #459, but I don't think the discussion belongs here.

stevvooe commented 10 months ago

I'm generally of the opinion that if we reference an object in another object, the object must exist in the registry. We've built the image spec and distribution implementations with this concept from the beginning. It prevents a lot of mistakes in pushing content that could otherwise be exposed to users.

The enforced order ("DAG order" we'll call it), would be the following for a push that follows this invariant:

  1. images and layers.
  2. image-index.
  3. signature.

It sounds like the benefit of relaxing this ordering is to allow one to push a signature or SBOM before the image-index is pushed, so that when one tries to pull the new image, the attachment is guaranteed to exist.

Is it worth relaxing the fundamental design principle of the registry to solve this particular problem? I think we need a better statement of impact before we drop a design principle that has gotten us this far.

I'm voting "Yes", but with the statement that we need to make it a MUST to validate the presence of targeted subject. I am not aware of how this will cause problems with existing implementations on the client side, but understand if they need to relax adherence. If there is a clear use case that can't be done with out relaxing this validation, I'm generally open to hearing about it (and acknowledge that I may have missed it in prior discussion).

jdolitsky commented 10 months ago

Since the levee has broke and it looks like we're having the discussion here, I'm at least going to lock it and limit to maintainers 😅 carry on

sudo-bmitch commented 10 months ago

I'm voting "Yes", but with the statement that we need to make it a MUST to validate the presence of targeted subject.

That would be a breaking change to me. The spec says registries MAY validate the content, there's nothing existing in the spec that says any one field in a manifest MUST be validated. There was a push to support sparse manifests that required little in the spec to change because we already supported them. Pushing things in DAG order is useful for consistency, but the perception of what that order is varies based on the perception of whether this is a single DAG or two DAGs associated by an API.

The direction I've been pushing for is to acknowledge that this is a new field with different usage from the existing fields, where different behavior is appropriate. We do not give out the reference to a signature to runtimes and have them pull the image from the subject field. Consumers pull the image, and expect the content associated with that image to exist, including the layers and config, and now we're adding the signature, SBOM, and potentially other metadata. For implementations that want to avoid breaking this logic on their consumers, it would be good for the spec to give them the ability to push content in either order.

Compared to OCI image-spec+distribution-spec 1.0, adding new validation to this field is a breaking change. Content that a user could push to an OCI conformant registry today will not be able to be pushed to a registry after is upgrades to the 1.1 specs with the added validation, since the OCI spec requires that unknown fields do not result in errors from registries and consumers.

stevvooe commented 10 months ago

The original spec and implementation returned an error for each missing blob: https://distribution.github.io/distribution/spec/api/#pushing-an-image-manifest. This seems to have been relaxed in https://github.com/opencontainers/distribution-spec/commit/a92b62ee17c3a89e3d18fd1288e6f7547a9696d5. It sounds like subject should fall into the existing MAY language and we should move on.

I don't see a valid reason to allow upload of invalid or out of order content. The signature, SBOM and most other metadata cannot exist until an image has a digest and that usually happens during the push preparation phase.

mikebrow commented 10 months ago

@stevvooe Additionally, beyond the order issue, there is a thought that some registries and mirrors may never hold the images or layers being referenced by subject, and will instead generate a new image index (via referrers api) with a set of objects having particular object subject references... Essentially a new registry type..

I wholly agree with the point of making it a MUST to validate the presence of a targeted subject at least before these objects are allowed to be pulled.. and also recognize the enforced DAG order as vital for image registries.

One possible soln would be language in the spec requiring at least one of two patterns (traditional DAG order required for repos that only allow present images before referring artifacts can be pushed/pulled, and un-traditional presence variants with some other validation implementation.)

Let's see what Derek comes up with, surely there are many options to resolve the impasse.

jdolitsky commented 10 months ago

I don't see a valid reason to allow upload of invalid or out of order content.

@stevvooe this was captured as a requirement at the onset of the reference types working group (in March 2022). See item 3 here: https://github.com/opencontainers/wg-reference-types/blob/main/docs/REQUIREMENTS.md#content-management

"As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet."

stevvooe commented 10 months ago

As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet.

But why? What does this actually allow one to do that can't be done with current model.

It also seems at odds with this requiremet:

As an artifact producer, I want to push multiple artifacts concurrently (possibly from separate systems) without encountering race conditions or other errors.

If I can push a signature that references a non-existing artifact, I've created a race condition in the referrers API that didn't previously exist, which may never be resolved.

sudo-bmitch commented 10 months ago

As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet.

But why? What does this actually allow one to do that can't be done with current model.

From previous discussions, it allows the signatures to be pushed before the image so that consumers never attempt to pull an unsigned image. It also allows metadata to be stored in a separate repository from the images they extend (I've heard the example of a security team that wants to push metadata to a restricted repository that only they can access, without the need to mirror every image, that does require tagging the metadata, and client tooling needs to be configured with an alternate repository to query the referrers API).

It also seems at odds with this requiremet:

As an artifact producer, I want to push multiple artifacts concurrently (possibly from separate systems) without encountering race conditions or other errors.

If I can push a signature that references a non-existing artifact, I've created a race condition in the referrers API that didn't previously exist, which may never be resolved.

I'm not sure I follow the question. That requirement was looking at the issue with concurrently pushing multiple platforms for an image from different pipelines and not having a way to avoid race conditions from the different pushes modifying the index simultaneously. If multiple jobs each push a different SBOM, having the referrers response generated by the registry gives the opportunity to resolve the race condition on the server. Without that we'd need to require registries to support conditional requests for clients to be able to implement it.

A race condition between the signature and image being pushed, if the signature is pushed first, feels very similar to the race condition between pushing the layer and the image manifest when the client is hoping the layer doesn't get GC'd before they push the manifest. At least from the client perspective, I understand the server may see that very differently.

vbatts commented 10 months ago

I vote "no" Reasoning being there was slight precedent in the concept with the remote non-distributable layers. As for the use case, I'm personally in favor of being able to have a subject that is not required to be on the same registry.

stevvooe commented 10 months ago

How is splaying an image and its associated content across registries is good for users?

It seems like with a MAY here applied to subject, as it is applied to other references, is the correct language, while still allowing these use cases to exist on outside registries, which can choose to not validate the presence. This preserves the baseline behavior and allows registries that want to break up content to not implement the validation.

dmcgowan commented 10 months ago

Reasoning being there was slight precedent in the concept with the remote non-distributable layers.

This is not a good feature to point at since it is the only part of the spec that has lead to serious security issues. Registries today are also allowed to reject manifests with urls, I would also consider it a breaking change to say a registry must accept all urls.

As for the use case, I'm personally in favor of being able to have a subject that is not required to be on the same registry.

That is the inverse of this change (MUST vs MUST NOT), we are only considering MUST vs MAY. I am also in favor of the spec supporting this use case though. The current language says a registry MUST support a subject which is not in the registry (whether that is because it is on another registry or will come later is not know during upload). That is the breaking change, since registries today MAY require a referenced object to exist locally before accepting the content.

jdolitsky commented 10 months ago

How is splaying an image and its associated content across registries is good for users?

I want to point out that this isn't just theoretical and is already happening today. Please see this section in the cosign project README, where COSIGN_REPOSITORY allows storing/locating image signatures in a separate registry. People are successfully using this feature today.

There are a number of real use-cases. For example, a user may be in a highly-regulated environment with strict rules on where certain data can be stored.

If it isn't guaranteed via the MUST language that a registry will support a manifest with a subject in a separate registry, then it is highly likely that cosign will likely NOT adopt the OCI 1.1 spec to maintain this functionality. The result of this will be the continued proliferation of sha256-* digest tags. This is one of the primary things that the WG set out to eliminate.

dmcgowan commented 10 months ago

If it isn't guaranteed via the MUST language that a registry will support a manifest with a subject in a separate registry, then it is highly likely that cosign will likely NOT adopt the OCI 1.1 spec to maintain this functionality. The result of this will be the continued proliferation of sha256-* digest tags. This is one of the primary things that the WG set out to eliminate.

Can you explain this one a little further. If a tool is trying to push up referrers to a a registry which does not have the subject, how does MAY stop that? The MAY could just signal to the client that this registry requires the referenced content to be pushed up as well. So if say Docker Hub was the original source and your reference registry was the target, then the reference registry could support subject manifests without the subjects even if Docker Hub did not. If the worry is that the client may come across an attempt to push referrers to a registry which requires content, then the client can either push the content or reject the upload. Not implementing it all doesn't make sense since the currently language already says clients MUST implement the 1.0 fallback.

mikebrow commented 10 months ago

With the current language in the release candidate / main branches of image and distribution.., yes the MUST language as written here in dist is a breaking change.

As the subject reference is a descriptor, a quick check shows this MUST in dist breaks with the image spec https://github.com/opencontainers/image-spec/blob/main/descriptor.md for at least five descriptor rules.

Additionally there are a few (at least) broken rules here in the dist spec because of this change and more that need clarification.

Further, the term MUST "accept" here in dist is not well defined, as the registry MAY accept then garbage collect immediately after calling into question the meaning and depth of the MUST for the use case.

sudo-bmitch commented 10 months ago

I never explained my own vote. I'm voting "no" because:

If there's a way we can resolve this in a way to registries and clients are both happy, I'd jump at the opportunity. Since we've been discussing this for 3 months with no solution in sight, I'm not confident more discussion will find that solution, so I'm voting with the community to move forward.

mikebrow commented 10 months ago

the question being voted on is the wrong question... IMO..

Should there be use case support for OCI registries having the option to receive and store artifacts having a subject descriptor reference to a portion of a merkle dag index/image that it is not a part of [x] yes (in this yes the language should be clear this is a new type of PUSH api having different rules and possibly requiring different registry indexing and garbage collection.) Should a registry be allowed to reject or at least subsequently garbage collect these artifacts if they are not placed in a dag that includes the subject descriptor referenced object and/or have not been validated? [x] yes

Should there be force language here stating that the registry MUST validate subject descriptors (repeating the current rule in image) or optionally provide an indication of validity for artifacts not in a portion of the index/image dag [x] yes

jzelinskie commented 10 months ago

IMO, I don't think this is a breaking change.

I view the referrers as the root of a separate tree with its own validation semantics that are purposely distinct from images. I do believe that the DAG guarantees raised by @stevvooe are an important part of the design for images and a discussion around sparse images is coming, but is orthogonal to the definition of "valid" for referrers' subjects.

To reject this now would break relied upon functionality that was also a part of the original design requirements for referrers.

mikebrow commented 10 months ago

Gronking the point that a manifest with a subject descriptor field that references a manifest that does not exist MAY be "the root of a separate tree with its own validation semantics purposely distinct from images." Noting that those words are implied by the MUST language but are not written down in the spec.

That same manifest MAY also be added in such a way that is not distinct from images and still adheres to the DAG guarantees.

The MUST accept language reads, to me, as forcing the choice and appears to be contradicting to the existing specs.

jonjohnsonjr commented 10 months ago

I think the question is moot, regardless...

My interpretation is that it's impossible for this to be a breaking change because prior to the subject field existing, this line (that got removed in https://github.com/opencontainers/image-spec/pull/1030... accidentally? why was that merged?) wouldn't allow them to reject it.

jonjohnsonjr commented 10 months ago

Gronking the point

I have no idea what this means.

mikebrow commented 10 months ago

I think the question is moot, regardless...

My interpretation is that it's impossible for this to be a breaking change because prior to the subject field existing, this line (that got removed in opencontainers/image-spec#1030... accidentally? why was that merged?) wouldn't allow them to reject it.

How would they process referrers requests without adding support for the new fields.

Agree the unknown fields and media types MUST be ignored part should be put back.. the additional clarification and not generate an error is ok I think.

grok.. typo

mikebrow commented 10 months ago

@jonjohnsonjr if your interpretation of "A registry MUST accept an otherwise valid manifest with a subject field that references a manifest that does not exist, allowing clients to push a manifest and referrers to that manifest in either order" is the unavailable fallback path described here because they must ignore the field.. well ok.. but it doesn't say that and the current wording of the fallback doesn't support that interpretation.

jdolitsky commented 10 months ago

Thanks everyone for voting and sharing your thoughts. The result of the vote is 5 "no" and 3 "yes". I am closing this issue.