opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0

versioning data for v1.1 with referrers API #365

Closed mikebrow closed 1 year ago

mikebrow commented 1 year ago

The current direction for handling registries that do not support the referrers API needs work.

The diff for 1.1 for referrers states: When pushing an image or artifact manifest with the subject field and the referrers API returns a 404, the client MUST

This results in clients having to request referrers after pushing one of the new manifest types in order to make a version/capability determination; otherwise a client would have to keep a master list keyed by registry (and repository?).

What is needed is a version/capabilities check. Preferably the OCI version supported should be easily discoverable, and/or the capabilities should be made available, so that a call to referrers is not needed after each push of a new manifest.

@dmcgowan @sudo-bmitch

mikebrow commented 1 year ago

one option would be an extension

sudo-bmitch commented 1 year ago

What's the advantage of an extension or version check over checking if the API works? One efficiency I believe ORAS is looking at is a dummy digest (all 0s) that will always return an empty response (unless someone manages a hash collision).
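A minimal sketch of the dummy-digest probe mentioned above, assuming the referrers endpoint has the shape `/v2/<name>/referrers/<digest>` and that the all-zero digest is a sha256 digest. The function name and the example registry host are hypothetical, not taken from ORAS or any other client.

```python
# All-zero sha256 digest: no real manifest should hash to this value,
# so a registry that supports referrers is expected to answer with an
# empty list rather than real content.
ZERO_DIGEST = "sha256:" + "0" * 64


def referrers_probe_url(registry: str, repository: str) -> str:
    """Build the referrers URL for the all-zero digest (hypothetical helper)."""
    return f"https://{registry}/v2/{repository}/referrers/{ZERO_DIGEST}"


if __name__ == "__main__":
    print(referrers_probe_url("registry.example.com", "library/app"))
```

A client would issue a GET against this URL once per repository and interpret the status code, instead of querying referrers after every manifest push.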

mikebrow commented 1 year ago

One is explicit, the other is a subjective/reactive response. When the registry OCI version is 1.0 and referrers returns 4xx, that is the expected response; a non-4xx would be an error. When the registry OCI version is 1.1 and referrers still returns 4xx, that is not expected for registries that claim support, but because the API is only a "SHOULD" it is not mandatory for it to succeed. When the version of the registry API is Docker 1 or 2 and no OCI support exists, what procedure then?

I suggest that not solving the versioning issue with something more explicit will only get trickier as we move ahead.

Agreed, a nil/dummy referrers request with an empty response as a positive ack of referrers support is better than asking for a set of expected manifests just to check whether referrers is supported at all. But it still feels like kicking the versioning can down the git branch tree :-)

sudo-bmitch commented 1 year ago

We included the following to enable discovery:

If the registry supports the referrers API, the registry MUST NOT return a 404 Not Found to a referrers API requests.

A registry without the API will respond with a 404 to these requests, or perhaps a 400 if something is broken. In both cases, the client should fall back.
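The fallback rule in the previous paragraph can be sketched as follows. This is an illustration under assumptions: both helper names are hypothetical, treating any other status as an error for the caller is my choice rather than something stated here, and the tag format (subject digest with `:` replaced by `-`) is my reading of the 1.1 referrers tag schema with its length limits omitted.

```python
def use_fallback_tag(status_code: int) -> bool:
    """Decide whether to fall back to the referrers tag schema.

    2xx means the referrers API answered; 404 (no API) and 400
    (something is broken) both mean the client should fall back.
    """
    if 200 <= status_code < 300:
        return False
    if status_code in (400, 404):
        return True
    # Assumption: auth failures, 5xx, etc. are left to the caller.
    raise RuntimeError(f"unexpected referrers status: {status_code}")


def fallback_tag(subject_digest: str) -> str:
    """Derive the fallback tag from a subject digest (assumed <alg>-<ref> form)."""
    algorithm, _, encoded = subject_digest.partition(":")
    return f"{algorithm}-{encoded}"


if __name__ == "__main__":
    if use_fallback_tag(404):
        print(fallback_tag("sha256:" + "ab" * 32))
```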

We avoided a check on the registry version because it felt easier and less error prone to just check the API you want to use. Less error prone because we worried some registries would claim 1.1 compatibility without the API, or perhaps they are a v1 registry with some v1.1 features. And easier because it's one less API to define and support.

mikebrow commented 1 year ago

The first time you try to use a new API that only has one version, you can argue existence is proof. When there are two versions, that argument loses strength. Forward and backward compatibility of service APIs is easier to do when you know what version you are using.

In this case (it seems to me?) the text is arguing: assume OCI 1.1 format for GET/PUSH, but if referrers is called and a 404 Not Found is returned, manually build a 1.1 image index tagged with the 1.1 referrers tag schema and push that to the registry.

The client, if afforded the option, may wish to not push the 1.1 artifacts or call referrers when the registry is known to not support the entire 1.1 specification. It could instead tell the user that 1.1 artifacts (including 1.1 referrers requests) are not supported, and ask whether they would like to use the fallback pattern and push the following image index. Or, if the registry does support 1.1, the client could suggest that the user provide an image index tagged with the referrers tag schema because, for now, this registry does not appear to support the referrers API and it would be beneficial to have a tagged list for later retrieval.

Alternatively a client may wish to use the OCI 1.0 artifact pattern if the version of the registry is known to not support OCI 1.1 format or OCI 1.1's new referrers api.

Dunno, for me once there are two versions of an API, the code decisions sort of begin with ok what version of the API are we using.

mikebrow commented 1 year ago

The version issue of the manually/automatically created image index with the 1.1 referrers tag schema, and how a client should serialize against other clients trying to update the same image index, points to another pro/con of having a version. Imagine if there were no tag to indicate version and you had to serialize on digest alone.

Another issue: what if the registry "upgrades" or "downgrades" support for 1.1/referrers? What if a first client supports referrers and a second client does not? What if a "mirror" supports referrers but the source does not, or vice versa?

sajayantony commented 1 year ago

Irrespective of referrers, caps/version would be a good addition to distribution. Maybe a SHOULD in 1.1 that could slowly be moved up to a MUST in the next revision.

mikebrow commented 1 year ago

A solution might be to require both modes (with and without the referrers pattern) on artifact push for 1.1, with a deprecation of the without pattern.

sudo-bmitch commented 1 year ago

Some of the comments from today's call:

Re the version/capabilities API: personally I wouldn't use it, and would instead query the API directly and handle the errors. The risk is the registry may report different capabilities than what actually works with the API. Perhaps the API is throwing 4xx errors because of a broken implementation, or perhaps the API is enabled before updating the capabilities API. If the registry responds with different capabilities than what is actually implemented, then clients will make mistakes, resulting in data loss (not pushing the fallback tag) or unnecessary tags (cluttering the listing when the API works).
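To make the two failure modes concrete, here is a small sketch of what happens when a client trusts the advertised capability. The function name and the outcome strings are hypothetical, used only for illustration.

```python
def mismatch_outcome(claimed: bool, works: bool) -> str:
    """Outcome when a client trusts `claimed` but the referrers API `works` or not."""
    if claimed and not works:
        # Client skips the fallback tag, but the API cannot serve referrers.
        return "data loss: fallback tag not pushed"
    if works and not claimed:
        # Client pushes a fallback tag the registry did not need.
        return "unnecessary tag: listing cluttered"
    return "consistent"


if __name__ == "__main__":
    for claimed in (True, False):
        for works in (True, False):
            print(claimed, works, mismatch_outcome(claimed, works))
```

Querying the API directly removes the `claimed` input, so neither mismatch can occur.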

Re the pull through cache: this only affects pulls (pushes go to the upstream registry) so it won't cause consistency issues upstream. A potential workaround is for registries to convert a request for a fallback tag to a referrers API request. I don't know that we want to put that in the spec, but I wouldn't be opposed to a registry supporting that with a backwards compatibility flag.

For managed mirrors (a full registry that happens to have a copy of images from another source), the tooling copying the images can automatically handle the upgrade and downgrade, pushing/pulling the fall back tag when the referrers API isn't available.

sudo-bmitch commented 1 year ago

One item discussed in today's call was the need to fail fast on the client side, before authenticating or pushing blobs, when a manifest media type wouldn't be supported on the registry. Registries may still accept a media type according to a capabilities API, and later reject it after the blob was pushed, if that registry is doing additional filtering that can't be communicated in a capabilities API (e.g. rejecting unknown fields, or finer grain validation on the manifest content).

The part I want to avoid is automatically upgrading a client to new functionality at runtime, because an automatic upgrade creates significant portability issues for content. If that was done, once users upgrade the origin registry, all other registries where the manifest may be copied to would also need to be upgraded. It's very common for the reverse to be the reality: the development/build environment registry is upgraded before the production/public facing registry. Perhaps any capabilities API should come with documentation to warn implementations away from generating non-portable content.

sajayantony commented 1 year ago

Using the current set of APIs, without a deterministic way to determine the version of the registry, it has become quite hard to determine whether we should use ImageManifest+Index, Artifact+Index, or Artifact+Referrers.

The question I have for the maintainers is: are you comfortable releasing distribution spec 1.1 without resolving this issue, or should this be defined as part of this milestone (https://github.com/opencontainers/distribution-spec/milestone/6)?

Sharing @toddysm's write up here - https://toddysm.com/2023/01/05/oci-artifct-manifests-oci-referrers-api-and-their-support-across-registries-part-1/

sudo-bmitch commented 1 year ago

How should this be used by clients? If clients query the capabilities API and the capabilities API itself is not available, do they still try to access a feature anyway, or do they assume a registry that doesn't implement the capabilities API hasn't implemented any of the features it would describe? In other words, does this break registries that added the subject/referrers functionality before the capabilities API was defined? And if a registry does implement a capabilities API, how does this impact client error handling if a registry claims to support something but still rejects it later? Something I'd like to avoid is multiple tiers of error handling for the same error, because it introduces the risk of inconsistency for clients.

Also, how long are we comfortable delaying the 1.1 release to define, build, test, and approve a new capabilities API?

sajayantony commented 1 year ago

Before capabilities, headers, etc., is there interest in specifying the supported types when you release distribution 1.1? https://github.com/opencontainers/distribution-spec/compare/main...sajayantony:distribution-spec:supported-types

I can make this a PR if folks are OK with it, but this is orthogonal to the caps/header discussion.

@sudo-bmitch @jdolitsky @jonjohnsonjr

jlbutler commented 1 year ago

As I mentioned in last week's call, I think our conversations are potentially conflating more than one concern. Personally I'm very open to further discussion, but I think if we continue to pull these into the same frame, we run a risk of not making much progress.

For the sake of seeing some quick-take upvotes or downvotes, I'm going to make two subsequent comments.

jlbutler commented 1 year ago

  1. Provide a mechanism for clients to determine the functionality of a registry they are talking to

To this, we've had a couple of proposals. As this is the first minor version update of the distribution spec (or, any of the specs), it seems like a good time to add something. I know we were leaning toward capabilities vs version, but honestly I think version is simpler and does the job.

While a registry could report 1.0 and also accept Artifacts, or host a referrers endpoint without claiming version 1.1, registries that report 1.1 MUST support all features. This isn't about compliance, but about the implementation being complete. Then, clients decide how to work with that, but it gives the happy path for clients that want to know if 1.1 (including referrers and Artifacts) is supported or not.
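As a rough sketch of that happy path, assuming a registry reports its version as a `major.minor` string: no version endpoint or format is defined in the spec at this point, so both the format and the helper name below are hypothetical.

```python
from typing import Optional


def reports_at_least_1_1(reported: Optional[str]) -> bool:
    """True only if the registry explicitly reports version 1.1 or later.

    False means "no claim", not "unsupported": a registry may still accept
    Artifacts or serve referrers without reporting 1.1.
    """
    if not reported:
        return False
    parts = reported.strip().split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        # Unparseable version string: treat as no claim.
        return False
    return (major, minor) >= (1, 1)


if __name__ == "__main__":
    for value in ("1.1", "1.0", None):
        print(value, reports_at_least_1_1(value))
```

When this returns `False`, a client can still decide to probe the individual APIs.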

As for how long we'd delay a release for this, I don't think that adding a version endpoint to the spec as it is would be a significant undertaking. I offered to write a PR for that, and I'm still happy to do so. But I would like some sense of community direction on this issue.

How are folks thinking about this? Yay, nay, needs more discussion?

jlbutler commented 1 year ago

  2. Provide clear guidance to client implementors such that they can make implementation decisions related to feature support and legacy storage

In most specs that I'm aware of that involve a client and server which have the server storing artifacts, concerns around portability relate to existing artifacts being able to move forward into the future and not be stranded. I've never seen a spec take into consideration storing future-version artifacts on a down-rev storage system.

I would really like other folks to chime in here - my experience with multi-version specs isn't really in the cloud native apps space, but in storage protocols and filesystems.

All this said, there is no guarantee nor any requirement that after N number of weeks, months, or years that all registries will support all 1.1 features. Therefore if we don't choose to move on from this concern of up-rev artifacts being stored in down-rev registries, the only solution seems to be to not rev the spec meaningfully and never really adopt Artifacts.

How are we feeling about this being addressed narratively with guidance, maybe in Use cases or even a new section related to versioning (which of course doesn't exist yet)? Thumbs up or down here would also be appreciated.

imjasonh commented 1 year ago

How are we feeling about this being addressed narratively with guidance, maybe in Use cases or even a new section related to versioning (which of course doesn't exist yet). Thumbs up or down here would also be appreciated.

+1, more narrative guidance is always helpful, and existing outside the spec gives it more flexibility to improve examples and wording without the usual difficulty of spec language.

sudo-bmitch commented 1 year ago

All this said, there is no guarantee nor any requirement that after N number of weeks, months, or years that all registries will support all 1.1 features. Therefore if we don't choose to move on from this concern of up-rev artifacts being stored in down-rev registries, the only solution seems to be to not rev the spec meaningfully and never really adopt Artifacts.

In a lot of other projects I've worked with, there's a concept of a grace period to upgrade, and a support window that everyone can depend on. With the forced upgrade approach, we're saying that as soon as the initial registry is upgraded, all downstream registries are no longer supported if they didn't already upgrade. It's a very user-hostile approach that concerns me.

My own approach was going to be a wait and see how adoption goes, and once enough key players transitioned, and self hosted users had time to upgrade, I would change the default. And that would only be a default, users could override that either way. But if there's a concern that no one will support the artifact manifest, a fixed time after the GA release would also make sense.

With the questions raised in this issue, my own questions above haven't been addressed. I'm not comfortable moving forward with a new feature without knowing exactly how we are recommending that feature be used. Most of the spec defines, for each API, how clients call it, and how it is used in a workflow (e.g. a blob push runs before a manifest push).

In this case, we are trying to create a new API without specifying how and why it's needed, and that feels backwards to me. I'd rather define the issue first, work through the possible solutions, pick the solution we like the best, and then define the API that's needed for that solution.

jlbutler commented 1 year ago

Totally agree on 'forced upgrade' @sudo-bmitch, your concerns have convinced me. I was more focused on just setting context here so we can unblock from definitions and discuss a solution. I probably should have been more complete.

By 'forced upgrade', we're talking about clients' automatic adoption of new features present in a registry without user specification. In some contexts we refer to this as upgrade, which is a slightly confusing term, at least for me.

I think these use cases are primarily publisher context, that is, clients creating artifacts. There could be implications for consumers as well, we should be sure to include them if any.

Focusing on publisher for the moment, adoption of new features by clients will play out in at least three different ways. These three are based on clients that I'm aware of, and my understanding of their plans to adopt 1.1+ features.

So I believe the thing we're most concerned about here is whether or not a user acting as publisher specifies, or at least implies, the artifact type (and subsequent use of fallback tag schema as proposed), or if it is automatic.

As a precedent, I think we need to consider this for future features as well. Just as with a capabilities endpoint, we've not had to consider this sort of thing before. The first time we're introducing a new minor version seems as good a time as any to do it.

There are maybe more use cases, but does this help clarify? Cause more concern?

If these are all laid out in use cases or elsewhere, would that address concerns from a spec author point of view? Or would we need to add opinionated guidance to accompany these?

sudo-bmitch commented 1 year ago

For context, we did include an upgrade path in Proposal E: https://github.com/opencontainers/wg-reference-types/blob/main/docs/proposals/PROPOSAL_E.md

jonjohnsonjr commented 1 year ago

This results in clients having to request referrers after pushing one of the new manifest types, to make a version/capability determination, or a client would have to have a master list based on registry (and repository?).

I believe https://github.com/opencontainers/distribution-spec/pull/379 solves this concern.

jdolitsky commented 1 year ago

solved in #379