opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

Formalize support for zstd compression: v1.1.0 ? #803

Open thaJeztah opened 4 years ago

thaJeztah commented 4 years ago

While reviewing https://github.com/moby/moby/pull/40820, I noticed that support for zstd was merged in master (proposal: https://github.com/opencontainers/image-spec/issues/787, implementation in https://github.com/opencontainers/image-spec/pull/788 and https://github.com/opencontainers/image-spec/pull/790), and some runtimes started implementing this;

However, the current (v1.0.1) image-spec does not yet list zstd as a supported compression, which means that not all runtimes may support these images, and the ones that do are relying on a non-finalized specification, which limits interoperability (something that I think this specification was created for in the first place).
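
For concreteness, the layer media type in question is application/vnd.oci.image.layer.v1.tar+zstd; a manifest referencing such a layer would contain a descriptor along these lines (size and digest are placeholders, following the examples later in this thread):

{
  "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
  "size": 12345,
  "digest": "sha256:deadbeef"
}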

I think the current status is not desirable; not only does it limit interoperability (as mentioned), it will also cause complications for Golang projects using this specification as a dependency; go modules will default to the latest tagged release, and some distributions (thinking of Debian) are quite strict about the use of unreleased versions. Golang projects that want to support zstd would either have to "force" go mod to use a non-released version of the specification, or work around the issue by using a custom implementation (similar to the approach that containerd took: https://github.com/containerd/containerd/pull/3649).

In addition to the above, concerns were raised about the growing list of media-types (https://github.com/opencontainers/image-spec/issues/791), and suggestions were made to make this list more flexible.

The Image Manifest Property Descriptions section currently describes:

Implementations MUST support at least the following media types:

  • application/vnd.oci.image.layer.v1.tar
  • application/vnd.oci.image.layer.v1.tar+gzip
  • application/vnd.oci.image.layer.nondistributable.v1.tar
  • application/vnd.oci.image.layer.nondistributable.v1.tar+gzip

Followed by:

...An encountered mediaType that is unknown to the implementation MUST be ignored.

This part is a bit ambiguous (perhaps that's just my interpretation of it though); should an implementation pull a manifest, and skip (ignore) layers with unknown compression, or should it produce an error?

What's the way forward with this?

  1. Tag current master as v1.1.0, only defining +zstd as a possible compression format for layers, but no requirement for implementations of the v1.1.0 specification to support them
  2. Add the +zstd compression format to the list of required media types, and tag v1.1.0; projects implementing v1.1.0 of the specification MUST support zstd layers, or otherwise implement v1.0.x
  3. Wait for the discussion about "generic" layer types (https://github.com/opencontainers/image-spec/issues/791, https://github.com/opencontainers/image-spec/issues/799) to be completed before tagging v1.1.0
  4. Do a v1.1.0 release (1. or 2.), and leave 3. for a future (v1.2.0) release of the specification.

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other media types, including those for image layers, are not; should they be?

thaJeztah commented 4 years ago

@jonjohnsonjr @vbatts @mikebrow @dmcgowan @SteveLasker ptal

(not sure if this is the right location for this discussion, or if it should be discussed in the OCI call; I just noticed this, so thought I'd write it down 😬 😅)

vrothberg commented 4 years ago

Should an implementation pull a manifest, and skip (ignore) layers with unknown compression, or should it produce an error?

I had similar issues interpreting "ignore". The containers/image library errored out for a couple of weeks last year, which blew up for @tych0. Now, it allows for pulling and storing the images.

In case of a call, I will do my best to join.

thaJeztah commented 4 years ago

I must admit I'm not the most proficient reader of specifications, but good to hear I'm not the only person that was a bit confused by it 😅 (which may warrant expanding that passage a bit to clarify the intent).

I guess "ignoring" will lead to an "error" in any case, because skipping "unknown media types" should likely lead to a failure to calculate the digest 🤔. Still, having some more words to explain would be useful.
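
To illustrate that failure mode: the image config pins the uncompressed digest of every applied layer in rootfs.diff_ids, so a client that silently skips a layer ends up with a filesystem that no longer matches the config, e.g. (illustrative excerpt, with placeholder digests):

{
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:deadbeef",
      "sha256:badcafe"
    ]
  }
}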

vrothberg commented 4 years ago

Thanks, @thaJeztah! I also felt some relief :smile:

@tych0, could you elaborate a bit on your use case? I don't want to break you a second time :angel:

dmcgowan commented 4 years ago

I'm not sure (3) solves the underlying problem here. That defines a way for understanding the media type, but it doesn't necessarily mean that clients can handle all possible permutations of a media type. The main issue is that if clients start pushing up images with zstd compression, older (most existing today) clients will not be able to use them. With that in mind, making it a requirement and releasing 1.1 with this change at least makes that problem more explicit and the solution more clear. Any client which supports OCI image 1.1 can work with zstd, older clients might not. I am not sure the generic layer types is really a specification change as much as a tooling change; it may allow the image spec at that point to support more options. The media types supported here should always be explicit though, imo.

tych0 commented 4 years ago

@tych0, could you elaborate a bit on your use case?

Sure, I'm putting squashfs files in OCI images instead of gzipped tarballs, so I can direct mount them instead of having to extract them first. The "MUST ignore" part of the standard lets me do this, because tools like skopeo happily copy around OCI images with underlying blob types they can't decode.

If we suddenly change the standard to not allow unknown blob types in images and allow tools to reject them, use cases like this will no longer be possible.

Indeed, the standard does not need to change for docker to generate valid OCI images with zstd compression. The hard work goes into the tooling on the other end, but presumably docker has already done that.

It might be worth adding a few additional known blob types to the spec here: https://github.com/opencontainers/image-spec/blob/master/media-types.md#oci-image-media-types but otherwise I don't generally understand the goals of this thread.

thaJeztah commented 4 years ago

If we suddenly change the standard to not allow unknown blob types in images and allow tools to reject them, use cases like this will no longer be possible.

I think in case of Skopeo, Skopeo itself is not consuming the image, and is used as a tool to pull those images; I think that's more the "distribution spec" than the "image spec" ?

I think a runtime that does not support a specific type of layer should be able to reject that layer, and not accept "any" media-type. What use would there be for a runtime to pull an image with (say) image/jpeg as layer-type; should it pull that image and try to run it?

For such cases, I think it'd make more sense to reject the image (/layer).

tych0 commented 4 years ago

I think in case of Skopeo, Skopeo itself is not consuming the image, and is used as a tool to pull those images; I think that's more the "distribution spec" than the "image spec" ?

No; the distribution spec is for repos serving content over http. skopeo translates to/from OCI images according to the OCI images spec.

I think a runtime that does not support a specific type of layer should be able to reject that layer, and not accept "any" media-type. What use would there be for a runtime to pull an image with (say) image/jpeg as layer-type; should it pull that image and try to run it?

If someone asks you to run something you can't run, I agree an error is warranted. But in the case of skopeo, it is a tool that is perfectly capable of handling layers with mime types it doesn't understand, and I think similar tools should not error out either.

thaJeztah commented 4 years ago

No; the distribution spec is for repos serving content over http. skopeo translates to/from OCI images according to the OCI images spec.

Yeah, poor choice of words; was trying to put in words that Skopeo itself is not the end-consumer of the image (hope I'm making sense).

But in the case of skopeo, it is a tool that is perfectly capable of handling layers with mime types it doesn't understand, and I think similar tools should not error out either.

The confusion in the words picked in the specs is about "mime types it doesn't understand". What makes a tool compliant with the image-spec? Should it be able to parse the manifest, or also be able to process the layers? Is curl | jq compliant?

While I understand the advantage of having some flexibility, if the spec does not dictate anything there, how can I know if an image would work with some tool implementing image-spec "X" ?

Currently it MUST ignore things it doesn't understand, which (my interpretation) says that (e.g.) any project implementing the spec MUST allow said image with an image/jpeg layer. On the other hand, it also should be able to extract an OCI Image into an OCI Runtime bundle. In your use-case, the combination of Skopeo and other tools facilitate this (Skopeo being the intermediary).

For Skopeo's case, even though the mediaType is "unknown to the implementation", Skopeo is able to "handle" / "process" the layer (within the scope it's designed for), so perhaps "unknown" should be changed to something else; e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.
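
As a minimal sketch of what that clarified rule could look like in a consumer that actually has to unpack layers (hypothetical helper; the set of supported types would vary per tool):

package layers

import (
	"compress/gzip"
	"fmt"
	"io"
)

// layerReader returns a reader for the layer contents, or an error if this
// implementation cannot process the layer's media type.
func layerReader(mediaType string, blob io.Reader) (io.Reader, error) {
	switch mediaType {
	case "application/vnd.oci.image.layer.v1.tar":
		return blob, nil
	case "application/vnd.oci.image.layer.v1.tar+gzip":
		return gzip.NewReader(blob)
	default:
		// Proposed clarification: a hard error instead of silently
		// ignoring a layer this implementation cannot handle.
		return nil, fmt.Errorf("cannot process layer media type %q", mediaType)
	}
}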

tych0 commented 4 years ago

e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.

That seems like a reasonable clarification to me!

cyphar commented 4 years ago

@thaJeztah

Regarding the ambiguity of the MUST clause. The intention of that sentence is to say that implementations should act as though the layer (or manifest) doesn't exist if it doesn't know how to do whatever the user has requested, and should use an alternative layer (or manifest) if possible. This is meant to avoid implementations just breaking and always giving you an error if some extension was added to an image which doesn't concern that implementation -- it must use an alternative if possible rather than giving a hard error. Otherwise any new media-types will cause endless problems.
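
Read that way, a client walking an index would act roughly like this (a sketch of that interpretation, with simplified types):

package index

import "errors"

// descriptor is a simplified stand-in for the spec's descriptor type.
type descriptor struct {
	MediaType string
	Digest    string
}

// pickManifest acts as though entries with unknown media types don't exist:
// it falls back to the next usable candidate instead of hard-erroring.
func pickManifest(entries []descriptor, supported map[string]bool) (descriptor, error) {
	for _, d := range entries {
		if supported[d.MediaType] {
			return d, nil
		}
	}
	return descriptor{}, errors.New("no manifest with a supported media type")
}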

In the example of pulling image data, arguably the tool supports pulling image data regardless of the media-type so there isn't any issue of it being "unknown [what to do with the blob] to the implementation" -- but if the image pulling is being done in order for an eventual unpacking step then you could argue that it should try to pull an alternative if it doesn't support the image type.

I agree this wording could be a bit clearer though, this change was done during the period of some of the more contentious changes to the image-spec in 2016. Given that the above was the original intention of the language, I don't think it would be a breaking change to better clarify its meaning.

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?

This is being worked on by @SteveLasker. The idea was to first register just one media-type so we get an idea of how the process works, and then to effectively go and register the rest.

cyphar commented 4 years ago

Another issue with the current way of representing compression is that the ordering of multiple media type modifiers (such as compression or encryption) isn't really well-specified since MIME technically doesn't support such things. There was some discussion last year about writing a library for dealing with MIME types so that programs can easily handle such types, but I haven't seen much since then.
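
For example, nothing defines whether a layer that is both compressed and encrypted would be tar+zstd+encrypted or tar+encrypted+zstd (both hypothetical types), even though the order matters when undoing the operations; a naive parser can only split on "+" and guess:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical doubly-suffixed type; MIME doesn't say whether the
	// modifiers should be applied left-to-right or right-to-left.
	mt := "application/vnd.oci.image.layer.v1.tar+zstd+encrypted"
	parts := strings.Split(mt, "+")
	base, modifiers := parts[0], parts[1:]
	// Do we decrypt first or decompress first? The type alone doesn't say.
	fmt.Println(base, modifiers)
}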

SteveLasker commented 4 years ago

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?

This is being worked on by @SteveLasker. The idea was to first register just one media-type so we get an idea of how the process works, and then to effectively go and register the rest.

Ack: please assume the other mediaTypes will be registered. I'm providing clarity in the Artifacts Spec to help with both these issues. Once the Artifacts spec is merged, with clarity on the registration process, I'll register the other types.

For the compression, what I think we're saying is this: tools that work specifically on a type, for instance runnable images like application/vnd.oci.image.config.v1+json, should know about all layer types for a specific version. In this case, v1 vs. v1.1. The spec for each artifact provides that detail so clients know what they must expect. The artifact-specific spec might say compression is optional, and a fallback must be provided. But, I don't know if it's realistic to say a tool could push a new layer type without it being in the spec and be considered valid.

There are other tools, like skopeo, (I think) or ORAS which work on any artifact type pushed to a registry. In these cases, they need to know some conventions to be generic. But, in the case of ORAS, it intentionally doesn't know about a specific artifact type and simply provides auth, push, pull of layers associated with a manifest. It's the outer wrapper, like Helm or Singularity that provide specific details on layer processing.

We have an open agenda for the 4/22 call to discuss.

thaJeztah commented 4 years ago

I see I forgot to reply to some of the comments

Regarding the ambiguity of the MUST clause. The intention of that sentence is to say that implementations should act as though the layer (or manifest) doesn't exist if it doesn't know how to do whatever the user has requested, and should use an alternative layer (or manifest) if possible. This is meant to avoid implementations just breaking and always giving you an error if some extension was added to an image which doesn't concern that implementation -- it must use an alternative if possible rather than giving a hard error. Otherwise any new media-types will cause endless problems.

So, I was wondering about that: I can see this "work" for a multi-manifest(ish) image, in which case there could be multiple variations of an image (currently used for multi-arch), and I can use "one" of those, but I'm having trouble understanding how this works for a single image.

What if an image has layers with mixed compression?

I think it's technically possible to have mixed compressions. For example, in a situation where an existing image is pulled (using, e.g., gzip-compressed layers), then extended (a new layer added) using zstd, then pushed.

However, the "reverse" could also make a valid use-case, to create a "fat/hybrid" image, offering alternative compressions for systems that support it ("gzip" layers for older clients, "zstd" for newer clients that support it).

Looks like this needs further refinement to describe how this should be handled.

Ack: please assume the other mediaTypes will be registered. I'm providing clarity in the Artifacts Spec to help with both these issues. Once the Artifacts spec is merged, with clarity on the registration process, I'll register the other types.

Thanks! I recall seeing a discussion (on the mailing list?) about registering, but noticed "some" were registered, but others were not, so thought I'd check 👍

justincormack commented 3 years ago

Yes, absolutely agree with Sebastiaan, picking some layers you understand and rejecting the rest is meaningless, and the semantics are not defined. There is no way to construct an image with zstd compression that is compatible with both older and newer clients. This only works for very limited workflows where you synchronously update all your clients and then update the images you generate, it does not work at all for people wanting to distribute public images, for example, where basically you cannot use zstd because there is no way to make an image anyone can use. A manifest list mechanism would be workable, but the current design just doesn't seem fit for purpose, and I think we should revert it.

giuseppe commented 3 years ago

I think the way to move forward is to add support for zstd to the different clients but still keep the gzip compression as the default.

Generating these images should not be the default yet, but the more we postpone zstd support in clients, the longer it will take to switch to it.

I don't see anything wrong if an older client, in 1-2 years, fails to pull newer images.

thaJeztah commented 3 years ago

The problem is that currently the correct behavior is effectively "undefined". See my earlier comment about layers using mixed compression (which IMO should be a valid use case). Without any definition how these images should be handled, it would not be possible to keep them interoperable.


tych0 commented 3 years ago

What about just adding the clarification you already proposed above, i.e.

e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.

Doesn't that define it well enough?

thaJeztah commented 3 years ago

Unfortunately, it doesn't, because for runtimes that support both zstd and gzip, selection is now ambiguous.

Take the following example;

{
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 12345,
      "digest": "sha256:deadbeef"
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
      "size": 34567,
      "digest": "sha256:badcafe"
    }
  ]
}

The above would be ambiguous, as it could either mean;

  1. A "fat" single layer image, providing alternative layers in zstd and gzip format (for older clients)
  2. A two-layer image, with the first layer in gzip and the second layer in zstd compression

In the above, 1. is a likely scenario for registries that want to provide "modern" compression, but provide backward compatibility, and 2. is a likely scenario where a "modern" runtime built an image, using a parent image that is not available with zstd compression.

While it's possible to define something along the lines of "MUST" pick one compression, and only use layers with the same compression, this would paint us in a corner, and disallow use-case 2. (and future developments along the same line).

All of this would've been easier if digests were calculated over the non-compressed artifacts (and compression being part of the transport), but that ship has somewhat sailed. Perhaps it would be possible with a new media-type (application/vnd.oci.image.config.v1+json+raw), indicating that layers/blobs in the manifest are to be considered "raw" data (non-compressed, or if compressed, hash was calculated over the data as-is). In that case, clients and registries could negotiate a compression during transport (and for storage in the registry, compression/storage optimisation would be an implementation detail)
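
A sketch of that model using the go-digest library (assuming the digest is taken over the uncompressed stream, which makes transport compression a free variable):

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"

	"github.com/opencontainers/go-digest"
)

func main() {
	layer := []byte("uncompressed tar bytes ...")

	// The digest is computed over the raw (uncompressed) content, so it is
	// stable regardless of how the blob is compressed for transport.
	dgst := digest.FromBytes(layer)

	// Compression becomes a transport/storage detail that can be negotiated.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(layer) // error handling elided in this sketch
	zw.Close()

	fmt.Println(dgst, buf.Len())
}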

tych0 commented 3 years ago

I don't think case 1 you've provided is legal. Per https://github.com/opencontainers/image-spec/blob/master/manifest.md#image-manifest-property-descriptions we have,

"The final filesystem layout MUST match the result of applying the layers to an empty directory."

So I think the specification already states that it must be case 2.

giuseppe commented 3 years ago

yes, I think it should be case 2, an image made of two different layers. It would be very confusing to support case 1 this way.

thaJeztah commented 3 years ago

The final filesystem layout MUST match the result of applying the layers to an empty directory

"Applying the layers" is very ambiguous combined with the other requirements (more below:)

yes, I think it should be case 2, an image made of two different layers. It would be very confusing to support case 1 this way

Which means that there's no way to have images that are compatible with both existing runtimes and runtimes that support zstd.

As the spec states:

Implementations MUST support at least the following media types: ...An encountered mediaType that is unknown to the implementation MUST be ignored.

That means that any of the current runtimes MUST ignore the zstd layer, and then apply the remaining layers.

tych0 commented 3 years ago

Which means that there's no way to have images that are compatible with both existing runtimes and runtimes that support zstd.

I don't think that's what it means at all. It means it won't work this specific way, but I can imagine other ways in which it would.

That means that any of the current runtimes MUST ignore the zstd layer, and then apply the remaining layers.

That's why I think your proposed clarification is useful: runtimes who can't "process" the layer should error out when asked to. In particular, that's exactly what will happen in current implementations: they will try to gunzip the zstd blob, realize they can't, and fail.
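
That failure is mechanical: gzip decoders check the magic bytes up front, so feeding them a zstd frame errors immediately, e.g.:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

func main() {
	// First bytes of a zstd frame (magic 0x28 0xB5 0x2F 0xFD, padded out to
	// header length); gzip expects 0x1f 0x8b instead.
	zstdFrame := []byte{0x28, 0xb5, 0x2f, 0xfd, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
	_, err := gzip.NewReader(bytes.NewReader(zstdFrame))
	fmt.Println(err) // gzip: invalid header
}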

thaJeztah commented 3 years ago

but I can imagine other ways in which it would.

Can you elaborate on what other ways?

tych0 commented 3 years ago

Can you elaborate on what other ways?

Sure, but I don't think it's relevant for whether or not zstd support should be in the spec. With your proposed clarification, I think the spec would be very clear about the expected behavior when runtimes encounter blobs they don't understand (and for tools like e.g. skopeo, who can shuttle these blobs around without understanding them, which is my main concern).

We are already using non-standard mime types in layers at my organization, and because the tooling support for this is not very good, right now we just disambiguate by using a "foo-squashfs" tag for images that are squashfs-based, and a "foo" tag for the tar-based ones.

However, since tag names are really just annotations, you could imagine having an additional annotation, maybe "org.opencontainers.ref.layer_type" to go along with "org.opencontainers.ref.name" that people use as tags, that would just be the layer type. Then, in a tool like skopeo, you would do something like skopeo copy oci:/source:foo oci:/dest:foo --additional-filter=org.opencontainers.ref.layer_type=zstd (or maybe skopeo would introduce a shorthand for this). Tools could then ignore layer types their users aren't interested in or that they don't know how to support. If there's no manifest with a tag matching the filters that a client knows how to consume, it would fail.
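
Concretely, the index of such an OCI layout might look like this (the layer_type annotation is the hypothetical part; sizes and digests are placeholders):

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1072,
      "digest": "sha256:deadbeef",
      "annotations": {
        "org.opencontainers.ref.name": "foo"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1072,
      "digest": "sha256:badcafe",
      "annotations": {
        "org.opencontainers.ref.name": "foo",
        "org.opencontainers.ref.layer_type": "zstd"
      }
    }
  ]
}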

To make this backwards compatible, I suspect always listing the tar-based manifest as the first one in the image would mostly work, assuming tools don't check for multiple images with the same tag and fail. But maybe it wouldn't, I haven't really played around with it. In any case, just using tags to disambiguate works totally fine, even though it's ugly and better tooling support would be appreciated.

SteveLasker commented 3 years ago

Adding new compression formats to a specific type is goodness to bring that artifact forward with new capabilities. Providing consistent behavior across an ecosystem with multiple deployed versions seems to be the problem. Isn't this effectively what versioning provides? While a newer client might know how to process old and new compression formats, how do we get to the point where we have some stability? This seems like a pivot for getting a different result, based on what capabilities the client supports. If the client supports version 1 and 2, it should default to version 2. If the client only supports version 1, it knows to pull version 1. If the registry only has version 2, there's a failure state.

This is very akin to the multi-arch approach. The client asks the registry for hello-world:latest and also states it's ARM. The registry says, umm, I don't have an arm version of hello-world:latest, so it fails.

I'm not saying we should actually use multi-arch manifests, but the concept is what we seem to need here.

For reference, we debated this with Teleport. We didn't want to change the user model, or require image owners to publish a new format. When someone pushes content to a teleport enabled registry, we automatically convert it. When the client makes a request, it sends header information that says it supports teleport. The registry can then hand back teleport references to blobs.

So, there are two models to consider here:

  1. The end to end container format has a new compression format, and it appears to be a version change.
  2. The compression format can be handled on the server.

This is also similar to what registries do with docker and OCI manifests. They get converted on the fly. I recognize converting a small json file is far quicker than multi-gb blobs.

Ultimately, it seems like we need to incorporate the full end to end experience and be careful to not destabilize the e2e container ecosystem while we provide new enhancements and optimizations.

thaJeztah commented 3 years ago

and for tools like e.g. skopeo, who can shuttle these blobs around without understanding them, which is my main concern

(IIUC) tools like skopeo should not really be affected for your specific use-case, as for that use-case they are not handling the actual image, and are mainly used as a tool to do a full download of whatever artifacts/blobs are referenced (also see my earlier comments https://github.com/opencontainers/image-spec/issues/803#issuecomment-616819168 and https://github.com/opencontainers/image-spec/issues/803#issuecomment-617073165)

However, since tag names are really just annotations, you could imagine having an additional annotation, maybe "org.opencontainers.ref.layer_type" to go along "org.opencontainers.ref.name" that people use as tags, that would just be the layer type. Then, in a tool like skopeo, you would do something like skopeo copy oci:/source:foo oci:/dest:foo --additiona-filter=org.opencontainers.ref.layer_type=zstd

I feel like this is now replicating what manifest-lists were for (a list of alternatives to pick from); manifest lists currently allow differentiating on architecture, and don't have a dimension for "compression type". Adding that would be an option, but (for distribution/registry) may mean an extra roundtrip (image/tag -> os/architecture variant -> layer-compression variant), or add a new dimension besides "platform".

Which looks to be what @SteveLasker is describing as well;

I'm not saying we should actually use multi-arch manifests, but the concept is what we seem to need here.

Regarding;

This is also similar to what registries do with docker and OCI manifests. They get converted on the fly. I recognize converting a small json file is far quicker than multi-gb blobs.

Docker manifests are OCI manifests; I think the only conversion currently still present is for old (Schema 2 v1) manifest (related discussion on that in https://github.com/opencontainers/distribution-spec/issues/212), and is being discussed to deprecate / disable (https://github.com/docker/roadmap/issues/173)

I'd be hesitant to start extracting and re-compressing artifacts. This would break the contract of content addressability, or more specifically: what guarantee do I have that the re-compressed artifact has the same content as the artifact that was pushed? If we want to separate compression from artifacts, then https://github.com/opencontainers/image-spec/issues/803#issuecomment-741844624 is probably a better alternative;

All of this would've been easier if digests were calculated over the non-compressed artifacts (and compression being part of the transport)

justincormack commented 3 years ago

@SteveLasker unfortunately recompression is too CPU intensive and slow to make it worthwhile doing in-registry conversion for most purposes (we looked into this a while back, the CPU costs more than the bandwidth saving).

tych0 commented 3 years ago

IIUC) tools like skopeo should not be really affected for your specific use-case

You'd think that, but it has broken before: https://github.com/containers/image/pull/801 Hence my concern about similar issues in this thread :)

I feel like this is now replicating what manifest-lists were for

Yes, possibly. I haven't thought about it very hard.

jonjohnsonjr commented 3 years ago

Going to brain dump some ideas from the OCI call before they're lost to time...

As per this comment we could add a new dimension to manifest lists. Maybe as a new field, but we already have "annotations", which we could [ab]use for this.

My first thought would be to rely on the fact that most clients take the first compatible option when resolving a manifest list. I believe (https://github.com/opencontainers/image-spec/issues/581 and other issues) that the exact semantics for resolution here were discussed to death and we never standardized on anything (just up to the implementer).

1. Abuse ordering

If we did rely on ordering (which feels gross), something like this (strings obviously changed) could work:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1152,
      "digest": "sha256:c95b7b93ccd48c3bfd97f8cac6d5ca8053ced584c9e8e6431861ca30b0d73114",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1072,
      "digest": "sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
        "zstd": "true"
      }
    }
  ],
  "annotations": {
    "zstd": "true"
  }
}

Top-level "annotations" here can indicate to new clients that they should look for zstd-compatible images instead of just picking the first thing they come across. You could define the annotation such that for each gzipped image, there must be an equivalent zstd-compressed image.

The per-manifest descriptor annotation would indicate which one is zstd. Older clients would just pick the first one, which would have gzipped layers, but it's probably not a great idea to rely on that?

2. Alternative image in annotation

Someone (@cpuguy83 I think?) mentioned stuffing alternatives in annotations. I can imagine two approaches here that would be backward compatible:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1152,
      "digest": "sha256:c95b7b93ccd48c3bfd97f8cac6d5ca8053ced584c9e8e6431861ca30b0d73114",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
         "zstd-compressed-alternative": "{\"mediaType\":\"application/vnd.oci.image.manifest.v1+json\",\"size\":1072,\"digest\":\"sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f\",\"platform\":{\"architecture\":\"amd64\",\"os\":\"linux\"}}"
      }
    }
  ]
}

Here we'd just escape the second descriptor from above and plop it in an annotation. When doing platform resolution, a client could check for this annotation and use it as an alternative if they support zstd compression.

One major drawback: clients that handle artifacts generically (e.g. to copy between registries) would not know about these descriptors, because they're not in manifests. You could hack around that by appending these to the end of the manifest list with garbage platform values that will never be true, but that also seems kind of gross?

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1152,
      "digest": "sha256:c95b7b93ccd48c3bfd97f8cac6d5ca8053ced584c9e8e6431861ca30b0d73114",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "annotations": {
         "zstd-compressed-alternative": "{\"mediaType\":\"application/vnd.oci.image.manifest.v1+json\",\"size\":1072,\"digest\":\"sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f\",\"platform\":{\"architecture\":\"amd64\",\"os\":\"linux\"}}"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1072,
      "digest": "sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f",
      "platform": {
        "architecture": "zstd",
        "os": "zstd"
      }
    }
  ]
}

3. Alternative layer in annotation

Similar to above, but from within an image. To use the example from this comment:

{
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 12345,
      "digest": "sha256:deadbeef",
      "annotations": {
         "zstd-compressed-alternative":"{\"mediaType\":\"application/vnd.oci.image.layer.v1.tar+zstd\",\"size\":34567,\"digest\":\"sha256:badcafe\"}"
      }
    }
  ]
}

Here we'd escape the zstd descriptor and stuff it into the equivalent gzip descriptor annotations.

Again, similar drawbacks to the second approach around generic artifact handling, but resolves ambiguity around mixed-compression layers vs alternative compression layers.

dmcgowan commented 3 years ago

If we did rely on ordering (which feels gross), something like this (strings obviously changed) could work:

Why does relying on ordering need to feel gross? Using the order to deterministically resolve a manifest from a list should be a part of the specification. If there are multiple manifests which are usable and no explicit preference of one over the other, then the first should be used.
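
(Spelling out the deterministic rule being proposed: among all entries that are usable for the requesting client, take the one that appears first in the manifests array; the pickManifest sketch earlier in this thread is one possible reading of exactly that behavior.)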

liubogithub commented 3 years ago

I believe that it's inevitable to have some hacks to keep backward compatibility, unless all the existing client tools agree to update all of a sudden.

To me, a new schema version seems to be a good move, but before it gets landed, making use of order dependency and annotation could make things work.

fuweid commented 3 years ago

Based on @jonjohnsonjr comments, we also can use platform.features to include zstd.

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1152,
      "digest": "sha256:c95b7b93ccd48c3bfd97f8cac6d5ca8053ced584c9e8e6431861ca30b0d73114",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1072,
      "digest": "sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f",
      "platform": {
        "architecture": "amd64",
        "os": "linux",
        "features": ["zstd"]
      }
    }
  ]
}

We are seeking methods to prevent old clients from reading zstd data. However, it also brings historical debt to the new clients which can support zstd layers. :)

And I think we can define how clients should handle unknown media-types.

e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.

It doesn't block seeking a backward-compatibility solution, and can help the manifest/image-spec accept useful media-types.

jonjohnsonjr commented 3 years ago

Why does relying on ordering need to feel gross?

Mostly comments in the zoom call about not liking it :P

Using the order to deterministically resolve a manifest from a list should be a part of the specification. If there are multiple manifests which are usable and no explicit preference of one over the other, then the first should be used.

Agreed, but given that this isn't currently part of the spec, relying on this would be backward-incompatible. I guess it depends on what our standard for making theoretically breaking changes is. Within the go community, it seems that if you can produce any program that would be broken by a language change as a counterexample, the change is automatically rejected. I can imagine a client implementation that arbitrarily takes the last manifest instead of the first, which I believe would conform to the spec today, so relying on any default behavior would be breaking, no?

And don't get me wrong, the reason I proposed this is because I like it :) I'm just not sure what kind of trade-offs are acceptable.

A lot of folks were advocating for just revving to schema 3, but I think that actually just introduces an additional problem without solving anything.

justincormack commented 3 years ago

Annotations referring to layers are terrible for things like garbage collection in registries - how do we know which annotations might have references?

Using feature looks ok to me.

dmcgowan commented 3 years ago

I can imagine a client implementation that arbitrarily takes the last manifest instead of the first, which I believe would conform to the spec today, so relying on any default behavior would be breaking, no?

I think we can handle this with language. When adding these clarifications, we use SHOULD language rather than MUST language so as not to accidentally make an existing compliant client suddenly incompatible. Our goal here is to minimize impact, but having 1.0 clients guaranteed compatible with 1.1 is not backwards compatibility; that is the always-difficult forward compatibility. With clarifying any ambiguity there is always a counter-example, but that can also cause analysis paralysis. Clarifying how to resolve a manifest is beneficial either way and avoids these issues going forward (it has already come up in regards to the OCIv2 image spec).

jonjohnsonjr commented 3 years ago

I'd agree that features is nice, especially given:

This property is RESERVED for future versions of the specification.

For ease of client implementation, I would still propose using a top-level annotation to signal to clients that they should be looking for certain features, otherwise the resolution logic always requires looking at every descriptor. Maybe this is desirable, but we should standardize on the "correct" resolution logic to enable future growth.

With clarifying any ambiguity, there is always a counter example but that can also cause analysis paralysis.

Definitely agree, but would like to know what our standard is here. Rough consensus that it's probably fine, otherwise TOB vote? We are fortunate to be able to pull numbers from most registry logs, so for some things we may be able to all agree that "nobody uses this anymore", but in cases like this with client behavior, it's a bit harder to get numbers, so we're left with a judgement call. (I'm thinking about this both in context of this issue, but also schema 1 + content negotiation stuff.)

Clarifying how to resolve a manifest I think is beneficial either way in any case and avoids these issues going forward

Yes, please! cc @cyphar

justincormack commented 3 years ago

We can potentially change the way the manifest is processed as part of the upgrade, e.g. you must follow the hint, or must read all items, while still allowing older behaviours. If feature is reserved that should help, as it therefore should be ignored (but I need to go through the wording in detail), but actual behaviour matters too, e.g. choose first match.

cpuguy83 commented 3 years ago

I am not a fan of handling this by order:

  1. Merging manifest lists is already a burden, this adds a whole new dimension to that.
  2. What should the order be? What about when there's more than a zstd variant? In this case the image author ends up making decisions for the image consumer which may not be ideal.

That said, I'm all for a working solution. There won't be a perfect one.

cpuguy83 commented 3 years ago

I suggest adding a new, optional field to the descriptor in the manifest list which takes a list of digests referring to alternative versions of the manifest specified in the descriptor, where the assumption is:

  1. The digest refers to the same media type
  2. The digest refers to the same platform
  3. It should in effect be the same image but with different encoding (compression or whatever else might change under the hood).

This would in effect be something like a map[string]digest.Digest, where the string is something that the client needs to understand in order to use it.
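
A sketch of that shape (field name and placement are illustrative, not a finished proposal; see the caveats after this list):

package spec

import "github.com/opencontainers/go-digest"

// Descriptor is a simplified descriptor carrying a hypothetical Alternates
// field: it maps a client-understood key (e.g. "zstd") to the digest of an
// equivalent manifest with a different encoding.
type Descriptor struct {
	MediaType  string                   `json:"mediaType"`
	Digest     digest.Digest            `json:"digest"`
	Size       int64                    `json:"size"`
	Alternates map[string]digest.Digest `json:"alternates,omitempty"`
}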

Some issues I see:

  1. GC (As @justincormack mentioned)
  2. Compound features (like, maybe it's zstd compressed but not something else is also special about it). This is something that os.features handles well.
  3. New field on descriptor

jonjohnsonjr commented 3 years ago

map[string]digest.Digest

I think I'd prefer []v1.Descriptor, maybe something like an "alternates" field or "descriptors" (which would allow nested descriptors as well). You could put your proposed string key as just annotations or features on these sub-descriptors, or just as an annotation on the original descriptor. Making this a new field fixes some of the GC problems, but adding a new field means updating a lot of clients.

It isn't really that different from relying on ordering, but it does allow explicitly associating other artifacts with a descriptor, which might be useful for notary stuff as well.

This is kind of analogous to how the "urls" field works for blobs.

Concretely, we could add a field to the end of v1.Descriptor:

// Descriptor describes the disposition of targeted content.
// This structure provides `application/vnd.oci.descriptor.v1+json` mediatype
// when marshalled to JSON.
type Descriptor struct {
    // MediaType is the media type of the object this schema refers to.
    MediaType string `json:"mediaType,omitempty"`

    // Digest is the digest of the targeted content.
    Digest digest.Digest `json:"digest"`

    // Size specifies the size in bytes of the blob.
    Size int64 `json:"size"`

    // URLs specifies a list of URLs from which this object MAY be downloaded
    URLs []string `json:"urls,omitempty"`

    // Annotations contains arbitrary metadata relating to the targeted content.
    Annotations map[string]string `json:"annotations,omitempty"`

    // Platform describes the platform which the image in the manifest runs on.
    //
    // This should only be used when referring to a manifest.
    Platform *Platform `json:"platform,omitempty"`

    // Descriptors specifies a list of additional descriptors associated with this object.
    //
    // Clients should use "annotations" or "features" to determine their purpose.
    Descriptors []Descriptor `json:"descriptors,omitempty"`
}

And maybe the artifact looks something like:

{
    "schemaVersion": 2,
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 1152,
            "digest": "sha256:c95b7b93ccd48c3bfd97f8cac6d5ca8053ced584c9e8e6431861ca30b0d73114",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            },
            "annotations": {
                "zstd-alternative": "sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f",
                "notary-thing": "sha256:134c7fe821b9d359490cd009ce7ca322453f4f2d018623f849e580a89a685e5d"
            },
            "descriptors": [
                {
                    "mediaType": "application/vnd.oci.image.manifest.v1+json",
                    "size": 1072,
                    "digest": "sha256:9e2bcf20f78c9ca1a5968a9228d73d85f27846904ddd9f6c10ef2263e13cec4f",
                    "platform": {
                        "architecture": "amd64",
                        "os": "linux",
                        "features": [
                            "zstd"
                        ]
                    }
                },
                {
                    "digest": "sha256:134c7fe821b9d359490cd009ce7ca322453f4f2d018623f849e580a89a685e5d",
                    "mediaType": "something-something-x509-signature-mediaType",
                    "size": 1337,
                    "annotations": {
                        "org.opencontainers.image.signature.fingerprint": "43:51:43:a1:b5:fc:8b:b7:0a:3a:a9:b1:0f:66:73:a8"
                    }
                }
            ]
        }
    ]
}

This is similar to @vbatts suggestion for a signatures field on v1.Descriptor, but even more generic.

We could just make this recursive v1.Descriptor as the "generic" artifact we've talked about? I believe it could represent essentially anything. @SteveLasker

Old clients won't bother parsing the descriptors field. For new clients, we can decide on any behavior we want. Registries and clients will need to be updated to consider these descriptors for GC and generic copying stuff, though.

SteveLasker commented 3 years ago

@justincormack

@SteveLasker unfortunately recompression is too CPU intensive and slow to make it worthwhile doing in-registry conversion for most purposes (we looked into this a while back, the CPU costs more than the bandwidth saving).

It does depend on the scenario. For compatibility of clients, this is an approach, and just putting it out there for consideration. For Docker Hub, or other public registries that must manage/absorb costs for the masses, I completely agree this isn't feasible.

Ordering/Abuse/Gross

I'd agree this feels like a hole many customers/users would fall into without realizing it, based on workflow tooling outside of their control.

Use of annotations

While technically feasible, doesn't this still require the client to know to parse these annotations? And wouldn't clients that don't know about the format need to know how to ignore these entries?

Registry as Cloud Storage

One major drawback: clients that handle artifacts generically (e.g. to copy between registries) would not know about these descriptors

This is one of my biggest concerns. As we continue to move from the assumption there's one registry where everything lives, to a hybrid environment where content moves between registries, we need more interchangeability. And, we need generic APIs to support content replication and copying within and across registries. This would mean any solution would need to be generic so garbage collection and registry APIs could work generically with any artifact type.

If we look at how files are stored on file systems, we don't move content with the Office API. We move content with the file system APIs. We use copy commands, and the file descriptors know how to define the contents of the file. What I'm hoping we'll support, is a means to copy content from one repository or registry to another repository or registry. If we use general schema definitions, we can meet this goal. A registry just needs to define what makes up a thing. Both hard and soft links.

I believe that it's inevitable to have some hacks to keep backward compatibility, unless all the existing client tools agree to update all of a sudden.

As a group, we are the collective maintainers. Sorry, but why are we discussing working around a constraint, when we own the constraints (spec), and in many cases, own/influence the tools? I struggle with us working within the limitations.

To me, a new schema version seems to be a good move, but before it gets landed, making use of order dependency and annotation could make things work.

+2 on the schema (can I have 2 if I really really want it?). And, it's good to experiment to validate. But, I don't think we should push experimental content to public registries. Experimental features on tooling are different, as long as the content is well defined. And, if we're experimenting, why can't we experiment with the client tools that would know how to parse the new versions/schemas? How many of these tools aren't public, such that an experimental fork of things like containerd or even moby wouldn't be possible?

A lot of folks were advocating for just revving to schema 3, but I think that actually just introduces an additional problem without solving anything.

By revving the schema, and holding compatibility to a schema, isn't that what versioning is meant to support?

annotations referring to layers is terrible for things like garbage collection in registries - how do we know which annotations might have references.

IMO, using annotations for anything other than search and interesting meta-data is dangerous to surface incompatibility. For garbage collection, registry interchangeability, we really need very deterministic schemas to work with.

We could just make this recursive v1.Descriptor as the "generic" artifact we've talked about? I believe it could represent essentially anything. @SteveLasker

We've been iterating on a new generic manifest type for artifacts. Something that would allow us to decouple from image spec, but the schema is generic enough to support any new artifact type, including a new image schema. I haven't finished writing it up, but it essentially supports hard and soft links. It would also allow reverse lookups for things like signatures and meta-data.

I don't know if this example is meaningful enough without the other examples and some description:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.notary.v2",
  "config": {
    "mediaType": "application/vnd.cncf.notary.config.v2",
    "size": 0,
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
  },
  "parent": {
    "mediaType": "application/vnd.oci.image.index.v1.config.json",
    "reference": "/wordpress:5.7"
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.notary.v2.json",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654,
      "reference": "registry.wabbitnetworks.io"
    }
  ],
  "dependencies": [  ]
}

The main point I'm hoping we'll all consider is that we do not have to fit into an existing schema. We can, and I'd argue SHOULD, create a new schema, or at least a version to define this new type.

I hear two main arguments for/against this.

I'm not a fan of the additional tag, as it's hard to know when we'd phase it out, for the new-new format.

So, it seems we have two options:

  1. figure out how to use multi-arch manifests (index lists) to identify which manifest/compression format to use
  2. support a single manifest with two compression formats?

I really dislike the multi-compression layer manifest. This would mean pushing the same artifact with different compression formats, as opposed to optionally pushing two compression formats depending on the need.

jonjohnsonjr commented 3 years ago

As a group, we are the collective maintainers. Sorry, but why are we discussing working around a constraint, when we own the constraints (spec), and in many cases, own/influence the tools? I struggle with us working within the limitations.

I strongly disagree with this. There are a lot of actors involved here. If we don't do something in a backward-compatible way, we need to update/release registries and clients, then wait for adoption of those releases for several years. Sure, we can probably cover 90% of users in a pretty short time frame with the folks who represent this group, but that's not really acceptable to me. If we can roughly prove that 99.99% of users are unaffected, that's a different story (e.g. the schema 1 discussion).

By reving the schema, and holding compatibility to a schema, isn't that what versioning is meant to support?

Yes but that doesn't mean it's painless. Schema 2 work started in 2015, and we're just now talking about turning off schema 1 at the end of 2020.

We've been iterating on a new generic manifest type for artifacts.

Do you have a link to this?

I don't know if this example is meaningful enough without the other examples and some description...

The intermingling of mutable things and content-addressable things makes my initial reaction to this pretty negative, but I'd like to see a full proposal.

I'd much rather adapt what we have in a backward compatible way, if at all possible.

tych0 commented 3 years ago

By reving the schema, and holding compatibility to a schema, isn't that what versioning is meant to support?

The problem is that we want to have a manifest that can be parsed and run by existing code in the wild, but have new code be able to parse it and run it with zstd, which costs less money/cpu/kittens. If we do schema++, all the existing code will just break and won't be able to use this new version (assuming it checks the schema version at all; if it doesn't things are even worse).

Plus, I don't think we even have to do schema++ if we're ok with breaking existing clients: we can already break these clients by just shipping zstd layers and telling people "don't point your clients at zstd layers if they're old"; old clients will fail to decompress them anyway because the client doesn't understand zstd.

But it's not clear to me that the image spec is the right place to solve this problem: the image spec should describe how an image with zstd compression is stored, which it currently does. Shouldn't the client negotiate via the distribution spec about what features etc. it supports?

jonjohnsonjr commented 3 years ago

But it's not clear to me that the image spec is the right place to solve this problem: the image spec should describe how an image with zstd compression is stored, which it currently does. Shouldn't the client negotiate via the distribution spec about what features etc. it supports?

The image spec seems to be where we'd discuss any changes to the format itself. If we do some http-based content negotiation, I'd agree that distribution spec makes sense for that discussion, but that was one part of the schema1 -> schema2 migration that was particularly confusing and frustrating, so I'm not sure if we'd repeat that mistake.
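
For context, manifest-level negotiation already rides on HTTP Accept headers, and a zstd capability could in principle extend the same mechanism; a sketch (the registry URL is a placeholder, and any zstd-specific signalling would be the new, speculative part):

package main

import (
	"fmt"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET",
		"https://registry.example.com/v2/library/hello-world/manifests/latest", nil)
	if err != nil {
		panic(err)
	}
	// Standard manifest negotiation today; a client could advertise
	// zstd-capable manifest handling the same way (speculative).
	req.Header.Add("Accept", "application/vnd.oci.image.index.v1+json")
	req.Header.Add("Accept", "application/vnd.oci.image.manifest.v1+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Header.Get("Content-Type"))
}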

cpuguy83 commented 3 years ago

re: schema bump

Let's not forget, the entire point of this discussion is backwards compatibility. We can already create an image with zstd, push it to a registry, and pull it on a client that supports zstd decompression.

The problem space here is making sure in the general case we don't have a world where an image works on one client and doesn't on another. We want to be able to support zstd as well as other alternative formats and do so in a way that older clients continue to work (and ignore the alternative format).

Schema bump should be a worst case scenario.

rhatdan commented 3 years ago

We seem to be in a chicken-and-egg situation: other than putting the same content twice into an image (gzipped and zstd), or having the container registry create gzipped content on the fly, we cannot handle the new content on older clients.

If we add support to newer clients, then packagers of the newer images will have to document and take on the risk that their images will not be supportable with older clients, unless we continue to build/push using the older format, or create images with both formats (I don't even know if this is possible).

Then at some point in the future, we could switch to pushing zstd as the default, once we felt there were sufficient new clients out there.

thaJeztah commented 3 years ago

Just a quick blurb (need to re-read some comments more in-depth)

I am not a fan of handling this by order:

While I do agree it's somewhat ugly, IIUC it will be backward compatible with current implementations if the spec describes that for equally matching options, the first one should be used.

If additional metadata is available, newer clients can consider that information, and prefer "same image, but optimized compression" over "less-optimised compression". (So for newer clients, the options would not be "equal").

If we continue to build/push using the older format, or create images with both formats (I don't even know if this is possible).

I would still consider "mixed" compression a possible situation, and I'd love to see a better definition of that part (how it should be handled); current description of "ignore what's not recognised" is too ambiguous (imo)

SteveLasker commented 3 years ago

I'll fully acknowledge I'm not investing my full thought, as I'm trying to take some personal time. I just didn't want to ignore this great conversation with what seems like conflicting goals.

I get that there are lots of tools out there.

...We can already create an image with zstd, push it to a registry, and pull it on a client that supports zstd decompression.

I think we're mostly stuck on how a single tag on an artifact can reflect multiple things. Does anyone disagree that a unique tag, with a revised mediaType version, could support anything new?

mixed compression

What are the practical use cases where a newly pushed manifest would need mixed compression format?

thaJeztah commented 3 years ago

What are the practical use cases where a newly pushed manifest would need mixed compression format?

This would be in a situation where a new client builds an image that extends an image that doesn't have zstd compression (I mentioned this in one of my earlier comments above)