thaJeztah opened this issue 4 years ago
I get the inheritance/FROM model. But why wouldn't it just push the newly compressed layers (both new layers and original layers that can now be re-compressed), to up-level the entire thing? What benefit do you get by having a mixed mode?
What benefit do you get by having a mixed mode?
This allows for very cheap composition of images that happen to use different types of compression, e.g. if you want to rebase a gzip-compressed image onto a new zstd-compressed base image, which is a major selling point of buildpacks (cc @sclevine is there a better doc than this?), you can achieve this with some minor editing of manifests. If you required each layer to have the same compression scheme, you'd have to recompress the old layers or the new base to match, which would be expensive and defeat the goals of cheap rebasing.
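To make the "minor editing of manifests" point a bit more concrete, here is a rough Go sketch (my own illustration, not how buildpacks actually implement it) of splicing existing gzip app layers on top of a zstd-compressed base by editing descriptors only. It assumes a recent image-spec Go module (github.com/opencontainers/image-spec/specs-go); digests and sizes are placeholders, and the config rewrite (diff_ids/history) is skipped.

```go
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/image-spec/specs-go"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// rebase swaps the base layers of oldImage for the layers of newBase without
// touching any blobs. Because each layer descriptor carries its own mediaType,
// the gzip app layers and the zstd base layers can coexist in one manifest.
// (A real rebase also has to rewrite the config's diff_ids/history.)
func rebase(oldImage ocispec.Manifest, oldBaseLayerCount int, newBase ocispec.Manifest) ocispec.Manifest {
	appLayers := oldImage.Layers[oldBaseLayerCount:]
	return ocispec.Manifest{
		Versioned: specs.Versioned{SchemaVersion: 2},
		MediaType: ocispec.MediaTypeImageManifest,
		Config:    oldImage.Config, // placeholder; see note above
		Layers:    append(append([]ocispec.Descriptor{}, newBase.Layers...), appLayers...),
	}
}

func main() {
	oldImage := ocispec.Manifest{Layers: []ocispec.Descriptor{
		{MediaType: ocispec.MediaTypeImageLayerGzip, Digest: "sha256:old-base", Size: 123},
		{MediaType: ocispec.MediaTypeImageLayerGzip, Digest: "sha256:app", Size: 456},
	}}
	newBase := ocispec.Manifest{Layers: []ocispec.Descriptor{
		{MediaType: ocispec.MediaTypeImageLayerZstd, Digest: "sha256:new-base", Size: 789},
	}}
	out, _ := json.MarshalIndent(rebase(oldImage, 1, newBase), "", "  ")
	fmt.Println(string(out))
}
```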
I think we're mostly stuck on how a single tag on an artifact can reflect multiple things. Does anyone disagree that a unique tag, with a revised mediaType version, could support anything new?
I don't think we're stuck here, because this is exactly what an image index does. I'd be interested in how a new mediaType would be functionally different from an image index other than breaking all existing clients.
The buildpack angle is definitely interesting, and something we're hoping can solve the quick-servicing events. But, do we see zstd as the new standard? Or, is there a benefit of keeping gzip around as well? If we're trying to make a move to zstd for better performance, then shouldn't we help the entire ecosystem move forward? Once re-compressed with zstd, wouldn't you get even better performance?
Stuck on single tags: this goes back to what goals we're trying to solve. If I want to add the zstd format:
- must I update the existing gzip manifest and have a single tag?
- must I create a new manifest that supports zstd, and use the index to split based on what the user wants?
In either case, if we want a single tag to support both gzip and zstd, hello-world:latest would need to either point to an index (with a single tag) that splits to individual manifests, or to a single manifest that supports both compression formats.
if we want a single tag to support both gzip and zstd, hello-world:latest would need to either point to an index (with a single tag) that splits to individual manifests.
The hello-world image is already a manifest list (multi-arch); adding alternative compressions would mean having multiple variants for the same os/arch (existing clients would pick the first match; newer clients can pick more specific, based on extra info/annotations describing the compression)
(at least, I think that's roughly what's being proposed/discussed above)
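Roughly, a newer client's selection logic could look like the following sketch (image-spec Go types assumed; the org.example.compression annotation key is invented here purely for illustration, nothing like it is defined in the spec):

```go
package main

import (
	"fmt"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// pickManifest mimics the idea above: among entries that match the requested
// platform, an older client simply takes the first match, while a newer
// client prefers a variant whose (hypothetical) compression annotation says
// zstd.
func pickManifest(idx ocispec.Index, osName, arch string, zstdOK bool) *ocispec.Descriptor {
	var first *ocispec.Descriptor
	for i := range idx.Manifests {
		d := &idx.Manifests[i]
		if d.Platform == nil || d.Platform.OS != osName || d.Platform.Architecture != arch {
			continue
		}
		if first == nil {
			first = d // what an existing client would end up with
		}
		if zstdOK && d.Annotations["org.example.compression"] == "zstd" {
			return d // newer client picks the more specific variant
		}
	}
	return first
}

func main() {
	idx := ocispec.Index{
		Manifests: []ocispec.Descriptor{
			{MediaType: ocispec.MediaTypeImageManifest, Digest: "sha256:gzip-variant",
				Platform: &ocispec.Platform{OS: "linux", Architecture: "amd64"}},
			{MediaType: ocispec.MediaTypeImageManifest, Digest: "sha256:zstd-variant",
				Platform:    &ocispec.Platform{OS: "linux", Architecture: "amd64"},
				Annotations: map[string]string{"org.example.compression": "zstd"}},
		},
	}
	fmt.Println(pickManifest(idx, "linux", "amd64", true).Digest)  // zstd variant
	fmt.Println(pickManifest(idx, "linux", "amd64", false).Digest) // first (gzip) variant
}
```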
Back to the mediaType, I am suggesting zstd would use a different mediaType. Certainly for the layers, but I'd also suggest this is where we could do a version bump on the manifest as well. This way we know if the index supports a manifest that supports zstd. Just as an index can identify if a specific platform/arch combo exists.
The hello-world image is already a manifest list (multi-arch); adding alternative compressions would mean having multiple variants for the same os/arch (existing clients would pick the first match; newer clients can pick more specific, based on extra info/annotations describing the compression)
(at least, I think that's roughly what's being proposed/discussed above)
Just as an index can identify if a specific platform/arch combo exists.
This is exactly what I'm proposing, yes. One of the approaches (adding a field to Descriptor) would even extend down to the image manifest / layer level, if we go that way. See below.
something we're hoping can solve the quick-servicing events
What are quick-servicing events and how do they solve rebasing?
Or, is there a benefit of keeping gzip around as well? If we're trying to make a move to zstd for better performance, then shouldn't we help the entire ecosystem move forward? Once re-compressed with zstd, wouldn't you get even better performance?
Yes, the benefit is not breaking every single existing client in use today. I'm fine with moving towards zstd, but there is a delicate sequencing of events that need to happen, else this will be a complete disaster, e.g. python 3.
Roughly:
Back to the mediaType, I am suggesting zstd would use a different mediaType.
Of course, this is already merged. They should be application/vnd.oci.image.layer.v1.tar+zstd. I poked the facebook folks to register the +zstd suffix a while ago, so we're not blocked on that.
Certainly for the layers, but I'd also suggest this is where we could do a version bump on the manifest as well.
I'd be okay with a backward-compatible version bump (e.g. what I'm proposing below). I'm leaning on the kubernetes versioning guidelines a bit here, but we should be able to add fields without completely breaking everything, as long as old clients ignore fields they don't expect (this might not be safe to do -- for example, GCR will probably reject what I'm proposing, not sure if any clients would break from this).
Do we need to do a version bump for strictly additive changes?
This way we know if the index supports a manifest that supports zstd.
I don't know if it makes sense to distinguish a manifest supporting zstd vs gzip if we're going to allow mixed compression (I think we should). Looking at my example you can imagine that instead of doing this at the manifest list level, we could have alternate descriptors on an image manifest at the layer level. This is if you want to support zstd but also be backward-compatible with gzip.
Concrete example, using debian@sha256:5b5fa7e155b1f19dffb996ea64e55520b80d5bd7a8fdb5aed1acabd217e9ed59:
Before:
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 1463,
    "digest": "sha256:6d6b00c22231693c9b87e79986d562874446bf10182206e4621e23ca8dfa8e1c"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 50397728,
      "digest": "sha256:6c33745f49b41daad28b7b192c447938452ea4de9fe8c7cc3edf1433b1366946"
    }
  ]
}
```
After:
```json
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 1463,
    "digest": "sha256:6d6b00c22231693c9b87e79986d562874446bf10182206e4621e23ca8dfa8e1c"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 50397728,
      "digest": "sha256:6c33745f49b41daad28b7b192c447938452ea4de9fe8c7cc3edf1433b1366946",
      "descriptors": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
          "size": 41414997,
          "digest": "sha256:157a9fb3cd509e4de73599dcdd51a32b5f8e9cfa356516560dd08f298a4db072"
        }
      ],
      "annotations": {
        "zstd-alternative": "sha256:157a9fb3cd509e4de73599dcdd51a32b5f8e9cfa356516560dd08f298a4db072"
      }
    }
  ]
}
```
(It might make sense to have platform.features on layer descriptors, or we can just rely on annotations. They're functionally equivalent to me.)
Of course, maybe it does make sense to partition them at the manifest list level and never support mixed compression, in which case my original example would work, I think, but per the rebasing use case, mixed compression is useful.
@SteveLasker
zstd doesn't need to have a different media type. A manifest is a manifest regardless of the content it holds, and it also doesn't solve the problem of older clients.
We are looking for a way to progressively upgrade clients of existing images.
A client that understands how to decode zstd should be able to look at debian:buster and know how to get the zstd version, and at the same time not break clients who don't know anything about zstd.
Until we reach an agreement on how to support multiple compression variants, could we at least move forward and expect such images to be made explicit (e.g. by a tag)?
Even when multiple compression formats are supported in the same manifest, I don't think we should enforce that each image is available in multiple formats; there will be zstd-only images that will fail with old clients anyway.
holy wow. This thread got off the rails. Let's recap.
- I don't think zstd support should be reverted. The point of enabling new extensions is to have a common place to introduce, and coordinate implementations. (That is exactly what is happening here.)
- ...but this is the same issue we have with the prospect of digests using sha512 or blake2b. Unfortunately, even proposing a way to abstract this would also be a breaking change for old clients.
- I think we should enable folks to try different mediatype objects, even if that only means a different compression. If we choose to disagree about this, it will only make the work-around implementations nastier, and thereby our own OCI Artifacts effort will suffer from it.
but this is the same issue we have with the prospect of digests using...
There is some spec language around this, but I agree we're in a similar spot regardless. A lot of this is hardcoded in older clients, and I really doubt that most modern clients would work as expected even if they purport to support other digest algorithms.
I think we should enable folks to try different mediatype objects, even if that only means a different compression. If we choose to disagree about this, it will only make the work-around implementations nastier, and thereby our own OCI Artifacts effort will suffer from it.
I agree with this, but I think what was merged just hand-waved the problem away in https://github.com/opencontainers/image-spec/pull/759 without actually addressing the backward-compatibility concern.
If we want to somehow make an image that works for both gzip-assuming and zstd-supporting clients, just having both kinds of layers and the "ignore stuff you don't understand" clause is insufficient, and we don't currently have a strategy that does work. I don't know if this issue is the right place to figure that out, but I agree with @giuseppe here:
I don't think we should enforce that each image is available in multiple formats; there will be zstd-only images that will fail with old clients anyway.
The "ignore stuff" PR is probably not fine for this either? Even if old clients just ignored all the zstd layers, there will be no layers, and the resulting filesystem will be empty. Should clients just fail outright if they don't understand zstd? I think it's fine to push zstd-only images, especially if you control the clients that will be pulling them, but I don't think the spec language is firm enough to write a client that behaves reasonably, here, in basically any situation...
The "ignore stuff" PR is probably not fine for this either? Even if old clients just ignored all the zstd layers, there will be no layers, and the resulting filesystem will be empty.
Are you talking about #816? I tried to make it clarify exactly this point that runtimes cannot ignore layers they don't understand, and must fail:
Container runtimes cannot satisfy the bit about:
The final filesystem layout MUST match the result of applying the layers to an empty directory.
by ignoring layer types, so the spec is slightly in conflict with itself here: it requires that runtimes both MUST apply all the layers and MUST ignore layers that they cannot understand.
Are you talking about #816?
Ah, I missed this one, thanks.
Specifically this bit:
However, image transfer tools can safely ignore mime types they don't understand (indeed, skopeo does this today);
Assuming this is for skopeo copying things, I don't think it is safe to ignore mime types we don't understand? E.g., if I want to actually copy an image to a different registry, I need to copy all the referenced blobs, or else the image is incomplete. I don't need to try to parse them, but I just can't ignore them... right?
One approach would be to assume any unrecognized media types are blobs, and copy them from /v2/.../blobs/, which should work most of the time. However, one place this strategy wouldn't work is if we specify completely new manifest media types (e.g. the artifacts stuff). Clients do actually need to know how to parse those if they're going to do anything reasonable, and assuming they're just blobs will of course fail. If that fails, do we just ignore the failure? That seems bad. (Here lies a lot of my hesitation around defining a new artifact type.)
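As a sketch of that "assume it's a blob" strategy, and of where it breaks down (my own illustration using the image-spec Go types; real tools like skopeo or crane do considerably more):

```go
package main

import (
	"fmt"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// classify splits the descriptors referenced by a manifest/index into things
// we can blindly copy as blobs and things we must actually parse (because
// they reference further content). A brand-new manifest media type that this
// tool doesn't know about would land in "blobs" and be copied shallowly,
// which is exactly the failure mode described above.
func classify(refs []ocispec.Descriptor) (blobs, needParsing []ocispec.Descriptor) {
	for _, d := range refs {
		switch d.MediaType {
		case ocispec.MediaTypeImageManifest, ocispec.MediaTypeImageIndex:
			needParsing = append(needParsing, d)
		default:
			// gzip layer, zstd layer, config, or something unknown:
			// copy the bytes from /v2/<name>/blobs/<digest> as-is.
			blobs = append(blobs, d)
		}
	}
	return blobs, needParsing
}

func main() {
	refs := []ocispec.Descriptor{
		{MediaType: ocispec.MediaTypeImageLayerZstd, Digest: "sha256:aaa", Size: 1},
		{MediaType: "application/vnd.example.unknown.thing", Digest: "sha256:bbb", Size: 2},
		{MediaType: ocispec.MediaTypeImageIndex, Digest: "sha256:ccc", Size: 3},
	}
	blobs, parse := classify(refs)
	fmt.Printf("%d blobs to copy as-is, %d references that must be parsed\n", len(blobs), len(parse))
}
```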
Probably makes some sense to discuss this more on your PR, and this gets into distribution stuff as well...
I just can't ignore them... right?
Right, so maybe the commit message needs re-wording. The actual spec text and example from the commit matches what you're saying, though:
FYI https://github.com/opencontainers/image-spec/pull/880 may be interesting to folks.
Please add zstd. Pulling images is so slow because of decompression.
/cc @zvonkok
For those that aren't supporting zstd today, what is preventing it? Of the tools that don't support it yet, what is blocking the PR to add support?
For those that require this feature, do we have metrics showing that decompression is a slower step than the network speed to download the blob? Without that, decompressing during the download (instead of waiting for the download to complete) would provide a performance improvement without zstd.
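For reference, decompressing during the download is mostly a matter of wiring the decompressor straight onto the response body. A minimal stdlib-only sketch, with a placeholder URL and no digest verification or tar extraction:

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"os"
)

// The gzip reader wraps the HTTP response body directly, so decompression
// happens while bytes are still arriving instead of as a separate step after
// the blob has been fully downloaded.
func main() {
	resp, err := http.Get("https://registry.example.com/v2/library/hello/blobs/sha256:...") // placeholder
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	zr, err := gzip.NewReader(resp.Body)
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	// Stand-in for feeding the decompressed tar stream into the layer store.
	n, err := io.Copy(io.Discard, zr)
	if err != nil {
		panic(err)
	}
	fmt.Fprintf(os.Stderr, "decompressed %d bytes while downloading\n", n)
}
```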
For those that require this feature, do we have metrics showing that decompression is a slower step than the network speed to download the blob? Without that, decompressing during the download (instead of waiting for the download to complete) would provide a performance improvement without zstd.
For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.
In addition to the faster decompression, zstd also requires less CPU time.
On my machine, using pigz for comparison as it is much faster than GNU gzip, I get:
```console
$ dd if=/dev/urandom bs=1G count=1 | pigz - > /tmp/blob.gz
$ dd if=/dev/urandom bs=1G count=1 | zstd - > /tmp/blob.zstd

$ \time -v pigz -d < /tmp/blob.gz > /dev/null
	Command being timed: "pigz -d"
	User time (seconds): 0.27
	System time (seconds): 0.44
	Percent of CPU this job got: 200%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.35

$ \time -v zstd -d < /tmp/blob.zstd > /dev/null
	Command being timed: "zstd -d"
	User time (seconds): 0.14
	System time (seconds): 0.15
	Percent of CPU this job got: 206%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.14
```
For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.
What is preventing Podman/CRI-O from supporting these features today? Is there a PR blocked because of OCI?
In addition to the faster decompression, zstd also requires less CPU time.
Is this from content that was pulled from a registry, or content stored locally in a compressed state?
For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.
What is preventing Podman/CRI-O from supporting these features today? Is there a PR blocked because of OCI?
nothing really :-) We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked
In addition to the faster decompression, zstd also requires less CPU time.
Is this from content that was pulled from a registry, or content stored locally in a compressed state?
stored locally in a compressed state
We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked
Do we need to differentiate between zstd and zstd+chunked?
We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked
Do we need to differentiate between zstd and zstd+chunked?
No, it is still a valid zstd file. Clients that do not use the additional metadata will simply ignore it.
We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked
Do we need to differentiate between zstd and zstd+chunked?
No, it is still a valid zstd file. Clients that do not use the additional metadata will simply ignore it.
What is the media type of zstd:chunked? If it is application/vnd.oci.image.layer.v1.tar+zstd:chunked, it should still become part of the spec such that tools will know that they can use it (even if they just treat it as a normal zstd file). Otherwise tools will always need to upload images with both formats even though they are identical.
application/vnd.oci.image.layer.v1.tar+zstd:chunked
it is application/vnd.oci.image.layer.v1.tar+zstd
application/vnd.oci.image.layer.v1.tar+zstd:chunked
it is application/vnd.oci.image.layer.v1.tar+zstd
How does a client then know if the layer is chunked? Does it always need to fetch the header and somehow determine if it is chunked using magic bytes?
How does a client then know if the layer is chunked? Does it always need to fetch the header and somehow determine if it is chunked using magic bytes?
it could do that, or use the annotations we added for that layer, e.g.:
```json
{
  "MIMEType": "application/vnd.oci.image.layer.v1.tar+zstd",
  "Digest": "sha256:9efd019b05bc504fcc4d0e244f3c996eb2c739f3274ab5cc746e0f421044c041",
  "Size": 113639090,
  "Annotations": {
    "io.github.containers.zstd-chunked.manifest-checksum": "sha256:f67017010afe34d9a5df4c1f65c6ff7ac7a452b57e7febea91d80ed9da51841e",
    "io.github.containers.zstd-chunked.manifest-position": "111910713:1066869:6231165:1"
  }
}
```
so it can immediately fetch the TOC and validate it against the checksum that is recorded in the manifest as well
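Spelled out as code, the annotation-based check could look like this sketch (image-spec Go types assumed; the annotation keys and values are taken from the example above, and a real client would go on to fetch and verify the TOC):

```go
package main

import (
	"fmt"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// Annotation keys from the example above; clients that don't know them just
// treat the blob as ordinary tar+zstd.
const (
	tocChecksumAnnotation = "io.github.containers.zstd-chunked.manifest-checksum"
	tocPositionAnnotation = "io.github.containers.zstd-chunked.manifest-position"
)

// isChunked reports whether a layer descriptor advertises the zstd:chunked
// metadata; the media type stays plain tar+zstd either way.
func isChunked(layer ocispec.Descriptor) bool {
	if layer.MediaType != ocispec.MediaTypeImageLayerZstd {
		return false
	}
	_, hasChecksum := layer.Annotations[tocChecksumAnnotation]
	_, hasPosition := layer.Annotations[tocPositionAnnotation]
	return hasChecksum && hasPosition
}

func main() {
	layer := ocispec.Descriptor{
		MediaType: ocispec.MediaTypeImageLayerZstd,
		Digest:    "sha256:9efd019b05bc504fcc4d0e244f3c996eb2c739f3274ab5cc746e0f421044c041",
		Size:      113639090,
		Annotations: map[string]string{
			tocChecksumAnnotation: "sha256:f67017010afe34d9a5df4c1f65c6ff7ac7a452b57e7febea91d80ed9da51841e",
			tocPositionAnnotation: "111910713:1066869:6231165:1",
		},
	}
	fmt.Println("zstd:chunked layer:", isChunked(layer)) // true; plain zstd clients simply ignore the annotations
}
```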
I just spent a bit of time rereading this thread (it's a long one) and a lot has happened since the original issue. Given everything that has happened, what is left to complete before resolving this issue?
- Updating the layers text in manifest.md and layer.md.
- Documenting the zstd:chunked annotations might be useful for compatibility between the various tools. Perhaps @giuseppe would be up for that. That could also be tracked as a separate issue.
Anything that I'm missing? If not, I can work on a PR for the manifest.md layers text so we can close this out.
(adding the meaningful bit of my thoughts from today's call)
A lot of discussion on supporting multiple compression types has moved towards solving this with transport-level compression, which the registry could pre-cache. But everyone is hesitant to go that direction knowing that older clients and registries would fall back to uncompressed layers, a very bad intermediate state.
IMO, the intermediate state of "many older tools don't support the layers at all" (zstd) is worse than "many older tools will have increased storage, but everything will be functional" (uncompressed / transport compression) :eyes:
One of these is a really viable intermediate state to allow actual transition (that happens to also solve other interesting problems like layer digest vs DiffID) while the other is a complete non-starter (at least, from my own perspective as a large publisher of a number of popular images).
(oh, and I'm +1 on adding zstd to the "SHOULD support" list)
2 cents: the real world (containerd, buildkit, docker, buildah, etc.) already supports zstd. The fact that the image spec lags behind just introduces a discrepancy. From a technical point of view, zstd is just ultimately superior to gzip in all aspects. In order to continue to make sense, the image spec should declare zstd as a supported format.
The image spec was updated to support it, back in https://github.com/opencontainers/image-spec/pull/788, which was part of every RC and the GA of v1.1.0.
This issue is more about saying that implementations "SHOULD" support it, which is mostly semantics because, as you've said, all the major tools already do.
While reviewing https://github.com/moby/moby/pull/40820, I noticed that support for zstd was merged in master (proposal: https://github.com/opencontainers/image-spec/issues/787, implementation in https://github.com/opencontainers/image-spec/pull/788 and https://github.com/opencontainers/image-spec/pull/790), and some runtimes started implementing this;
However, the current (v1.0.1) image-spec does not yet list zstd as a supported compression, which means that not all runtimes may support these images, and the ones that do are relying on a non-finalized specification, which limits interoperability (something that I think this specification was created for in the first place).
I think the current status is not desirable; not only does it limit interoperability (as mentioned), it will also cause complications for Golang projects using this specification as a dependency; go modules will default to the latest tagged release, and some distributions (thinking of Debian) are quite strict about the use of unreleased versions. Golang projects that want to support zstd would either have to "force" go mod to use a non-released version of the specification, or work around the issue by using a custom implementation (similar to the approach that containerd took: https://github.com/containerd/containerd/pull/3649).
In addition to the above, concerns were raised about the growing list of media-types (https://github.com/opencontainers/image-spec/issues/791), and suggestions were made to make this list more flexible.
The Image Manifest Property Descriptions, currently describes:
Followed by:
This part is a bit ambiguous (perhaps that's just my interpretation of it though); given that the +zstd layer mediatype is not in the MUST list, is there any reason for including it in the list of OCI Media Types? After all, any media types not included in the list "could" be supported by an implementation, and must otherwise be ignored.

What's the way forward with this?

1. Tag v1.1.0, only defining +zstd as a possible compression format for layers, but no requirement for implementations of the v1.1.0 specification to support them.
2. Add the +zstd compression format to the list of required media types, and tag v1.1.0; projects implementing v1.1.0 of the specification MUST support zstd layers, or otherwise implement v1.0.x.
3. …

I'd go with one of the first two options for the v1.1.0 release (1. or 2.), and leave 3. for a future (v1.2.0) release of the specification.

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers, are not; should they be?