opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

how to share blobs across multiple images in image-layout spec #811

Open deitch opened 3 years ago

deitch commented 3 years ago

The way that image-layout is written, it cannot (or is very hard to; perhaps I should be less definite in the statement) support multiple images sharing blobs.

The "entrypoint" of the directory is index.json, which, essentially, is the root image index of the single image (whatever name and tag that image is, which is not visible from inside the directory).

If, however, I have two images, I need two distinct root directories, each with its own blobs/sha256/ and index.json. This despite the fact that these two images may actually share 9 out of 10 layers. The content of blobs/sha256/ is, by definition, content-addressable, so there is zero conflict in having a shared directory for two or more images; only the index.json would be different.
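To illustrate why a shared blobs/sha256/ directory cannot conflict, here is a minimal sketch, assuming two hypothetical images whose layers are stand-in byte strings. The helper names (`put_blob`, `demo_shared_blobs`) are invented for this example and are not part of any spec or tool.

```python
import hashlib
import tempfile
from pathlib import Path


def put_blob(layout_root: Path, data: bytes) -> str:
    """Store data under blobs/sha256/<hex> and return its digest string."""
    hex_digest = hashlib.sha256(data).hexdigest()
    blob_path = layout_root / "blobs" / "sha256" / hex_digest
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    if not blob_path.exists():  # identical content maps to the same path
        blob_path.write_bytes(data)
    return f"sha256:{hex_digest}"


def demo_shared_blobs() -> tuple[int, int]:
    """Write two hypothetical images that share a base layer into one layout."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        image_a = [b"base layer", b"layer only in image A"]
        image_b = [b"base layer", b"layer only in image B"]
        digests = [put_blob(root, layer) for layer in image_a + image_b]
        stored = len(list((root / "blobs" / "sha256").iterdir()))
        return len(digests), stored


if __name__ == "__main__":
    written, stored = demo_shared_blobs()
    print(f"{written} layer writes, {stored} blobs on disk")
```

Because the file name is the digest of the content, the shared base layer lands on the same path for both images and is stored once.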

containerd itself handles this by having a shared blobs/sha256/ dir and apparently ignoring index.json in favour of its boltdb for pointing to the root of the image. I suspect that is partially so that blobs can be shared, and partially because of the need for multiple concurrent reads and writes of the metadata, requiring a real database.

It would be good if the spec supported multiple images with some, all or no shared content, and a single shared content directory. Nothing would force anyone to use it - you still could have multiple directories - but it would create a lot of efficiency options.

I can sort of get around this on disk today by making blobs/ a symlink, but it is messy.

I see a few possible options:

Coming back around: is there a current "correct" way to have blobs shared across images in the current image-layout spec, and if not, what can we do to get there?

AkihiroSuda commented 3 years ago

> containerd itself handles this by having a shared blobs/sha256/ dir and apparently ignoring index.json in favour of its boltdb for pointing to the root of the image.

The content store directory under /var/lib/containerd has nothing at all to do with OCI. It just happens to have a similar directory structure.

OCI image archives with index.json are fully supported by ctr images import.

AkihiroSuda commented 3 years ago

Doesn't the "org.opencontainers.image.ref.name" annotation already support multi-image archives?

deitch commented 3 years ago

Dang, it does, @AkihiroSuda ? I missed that.

So I could put multiple images in one layout, as long as each root is listed in index.json and carries the image name? And ctr images import would import all of them?

deitch commented 3 years ago

@AkihiroSuda I just read through the spec again (prompted by your explanation), both for image-layout and annotations. I also read through the various extended discussions on this in the issues.

Here is what I concluded, based on the above and what you said:

So if I pulled down docker.io/library/alpine:3.11, the structure would look like this:

index.json
blobs/sha256/04fb172885a8d8f6aacfef173b9adf5641bc14745ee240e6abd44ba830838281
blobs/sha256/0ff8a9dffabb5ed8dcba4ee898f62683305b75b4086f433ee722db99138f4f53
blobs/sha256/19c4e520fa84832d6deab48cd911067e6d8b0a9fa73fc054c7b9031f1d89e4cf
blobs/sha256/2826c1e79865da7e0da0a993a2a38db61c3911e05b5df617439a86d4deac90fb
blobs/sha256/29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd
blobs/sha256/39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01
blobs/sha256/3cfb62949d9d8613854db4d5fe502a9219c2b55a153043500078a64e880ae234
blobs/sha256/41ba0806c6113064dd4cff12212eea3088f40ae23f182763ccc07f430b3a52f8
blobs/sha256/4b858171dd2c97fe3f2909cbc6935856b5a7a0a9b5bb82a41907a815ec497e53
blobs/sha256/51964c254c5a63abe88654776036ad9bfc2b6a2c20c959c96834a5247227240d
blobs/sha256/7184c046fdf17da4c16ca482e5ede36e1f2d41ac8cea9c036e488fd149d6e8e7
blobs/sha256/9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54    <---- this is the actual index I get from docker hub when querying
blobs/sha256/9a8fdc5b698322331ee7eba7dd6f66f3a4e956554db22dd1e834d519415b4f8e
blobs/sha256/ad295e950e71627e9d0d14cdc533f4031d42edae31ab57a841c5b9588eacc280
blobs/sha256/b28e271d721b3f6377cb5bae6cd4506d2736e77ef6f70ed9b0c4716da8bdf17c
blobs/sha256/b9e3228833e92f0688e0f87234e75965e62e47cfbb9ca8cc5fa19c2e7cd13f80
blobs/sha256/c20d2a9ab6869161e3ea6d8cb52d00be9adac2cc733d3fbc3955b9268bfd7fc5
blobs/sha256/c4fe9b047d1506377235b1dbcf01fb4a98cdf780554530decc99ef9893408ca6
blobs/sha256/cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
blobs/sha256/e095eb9ac24e21bf2621f4d243274197ef12b91c67cde023092301b2db1e073c
blobs/sha256/ec30e5377f42dcfcf36047553dab57ed38b0b28babdeecbc34a165b7f3778814
blobs/sha256/f70734b6a266dcb5f44c383274821207885b549b75c8e119404917a61335981a

And the index.json would look something like:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 1638,
      "digest": "sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54",
      "annotations": {
        "org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"
      }
    }
  ]
}

And if I had both docker.io/library/alpine:3.11 and, say, quay.io/k8scsi/csi-node-driver-registrar:v1.0.1, then I would also have the index, manifest, and layers for quay.io/k8scsi/csi-node-driver-registrar:v1.0.1 in blobs/sha256/, and my index.json would look like:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 1638,
      "digest": "sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54",
      "annotations": {
        "org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v1+json",
      "size": 5391,
      "digest": "sha256:a61c0432797e0cfabe2d4eae4a9d63ee8b4ff18696aa6177c5d0b3258ed824c7",
      "annotations": {
        "org.opencontainers.image.ref.name": "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
      }
    }
  ]
}

Is that correct?
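For readers who want to experiment with the two-entry index.json described in this comment, here is a rough sketch that builds it in memory and resolves a reference name to the blob path it points at. The function names (`descriptor`, `find_by_ref`, `blob_path`) are made up for illustration, and using a full image reference as the annotation value is the usage under discussion in this thread, not something the sketch endorses.

```python
import json

REF_NAME = "org.opencontainers.image.ref.name"


def descriptor(media_type: str, digest: str, size: int, ref_name: str) -> dict:
    """Build one entry for the "manifests" array of index.json."""
    return {
        "mediaType": media_type,
        "size": size,
        "digest": digest,
        "annotations": {REF_NAME: ref_name},
    }


def find_by_ref(index: dict, ref_name: str) -> list[dict]:
    """Return every descriptor whose ref.name annotation equals ref_name."""
    return [
        m for m in index.get("manifests", [])
        if m.get("annotations", {}).get(REF_NAME) == ref_name
    ]


def blob_path(desc: dict) -> str:
    """Map a descriptor digest "<algorithm>:<hex>" to blobs/<algorithm>/<hex>."""
    algorithm, _, encoded = desc["digest"].partition(":")
    return f"blobs/{algorithm}/{encoded}"


# Values copied from the example in this comment.
index = {
    "schemaVersion": 2,
    "manifests": [
        descriptor(
            "application/vnd.oci.image.index.v1+json",
            "sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54",
            1638,
            "docker.io/library/alpine:3.11",
        ),
        descriptor(
            "application/vnd.docker.distribution.manifest.v1+json",
            "sha256:a61c0432797e0cfabe2d4eae4a9d63ee8b4ff18696aa6177c5d0b3258ed824c7",
            5391,
            "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1",
        ),
    ],
}

if __name__ == "__main__":
    print(json.dumps(index, indent=2))
    for match in find_by_ref(index, "docker.io/library/alpine:3.11"):
        print(blob_path(match))
```

The lookup returns a list rather than a single descriptor because nothing in the file format itself stops two entries from carrying the same annotation value.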

AkihiroSuda commented 3 years ago

"org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"

This part is expected to be "3.11", according to the Implementor's Note:

Implementor's Note: A common use case of descriptors with a "org.opencontainers.image.ref.name" annotation is representing a "tag" for a container image. For example, an image may have a tag for different versions or builds of the software. In the wild you often see "tags" like "v1.0.0-vendor.0", "2.0.0-debug", etc. Those tags will often be represented in an image-layout repository with matching "org.opencontainers.image.ref.name" annotations like "v1.0.0-vendor.0", "2.0.0-debug", etc.

deitch commented 3 years ago

I saw that note @AkihiroSuda , which had me wondering most about that part.

This brings me back to the question, which I highlighted above. How would the image-layout format then handle having the blobs for both docker.io/library/alpine:3.11 and quay.io/k8scsi/csi-node-driver-registrar:v1.0.1 in the same root directory?

Granted, the example is a bit contrived, since those two share no blobs as far as I can tell, but it matters for the use cases of:

Also, I notice that the spec for that annotation says:

Character set of the value SHOULD conform to alphanum of A-Za-z0-9 and separator set of -._:@/+

That sounds like it is intended to support a full image name, not just tag.
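As a quick check of that reading, the sketch below tests values against only the character set quoted above (alphanumerics plus the separators -._:@/+). The function name is hypothetical, and any further grammar the spec may define for the value is not enforced here.

```python
import re

# Character set quoted in this comment: A-Za-z0-9 plus the separators -._:@/+
_ALLOWED = re.compile(r"[A-Za-z0-9\-._:@/+]+")


def uses_allowed_charset(ref_name: str) -> bool:
    """Return True if every character of ref_name is in the quoted set."""
    return _ALLOWED.fullmatch(ref_name) is not None


if __name__ == "__main__":
    for value in (
        "3.11",
        "docker.io/library/alpine:3.11",
        "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1",
        "alpine 3.11",
    ):
        print(value, uses_allowed_charset(value))
```

Both a bare tag and a full image reference pass this check, while a value containing a space does not.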

deitch commented 3 years ago

Also @AkihiroSuda did I get the rest of it right, in terms of what index.json is meant to be? Even if the image root is an index, the index.json itself is not the image index, but rather a single pointer to the blob in blobs/sha256/ which is the pulled down index?

deitch commented 3 years ago

Also @AkihiroSuda I renamed the issue, once you explained that it does support them, but it is a question of how. If I need to rename it again after your answers to the above, happy to do so.

Tehsmash commented 2 years ago

@deitch did you ever get anywhere with this? I have a scenario where I need to distribute a number of images as files/tarballs before they get loaded into a registry (the remote location has no internet access to copy between registries directly). The images share several layers because they are built on the same base image, but those layers end up duplicated in the exported tarballs. I was hoping that converting to a single OCI image layout with multiple index.json entries could be my saving grace.

sudo-bmitch commented 2 years ago

I've been treating the Layout directory as a repository, where multiple images are referenced by different tag names in the index.json, and the blobs directory is deduplicated since the names would collide. It does require that tooling perform its own GC on orphaned blobs if you change a tag and delete the reference to an old manifest. My own implementation is in regclient, where you can run:

regctl image copy image_a:tag_a ocidir://repo:tag_a
regctl image copy image_b:tag_b ocidir://repo:tag_b

Which would create a directory called repo and an index.json with tag_a and tag_b that you can then copy out by reversing the source/destination at the remote location. Code to support ocidir is in https://github.com/regclient/regclient/tree/main/scheme/ocidir
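The GC on orphaned blobs mentioned above could look roughly like the following mark-and-report sketch. It is not regclient's implementation; the function names are invented here, it only follows "manifests", "config" and "layers" references, and it reports orphans instead of deleting them.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Media types whose blobs are JSON documents that reference further blobs.
WALKABLE_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def blob_path(root: Path, digest: str) -> Path:
    algorithm, _, encoded = digest.partition(":")
    return root / "blobs" / algorithm / encoded


def reachable_digests(root: Path) -> set[str]:
    """Collect every digest reachable from the descriptors in index.json."""
    index = json.loads((root / "index.json").read_text())
    seen: set[str] = set()
    pending = list(index.get("manifests", []))
    while pending:
        desc = pending.pop()
        digest = desc["digest"]
        if digest in seen:
            continue
        seen.add(digest)
        path = blob_path(root, digest)
        if desc.get("mediaType") not in WALKABLE_TYPES or not path.is_file():
            continue  # configs and layers have no children to follow
        doc = json.loads(path.read_text())
        pending.extend(doc.get("manifests", []))
        if "config" in doc:
            pending.append(doc["config"])
        pending.extend(doc.get("layers", []))
    return seen


def orphaned_blobs(root: Path) -> list[Path]:
    """List blob files that nothing in index.json reaches."""
    keep = reachable_digests(root)
    return sorted(
        blob
        for algo_dir in (root / "blobs").iterdir()
        for blob in algo_dir.iterdir()
        if f"{algo_dir.name}:{blob.name}" not in keep
    )


def demo() -> tuple[int, int]:
    """Build a tiny throwaway layout with one image and one orphaned blob."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        def put(data: bytes, media_type: str) -> dict:
            digest = "sha256:" + hashlib.sha256(data).hexdigest()
            path = blob_path(root, digest)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
            return {"mediaType": media_type, "digest": digest, "size": len(data)}

        layer = put(b"layer", "application/vnd.oci.image.layer.v1.tar")
        config = put(b"{}", "application/vnd.oci.image.config.v1+json")
        manifest_doc = {"schemaVersion": 2, "config": config, "layers": [layer]}
        manifest = put(json.dumps(manifest_doc).encode(),
                       "application/vnd.oci.image.manifest.v1+json")
        put(b"left behind by a moved tag", "application/vnd.oci.image.layer.v1.tar")
        index = {"schemaVersion": 2, "manifests": [manifest]}
        (root / "index.json").write_text(json.dumps(index))
        return len(reachable_digests(root)), len(orphaned_blobs(root))


if __name__ == "__main__":
    print(demo())
```

A real tool would also need to handle concurrent writers and any other reference fields it cares about before removing anything.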

deitch commented 2 years ago

I did not @sudo-bmitch . In cases where I have total control over how the image-layout is produced and consumed, I do what was listed above:

"org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"

That is not what the spec says, which should be:

"org.opencontainers.image.ref.name": "3.11"

For example, we do this in linuxkit and some other software.

I really would like to have a standard where a single index.json can support multiple images as above, not just tags on images but full references.

Maybe @AkihiroSuda has some more input? He knows this better than I do (by a long shot).

sudo-bmitch commented 1 month ago

Circling back to this and rereading the thread, I've been treating the OCI Layout directory as a distinct "repository" from an upstream registry. So the "3.11" would be the tag inside the Layout directory, and it just happens to be a copy of the image from "docker.io/library/alpine" but could just as easily be a copy of the "registry.example.com/private-mirror/alpine" repo.

I do see the value for container engines in having a "here's the original repository name" annotation, since they treat all content as a local copy of an upstream repository, and they treat an OCI Layout as an alternate transport, rather than a repository itself. This would be breaking for tools that treat the Layout directory as a repository, since it allows multiple manifests with the same tag and different origin repositories, making it impossible to list the tags in the repository and select a manifest by tag.

Given the two possible treatments of this directory, I think we can see interoperability issues between tools working with the Layout that would be good for OCI to resolve.
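To make the interoperability concern concrete, here is a small sketch, under the assumption that annotation values are either a bare tag or a "repo:tag" reference. It reports tags that would point at more than one manifest if a tool treated the Layout as a single repository. The parsing in `tag_of` is deliberately naive (it does not handle digest references), and the two digests are placeholders.

```python
from collections import defaultdict

REF_NAME = "org.opencontainers.image.ref.name"


def tag_of(ref_name: str) -> str:
    """Return the tag of "repo:tag", or the value unchanged if no tag separator is found."""
    last_segment = ref_name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return ref_name


def ambiguous_tags(index: dict) -> dict[str, list[str]]:
    """Map each tag that resolves to more than one digest to those digests."""
    by_tag: dict[str, set[str]] = defaultdict(set)
    for desc in index.get("manifests", []):
        name = desc.get("annotations", {}).get(REF_NAME)
        if name is not None:
            by_tag[tag_of(name)].add(desc["digest"])
    return {tag: sorted(d) for tag, d in by_tag.items() if len(d) > 1}


# Placeholder digests; the repository names come from the comment above.
example_index = {
    "schemaVersion": 2,
    "manifests": [
        {
            "digest": "sha256:" + "a" * 64,
            "annotations": {REF_NAME: "docker.io/library/alpine:3.11"},
        },
        {
            "digest": "sha256:" + "b" * 64,
            "annotations": {REF_NAME: "registry.example.com/private-mirror/alpine:3.11"},
        },
    ],
}

if __name__ == "__main__":
    print(ambiguous_tags(example_index))
```

A tool that treats the Layout as transport sees two distinct images here, while a tool that treats it as a repository sees one tag with two candidate manifests.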

deitch commented 1 month ago

That is what it comes down to @sudo-bmitch . The current OCI spec supports a single image and not multiple, although it is very close. So close, and so useful, that others (containerd, linuxkit, etc.) adopt it almost entirely and then do something slightly different for the index.

My position is not that the current system does support it (it does not), but that it is so close, and so useful, that it should, so let's do it.

tianon commented 1 month ago

> That is not what the spec says, which should be:
>
> "org.opencontainers.image.ref.name": "3.11"

That's not actually true -- the spec explicitly says both are valid (and always has): :sweat_smile: :see_no_evil:

https://github.com/opencontainers/image-spec/blob/8797c3fedb77200c102c816ddcd295bb1e19d3e3/annotations.md#pre-defined-annotation-keys:~:text=org.opencontainers.image.ref.name

https://github.com/opencontainers/image-spec/blob/8797c3fedb77200c102c816ddcd295bb1e19d3e3/annotations.md#L33-L43