kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

Fix handling of image digests in the minikube cache #10075

Open · afbjorklund opened this issue 3 years ago

afbjorklund commented 3 years ago

Currently we carry a patch to save the image tags for digests in the tarball, but it is not working properly.

docker.io/library/busybox:latest@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71

https://github.com/google/go-containerregistry/blob/v0.3.0/pkg/name/digest.go#L29

--- a/pkg/name/digest.go
+++ b/pkg/name/digest.go
@@ -28,6 +28,7 @@ const (
 // Digest stores a digest name in a structured form.
 type Digest struct {
        Repository
+       tag      string
        digest   string
        original string
 }

The code still only checks the name (ever since the preload), and the docker code seems to be broken.

For instance, podman doesn't support it, so the current workaround is to just silently drop it:

Error: invalid image reference "docker.io/library/busybox:latest@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71": Docker references with both a tag and digest are currently not supported

We don't really want it anyway, now that we are trying to get away from the docker default and move towards using CRI.

So it would be good to work with go-containerregistry upstream, and try to drop the current patch...

https://github.com/google/go-containerregistry/pull/703

https://github.com/afbjorklund/go-containerregistry/commit/fbad78ec28135dd344a3122696ab703ddbfe9e01


Minikube is currently using version 0.1.0

https://github.com/google/go-containerregistry/releases/tag/v0.1.0

Upstream has since released version 0.3.0

https://github.com/google/go-containerregistry/releases/tag/v0.3.0

afbjorklund commented 3 years ago

Problems seen in #10060

afbjorklund commented 3 years ago

"Id" is not available in all container runtimes, and might not even be static across a single runtime.

"Digest" is only available in the online registries, and not preserved when saving and loading etc.

So we might need to revisit the cache code, and add some more metadata (besides just the tarball)

The end goal is to be able to support things like ":latest", to compare two images with same repo:tag.

afbjorklund commented 3 years ago

The new version also has support for progress bars, so maybe we can finally fix #7012

It is super annoying now; we don't even show the name, but only that tractor emoji.

The request for showing the transfer of the individual cache images is here: #7175

Note that there are actually two things involved: the transfer, and the load / uncompress.

afbjorklund commented 3 years ago

So when we pull an image, we get two things: the Id and the Digest.

Both are normally sha256 digests, but of two different things.

The Id file is stored in the actual tarball, referenced under Config. It might or might not be stored in the container engine:

Engine      Id
Docker      stored: "Id"
Podman      stored: "Id"
Containerd  not stored separately

But the Digest is related to the URL it was pulled from. It might or might not be stored in the container engine:

Engine      Digest
Docker      stored: "RepoDigests"
Podman      stored twice: "Digest", "RepoDigests"
Containerd  stored: DIGEST

The same image (id) can have multiple digests, depending on the URL.

$ docker pull ubuntu:20.04
20.04: Pulling from library/ubuntu
Digest: sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c
Status: Image is up to date for ubuntu:20.04
docker.io/library/ubuntu:20.04
$ docker pull localhost:5000/ubuntu:20.04
20.04: Pulling from ubuntu
Digest: sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241
Status: Image is up to date for localhost:5000/ubuntu:20.04
localhost:5000/ubuntu:20.04
$ docker images ubuntu:20.04
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
ubuntu       20.04     f643c72bc252   5 weeks ago   72.9MB
$ docker images localhost:5000/ubuntu:20.04
REPOSITORY              TAG       IMAGE ID       CREATED       SIZE
localhost:5000/ubuntu   20.04     f643c72bc252   5 weeks ago   72.9MB

The digest can also be totally missing, for locally created images.

Recreating the Id involves reading the Config and doing sha256sum.

$ docker save ubuntu:20.04 | tar xfO - manifest.json
[{"Config":"f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1.json","RepoTags":["ubuntu:20.04"],"Layers":["96b555d8387efc8adb493c5a3d5c453f44874ccaddf93ac35027f26608d83108/layer.tar","d7bd
d7d291799b8991768e87181710917acf79a1639b22968785aee614bb0ed3/layer.tar","db5e5b730d6b10f78842614d63c9ff75096fc5a55a20bd704a5f7205a0a82218/layer.tar"]}]
$ docker save ubuntu:20.04 | tar xfO - f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1.json | sha256sum 
f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1  -

This operation can take a while (seeking through the tarball), and the filenames used vary a bit too.

Recreating the Digest involves pulling the image metadata from the registry.

$ docker save busybox > busybox.tar
$ docker rmi busybox:latest 
Untagged: busybox:latest
Untagged: busybox@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71
$ docker load < busybox.tar 
Loaded image: busybox:latest
$ docker inspect busybox | head
[
    {
        "Id": "sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [],
        "Parent": "",
        "Comment": "",
        "Created": "2020-12-30T01:19:53.606880456Z",
$  docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71
Status: Image is up to date for busybox:latest
docker.io/library/busybox:latest
$ docker inspect busybox | head
[
    {
        "Id": "sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71"
        ],
        "Parent": "",

So we need to make both pieces of metadata accessible, and quickly in the case of the Id.

This is so that we can compare the cache contents with the machine storage.

Currently we only store the repo and tag, implicitly at that (in the filename...)

As noted elsewhere (#9593), we also need to store the platform (i.e. os and arch)

afbjorklund commented 3 years ago

Note that it also needs to "resolve" short names into the long names.

busybox -> busybox:latest -> docker.io/busybox:latest -> docker.io/library/busybox:latest

Some engines use alias lookups, and some only use the long names.

See https://www.redhat.com/sysadmin/container-image-short-names

jonjohnsonjr commented 3 years ago

  The same image (id) can have multiple digests, depending on the URL.

This isn't based on the URL. The reason they have different digests is that (presumably) you have run docker pull && docker push to populate that image in localhost:5000, but docker only pulls the platform-specific image, so you pulled a manifest list and pushed an individual manifest:

$ crane manifest ubuntu:20.04 | jq .
{
  "manifests": [
    {
      "digest": "sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "size": 943
    },
    {
      "digest": "sha256:be2aa2178e05b3d1930b4192ba405cb1d260f6a573abab4a6e83e0ebec626cf1",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "arm",
        "os": "linux",
        "variant": "v7"
      },
      "size": 943
    },
    {
      "digest": "sha256:42c332a4493b201f8a5e3d4019e464aa2f5c6e6ef8fedccd0b6d3a7ac0912670",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "arm64",
        "os": "linux",
        "variant": "v8"
      },
      "size": 943
    },
    {
      "digest": "sha256:21f4dd9e02054a3ef9048a2f1384f64ba6368dc85fecbb1fb0d5592b75173e4d",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      },
      "size": 943
    },
    {
      "digest": "sha256:54585b0cee318ba7997bf4d1342f27754889ebf7be8c2f3a3f59752e856a7904",
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "s390x",
        "os": "linux"
      },
      "size": 943
    }
  ],
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "schemaVersion": 2
}

Note that the top-level manifest list for ubuntu:20.04 here is the sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c image:

$ crane digest ubuntu:20.04
sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c

but you've pushed the linux/amd64 variant sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241 to localhost.

afbjorklund commented 3 years ago

  This isn't based on the URL. The reason they have different digests is that (presumably) you have run docker pull && docker push to populate that image in localhost:5000

That is true, I did that. 🤭

This is also making cross-compiling images a lot harder, especially the ones that depend on parent images... (due to the docker daemon's inability to understand platform, and its use of the same namespace for amd64/arm64)

I even had to deploy a local registry, just to make it work. Eventually we forked (copied) the upstream instead. So now there are no intermediate images, but I guess we have to make some workaround for the ubuntu image.

Even worse, but "easier"

afbjorklund commented 3 years ago

But I'm still not sure I see the benefit of using the digest, especially if limited to the old docker tools that mangle it?

i.e.

gcr.io/k8s-minikube/kicbase:v0.0.15-snapshot4@sha256:ef1f485b5a1cfa4c989bc05e153f0a8525968ec999e242efff871cbb31649c16

If it would have been content-based like git (or camlistore), then there would for sure be a lot of value added...

Maybe I just need to use crane more, with the weird timestamps and the predictable compression.

afbjorklund commented 3 years ago

Seems like this checksum is a fight that needs to be fought every couple of years or so...

There was a struggle when it came to validating software downloads, with Zero Install (it validates the content before compression, instead of the compressed archives afterwards). We had the same thing in rpm versus yum, but as usual the good guys lost that one too. I suppose it's nice to be able to validate a download, but then a CRC would have worked.

Maybe we need both of them, to protect against archiver and compression attacks (tarbombs)

afbjorklund commented 3 years ago

But I dunno. I was ready to go back to ~FTP~, err, I mean HTTP because of all the little issues. Like the missing progress bar*, or the clumsy display of the checksum, or whatever it was? (now lately there were the pull limits of docker.io, but I guess they don't apply to gcr.io - yet) Rumor has it that go-containerregistry also does progress now, so maybe we can still fix it...

* See #7012

jonjohnsonjr commented 3 years ago

  If it would have been content-based like git (or camlistore), then there would for sure be a lot of value added...

It is content based, just biased towards the content you deal with during distribution instead of at runtime.

  Maybe we need both of them, to protect against archiver and compression attacks (tarbombs)

Right, we do have both available to us. The non-compressed bits have content-addressable identifiers that are referenced by the image manifest, so in theory you can get everything you need, it's just a little more cumbersome than a flat key space.

I'm not entirely sure how the minikube cache works, but everything is content-addressable, so doing what you want is definitely possible, it's just a matter of which identifiers are available and mapping between them. If you want one blessed identifier, you should use the Image ID because that's available everywhere and remains constant regardless of compression and annotations, but minikube could be clever about mapping between digests and image IDs and take advantage of a registry as a cache.

  especially if limited to the old docker tools that mangle it

I don't quite understand what you're saying here.

  I was ready to go back to FTP, err, I mean HTTP because of all the little issues.

I think I'm missing something. The distribution API does use HTTP.

  Rumor has it that go-containerregistry also does progress now, so maybe we can still fix it...

I believe so, yes, when writing to a tarball.

afbjorklund commented 3 years ago
  especially if limited to the old docker tools that mangle it

  I don't quite understand what you're saying here.

It was regarding the docker push versus crane push conversation.

Seems that the digests can be made to work, just stop using docker?

  I was ready to go back to FTP, err, I mean HTTP because of all the little issues.

  I think I'm missing something. The distribution API does use HTTP.

It was just a rant about crawling back into the cave: FTP the tarball only, and load it locally instead?

Partly triggered by the corporate firewall blocking the access to the corporate registry, that day.

  Rumor has it that go-containerregistry also does progress now, so maybe we can still fix it...

  I believe so, yes, when writing to a tarball.

The workaround was using go-getter to download the tarball, and then go-containerregistry to load.

We considered using it for other purposes, such as offline usage or when gcr.io is being blocked.

afbjorklund commented 3 years ago

Something really, really weird is going on with the github comments. We should move "discussions" elsewhere.

Tried to restore the mangled content, sorry about fat-fingering the reply button (and making an edit instead).

jonjohnsonjr commented 3 years ago

  Seems that the digests can be made to work, just stop using docker?

Roughly, but of course that's easier said than done :)

  Partly triggered by the corporate firewall blocking the access to the corporate registry, that day.

🤦

  The workaround was using go-getter to download the tarball, and then go-containerregistry to load.

Another workaround would be to supply a custom transport to go-containerregistry that can tee updates back to something else to display progress while the layer gets downloaded.

  We considered using it for other purposes, such as offline usage or when gcr.io is being blocked.

This is reasonable.

I'll point you towards some work that @deitch did recently for caching things in linuxkit using go-containerregistry that might be similar to your needs. This is pretty close to what I would design from scratch, FWIW: https://github.com/linuxkit/linuxkit/pull/3573

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten