spegel-org / spegel

Stateless cluster local OCI registry mirror.
MIT License

Default to docker.io #423

Open mrclrchtr opened 7 months ago

mrclrchtr commented 7 months ago

Describe the problem to be solved

Hi, I would like to use Spegel as a pull-through cache in my pipeline. There, images are often specified as follows:

image: flyway/flyway:10.11.0

Spegel then logs the following error:

"path":"/v2/flyway/flyway/manifests/10.11.0","status":404,"method":"GET","latency":0.000058161,"ip":"10.0.19.75","error":"registry parameter needs to be set for tag references"

Proposed solution to the problem

My understanding is that this notation always implicitly has "docker.io" as the registry in front of it, right? Would it be possible to use this default instead of returning an error, or at least to offer an option to set a default?
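A minimal sketch of what such an optional default could look like on the mirror side, written in Go (the language Spegel is written in) but not taken from Spegel's code: `registryFor` and the `defaultRegistry` setting are hypothetical. It assumes the original registry arrives in an `ns` query parameter (the Containerd mechanism explained later in this thread), that digest references (`sha256:...`) can be served without knowing the registry, and that tag references cannot unless a default is configured.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// errNoRegistry reuses the message from the Spegel log in this issue.
var errNoRegistry = errors.New("registry parameter needs to be set for tag references")

// registryFor (hypothetical) decides which upstream registry a manifest request
// such as /v2/<name>/manifests/<reference>?ns=<registry> refers to.
func registryFor(target, defaultRegistry string) (string, error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", err
	}
	// An explicit ns query parameter always wins.
	if ns := u.Query().Get("ns"); ns != "" {
		return ns, nil
	}
	// Digest references are content addressed, so no registry is needed
	// (only sha256 digests are recognised in this sketch).
	if strings.HasPrefix(path.Base(u.Path), "sha256:") {
		return "", nil
	}
	// Tag references: use the optional default, otherwise fail as in the log.
	if defaultRegistry != "" {
		return defaultRegistry, nil
	}
	return "", errNoRegistry
}

func main() {
	for _, target := range []string{
		"/v2/flyway/flyway/manifests/10.11.0",
		"/v2/flyway/flyway/manifests/10.11.0?ns=docker.io",
		"/v2/flyway/flyway/manifests/sha256:abc",
	} {
		reg, err := registryFor(target, "")
		fmt.Printf("%s -> registry=%q err=%v\n", target, reg, err)
	}
}
```

With an empty `defaultRegistry`, the first request fails with the same message as the log, while the `ns` and digest requests succeed.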

phillebaba commented 7 months ago

I will have to dig deeper into Containerd's image pulling logic and into how Kubernetes sends image references through the CRI API.

While Docker implicitly assumes that the registry is docker.io, Containerd does not. When pulling with ctr, for example, you have to give a full image reference including the registry.

What is interesting here is that Containerd picks up the registry in the mirror configuration but does not add the original registry to the mirrored HTTP request. While defaulting would solve this specific problem, I think we need to understand why this is occurring in the first place.
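For reference, a sketch of the implicit expansion Docker applies to short references such as `flyway/flyway:10.11.0`. It assumes the usual Docker rules: the first path component is treated as a registry host only if it contains `.` or `:` or is `localhost`, single-name Docker Hub images get a `library/` prefix, and the tag defaults to `latest`. `normalize` is a hypothetical helper, and digest references (`name@sha256:...`) are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize (hypothetical helper) expands a short image reference the way
// Docker does implicitly. Digest references (name@sha256:...) are not handled.
func normalize(ref string) (registry, repository, tag string) {
	registry, repository, tag = "docker.io", ref, "latest"
	// A ':' after the last '/' separates the tag; an earlier ':' is a registry port.
	if i := strings.LastIndex(repository, ":"); i > strings.LastIndex(repository, "/") {
		repository, tag = repository[:i], repository[i+1:]
	}
	// The first path component is a registry host only if it looks like one.
	if i := strings.Index(repository, "/"); i >= 0 {
		if first := repository[:i]; strings.ContainsAny(first, ".:") || first == "localhost" {
			registry, repository = first, repository[i+1:]
		}
	}
	// Single-name Docker Hub images live under the implicit library/ namespace.
	if registry == "docker.io" && !strings.Contains(repository, "/") {
		repository = "library/" + repository
	}
	return registry, repository, tag
}

func main() {
	for _, ref := range []string{"flyway/flyway:10.11.0", "ubuntu", "ghcr.io/renovatebot/renovate"} {
		registry, repository, tag := normalize(ref)
		fmt.Printf("%s -> %s/%s:%s\n", ref, registry, repository, tag)
	}
}
```

Under these rules `flyway/flyway:10.11.0` becomes `docker.io/flyway/flyway:10.11.0` and `ubuntu` becomes `docker.io/library/ubuntu:latest`; Containerd expects the full form up front.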

mrclrchtr commented 7 months ago

Thank you very much for the very quick response!

Ah ok, you're right.

Something else that might be interesting (maybe that's what you mean by "Containerd is picking up the registry in the mirror configuration"): the download still works, even though I only specified Spegel as the mirror. I'm not sure whether Docker simply sends a request to docker.io anyway when Spegel responds with a 404. In any case, the images are available in the end.

phillebaba commented 7 months ago

I just ran the e2e tests without the registry specified and they worked without any issues, so we need to dig a bit deeper into this.

To make sure that we are on the same page: am I right to assume that you are running Spegel in Kubernetes and specifying the image references in a Pod spec? Or are you using Spegel in some other fashion? The reason I ask is that Containerd and Docker pull images differently, and this difference is critical for the functionality of Spegel.

Could you quickly describe your environment so that I can better understand where the error originates?

mrclrchtr commented 7 months ago

Yes, sure:

Values

spegel:
  containerdRegistryConfigPath: /etc/cri/conf.d/hosts
serviceMonitor:
  enabled: true

All Pods pull their images successfully, so it's basically working as expected.

I want to use it in gha-runner-scale-set:

      - image: docker:26.0.0-dind
        name: dind
        securityContext:
          privileged: true
        env:
          - name: DOCKER_GROUP_GID
            value: "123"
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          - --registry-mirror=http://spegel-registry.spegel.svc.cluster.8log.de.local:5000
        volumeMounts:
          - mountPath: /home/runner/_work
            name: work
          - mountPath: /var/run
            name: dind-sock
          - mountPath: /home/runner/externals
            name: dind-externals

--registry-mirror=http://spegel-registry.spegel.svc.cluster.8log.de.local:5000 is the relevant part that connects DinD to Spegel. That is also working as expected, but with the error messages described above.

Do you need further information?

phillebaba commented 7 months ago

I had a feeling the problem would come from DinD; I was expecting an issue like this, as you said you were using this in CI.

First of all, the issue isn't huge, but it is still something you want to fix. The error you are seeing occurs when Docker tries to resolve the image tag to a digest, which Spegel will fail to do for every image with your setup. Thankfully, Docker (and in turn Containerd) will always fall back to the original registry if the mirror fails, which is a very nice feature to have. Spegel is not able to resolve the tag because it requires the original registry to be included in the request. This additional data is not required when fetching layers, as they are referenced by their digest. The registry is expected to be passed as the ns query parameter in the HEAD request:

/v2/flyway/flyway/manifests/10.11.0?ns=docker.io

Sadly, this is not yet part of the OCI distribution spec, as it is still pending review. Containerd, however, implemented the feature a long time ago, so Spegel relies on Containerd being the one that pulls the image. Containerd forked parts of Docker's image pulling code a while back, so this change never made its way into the Docker codebase.

Finally, you have two solutions. You can either disable tag resolving in Spegel, which isn't great but would remove the errors, or configure Docker to use Containerd as its image store. When this is enabled, Docker depends on Containerd as its image store, which means that Containerd will be pulling images for you.

https://docs.docker.com/storage/containerd/#enable-containerd-image-store-on-docker-engine

Having said that, I am aware that images pulled by Docker would then end up in Docker's own Containerd namespace, which is not shared with Kubernetes, so Spegel will not pick them up. We could solve this fairly easily by allowing users to configure multiple Containerd namespaces.

I have been thinking about writing some best-practice docs for using Spegel with build systems, as I know there are some interesting aspects around build layer caching between nodes. Sadly, it is not something I have started working on.

Hopefully this answers your questions. I will look into allowing multiple namespaces if Docker uses a different namespace.

mrclrchtr commented 7 months ago

Thank you very much for this detailed explanation! I have learned a lot just by reading it. I'm not that deep into this topic, so I didn't realize that it behaves so differently.

I would be very happy about best practices for using Spegel with build systems. This is exactly the area I would like to optimize. Currently, all images are downloaded again in every CI run. This is such an unnecessary waste of resources.

You can either disable tag resolving in Spegel, which isn't great but would remove the errors.

As this only causes the error messages to disappear, this is not an option.

Docker to use Containerd as its image store. When enabled Docker will depend on Containerd to run as your image store. This means that Containerd will be pulling images for you.

Would that mean the images can actually be served by Spegel? That would be completely sufficient for me for now.

Otherwise, my approach would be to set up a local Docker registry as a pull-through cache and connect DinD to it.

Anyway. Thanks again for the quick and detailed help and the great tool. When I discovered Spegel, I was immediately impressed. It was exactly what I was looking for.

phillebaba commented 7 months ago

No problem. I tested this setup myself and observed that Docker creates a separate namespace called moby in Containerd. This means that Spegel will not see the images pulled by Docker, as Spegel is configured by default to use the k8s.io namespace.

However, the Containerd docs state that by default blobs are shared between namespaces, so it might not be as bad as I expected.

https://github.com/containerd/containerd/blob/main/docs/ops.md#bolt-metadata-plugin

Later today I will verify whether this also includes image metadata. If it does, we can basically assume that namespaces are a non-issue and that the content will be shared between the two namespaces.

danielloader commented 5 months ago

Chipping in to say I'm also interested in this workflow. I'm already using (or abusing) an environment variable flag to enable the Containerd backend in my DinD GitHub Actions runner sets, so I'm halfway there.

mrclrchtr commented 5 months ago

@phillebaba can I help you with this topic in any way?

PrivatePuffin commented 2 weeks ago

@mrclrchtr Did you get it working, at least for pulling some things?

I get the feeling that Docker pulls in GitHub Actions (like ghcr.io/renovatebot/renovate) still don't go through my Spegel with this configuration:

                  - dockerd
                  - --host=unix:///var/run/docker.sock
                  - --group=123
                  - --registry-mirror=http://spegel-registry.spegel.svc.cluster.local:5000
                  - --insecure-registry=http://spegel-registry.spegel.svc.cluster.local:5000
                  - --feature=containerd-snapshotter

mrclrchtr commented 2 weeks ago

Unfortunately not. I still use Docker registries as a pull-through cache.

PrivatePuffin commented 2 weeks ago

I fixed it by moving to the Kubernetes backend. The main thing I noticed is that Docker doesn't use the mirror for any registry other than Docker Hub, so the registry mirror setting is borderline unusable for DinD.

However, the Kubernetes backend worked like an absolute charm.
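A simplified model of the behaviour described here, assuming Docker applies `--registry-mirror` only to Docker Hub images, so a pull of `ghcr.io/renovatebot/renovate` never reaches the mirror. `endpointsFor` is a hypothetical helper, not Docker's code; `registry-1.docker.io` is the host that serves Docker Hub's registry API.

```go
package main

import "fmt"

// endpointsFor (hypothetical) models which endpoints a Docker daemon would try
// for an image from the given registry when --registry-mirror is set.
func endpointsFor(registry string, mirrors []string) []string {
	var endpoints []string
	// Mirrors are only consulted for Docker Hub images.
	if registry == "docker.io" {
		endpoints = append(endpoints, mirrors...)
	}
	host := registry
	if registry == "docker.io" {
		host = "registry-1.docker.io" // host serving Docker Hub's registry API
	}
	return append(endpoints, "https://"+host)
}

func main() {
	mirrors := []string{"http://spegel-registry.spegel.svc.cluster.local:5000"}
	fmt.Println(endpointsFor("docker.io", mirrors))
	fmt.Println(endpointsFor("ghcr.io", mirrors))
}
```

In this model only the Docker Hub pull lists the Spegel URL; the ghcr.io pull goes straight to the upstream registry, which is why Spegel never sees it from DinD.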

mrclrchtr commented 1 week ago

@PrivatePuffin what do you mean by Kubernetes backend? I googled it briefly, but couldn't find anything useful.

PrivatePuffin commented 1 week ago

The GitHub Actions Controller has both a Kubernetes and a DinD deployment option.