rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

How to use a private registry by local DNS name? #707

Closed DuncanvR closed 2 years ago

DuncanvR commented 3 years ago

Version (k3OS / kernel) k3os version v0.11.1 5.4.0-48-generic #52 SMP Sat Sep 26 08:27:15 UTC 2020

Architecture x86_64

Describe the bug I originally reported this issue in the k3s repo (https://github.com/k3s-io/k3s/issues/1581), but it seems to fit better here. In short, I want to deploy images that come from a registry running on the same cluster. I can get it to work using the IP address of the registry, but not using an internal domain name (*.svc.cluster.local).

To Reproduce The linked issue lists the steps I took in detail; I'll summarise here:

  1. Deploy a registry to the cluster (no need for ingress, TLS or authentication).
  2. Use port forwarding to push an image to the registry.
  3. Configure k3s to use the private registry via its internal domain name.
  4. Create a pod using an image from the registry.

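For reference, step 3 on k3s is done through `/etc/rancher/k3s/registries.yaml`. A minimal sketch, assuming the registry hostname and port from the error message below (adjust to your Service name and namespace):

```yaml
# /etc/rancher/k3s/registries.yaml -- k3s private-registry configuration.
# The hostname below is taken from this issue; substitute your own.
mirrors:
  "image-registry.containers.svc.cluster.local:5000":
    endpoint:
      - "http://image-registry.containers.svc.cluster.local:5000"
```

k3s (and containerd underneath it) must be restarted for changes to this file to take effect.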
Expected behavior The pod comes online after pulling the image from the registry.

Actual behavior The pod goes into ErrImagePull state, as k3s cannot resolve the domain name of the registry, with a warning message like:

Failed to pull image "image-registry-2.duncanvr/test-image:latest": rpc error: code = Unknown desc = failed to pull and unpack image "image-registry-2.duncanvr/test-image:latest": failed to resolve reference "image-registry-2.duncanvr/test-image:latest": failed to do request: Head http://image-registry.containers.svc.cluster.local:5000/v2/test-image/manifests/latest: dial tcp: lookup image-registry.containers.svc.cluster.local: no such host

Additional context

In the linked issue I've also described my attempts to make k3OS use kube-dns for resolving domain names. This has been partially successful, i.e. it works but produces many warnings. My main question is how to get that to work properly.

DuncanvR commented 3 years ago

I don't believe this should be marked as a bug per se. I have no idea how to remove that label though; perhaps I don't have the rights to do so.

dweomer commented 3 years ago

@DuncanvR I imagine you would need an /etc/hosts entry on the k3OS host(s) for your registry, because containerd is what actually talks to the registry, and it is not configured to use the cluster's coredns (which itself runs in k3s, in containerd). You might be tempted to add an /etc/resolv.conf entry pointing at your coredns, but I think that would be a mistake: the next time the system reboots or the coredns pod restarts, it would be thrown into a self-referential resolver loop.
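A sketch of that /etc/hosts approach, assuming the Service name and namespace from the error message above. Using the Service's ClusterIP rather than a pod IP helps, since the ClusterIP is stable for the lifetime of the Service:

```shell
# Hypothetical sketch: look up the registry Service's ClusterIP
# (Service "image-registry" in namespace "containers" is assumed
# from this issue) and pin it in /etc/hosts on each k3OS host.
REGISTRY_IP=$(kubectl -n containers get svc image-registry \
  -o jsonpath='{.spec.clusterIP}')

# Append an entry so containerd on the host can resolve the name:
echo "${REGISTRY_IP} image-registry.containers.svc.cluster.local" >> /etc/hosts
```

This still bakes an IP into host state, but a Service ClusterIP only changes if the Service itself is deleted and recreated.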

dweomer commented 3 years ago

A more sophisticated solution might be to scan your cluster's coredns for SRV records and export those to a network-local resolver, combined with ingress to forward requests to the services in your cluster.
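Kubernetes publishes SRV records for named Service ports as `_<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local`, so such a scan could start with queries like the following. This is a sketch: it assumes the k3s default cluster-DNS Service IP `10.43.0.10` and a registry Service port named `registry`:

```shell
# Query coredns via its Service IP (10.43.0.10 is the k3s default;
# the Service IP stays stable across coredns pod restarts).
# The SRV name assumes the Service exposes a TCP port named "registry";
# adjust both to match your cluster.
dig +short SRV \
  _registry._tcp.image-registry.containers.svc.cluster.local @10.43.0.10
```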

DuncanvR commented 3 years ago

Thanks @dweomer. I had already tried adding the cluster's DNS server to /etc/resolv.conf, and indeed saw warnings about resolver loops. Also, I had to specify the IP address of the coredns pod there, which isn't necessarily stable. I'm afraid the alternatives you mention would still require the same, because as I understand it, in order to scan the cluster's coredns for SRV records, I'd still need to know how to reach the coredns pod, right?

In the end I'd love to find a solution that's a little more robust, i.e. not dependent on me entering IP addresses.

I guess the only place where those pod addresses (either coredns, or in my case the image registry) are known, is inside Kubernetes. A job could do something with them, but probably doesn't have rights to change anything on the node(s). The control plane might be a better place for that. But I'm starting to feel a little out of my depth here. Do you have any pointers?
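One detail that may help here: the cluster DNS is normally exposed as a Service (`kube-dns` in `kube-system`), and a Service's ClusterIP is stable across pod restarts; in a default k3s install it is `10.43.0.10`. A hedged sketch of a node-side setup that avoids both pod IPs and the /etc/resolv.conf loop, assuming the host can run a dnsmasq-style forwarder (not something k3OS provides out of the box) and the default `cluster.local` domain:

```shell
# Sketch, assuming k3s defaults: cluster DNS Service at 10.43.0.10 and
# cluster domain "cluster.local". Instead of editing /etc/resolv.conf
# (resolver-loop risk, as noted above), forward ONLY cluster.local
# queries to the cluster DNS Service IP, which survives coredns pod
# restarts.
cat > /etc/dnsmasq.d/k8s.conf <<'EOF'
server=/cluster.local/10.43.0.10
EOF
```

Everything else would keep resolving through the host's normal upstream DNS, so a coredns outage would not take host name resolution down with it.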