k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Mirrors configured in /etc/rancher/k3s/registries.yaml do not take effect #9626

Closed kingsd041 closed 6 months ago

kingsd041 commented 6 months ago

Environmental Info: K3s Version: v1.28.7+k3s1

Node(s) CPU architecture, OS, and Version: ubuntu 22.04

Cluster Configuration: 1 server

Describe the bug:

When configuring mirrors in /etc/rancher/k3s/registries.yaml, the mirrors do not take effect after starting K3s.

Steps To Reproduce:

cat >> /etc/rancher/k3s/registries.yaml <<EOF
mirrors:
  "docker.io":
    endpoint:

Actual behavior:

The configured mirrors are not applied, and the mirrors field remains null in the output of crictl info.

Additional context / logs:

brandond commented 6 months ago

I believe this is a duplicate of https://github.com/k3s-io/k3s/issues/9341

Remove the second endpoint, you don't need to specify the default endpoint; it is always tried last - and on this specific release it triggers a bug in the registries configuration.

kingsd041 commented 6 months ago

@brandond I removed the second default endpoint, but the situation is still the same.

root@ip-172-31-15-10:/etc/rancher/k3s# cat registries.yaml
mirrors:
  "docker.io":
    endpoint:
      - "https://docker.nju.edu.cn/"
root@ip-172-31-15-10:/etc/rancher/k3s#
root@ip-172-31-15-10:/etc/rancher/k3s# systemctl restart k3s
root@ip-172-31-15-10:/etc/rancher/k3s# crictl info | grep -A 5 "registry"
    "registry": {
      "configPath": "/var/lib/rancher/k3s/agent/etc/containerd/certs.d",
      "mirrors": null,
      "configs": null,
      "auths": null,
      "headers": null

brandond commented 6 months ago

I'm not sure that crictl info will show mirrors when configPath is in use; you have to actually look at the filesystem. Have you checked to confirm that /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml exists and has the correct content?

ref: https://github.com/containerd/containerd/blob/main/docs/hosts.md
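
The filesystem check suggested above can be sketched as a small shell function. The function name is hypothetical; the default directory is the configPath that crictl info reports in this thread, and the grep pattern assumes the hosts.toml layout described in the containerd hosts.md document linked above.

```shell
#!/bin/sh
# Print the fallback server and mirror hosts from every hosts.toml under a
# containerd certs.d directory, one registry directory at a time.
# show_mirror_hosts is an illustrative name, not a K3s or containerd command.
show_mirror_hosts() {
    certs_d="${1:-/var/lib/rancher/k3s/agent/etc/containerd/certs.d}"
    if [ ! -d "$certs_d" ]; then
        echo "no certs.d directory at $certs_d (no mirrors configured?)" >&2
        return 1
    fi
    found=0
    for f in "$certs_d"/*/hosts.toml; do
        [ -f "$f" ] || continue
        found=1
        echo "== $f"
        # Keep only the 'server = ...' line and the '[host."..."]' headers.
        grep -E '^(server|\[host\.)' "$f"
    done
    # Succeed only if at least one hosts.toml was found.
    [ "$found" -eq 1 ]
}

# Usage on a node (as root):
#   show_mirror_hosts
```

A non-zero exit status means either the directory is missing or it contains no hosts.toml, which in this thread corresponds to "no mirrors configured".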

kingsd041 commented 6 months ago

/var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml of v1.28.7+k3s1 shows the mirror I configured:

root@ip-172-31-4-143:~# cat /etc/rancher/k3s/registries.yaml
mirrors:
  "docker.io":
    endpoint:
      - "https://docker.nju.edu.cn/"

root@ip-172-31-4-143:~# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml
# File generated by k3s. DO NOT EDIT.
server = "https://registry-1.docker.io/v2"

[host."https://docker.nju.edu.cn/v2"]
  capabilities = ["pull", "resolve"]

However, I also tried the same operation using v1.28.5+k3s1. Using v1.28.5+k3s1, I can display the mirror information through crictl info:

root@ip-172-31-10-120:~# k3s -v
k3s version v1.28.5+k3s1 (5b2d1271)
go version go1.20.12

root@ip-172-31-10-120:~# crictl info | grep -A 5 "registry"
    "registry": {
      "configPath": "",
      "mirrors": {
        "docker.io": {
          "endpoint": [
            "https://docker.nju.edu.cn/"

root@ip-172-31-10-120:~# cat /etc/rancher/k3s/registries.yaml
mirrors:
  "docker.io":
    endpoint:
      - "https://docker.nju.edu.cn/"

Moreover, I observed that v1.28.7+k3s1 and v1.28.5+k3s1 both ship containerd v1.7.11-k3s2. @brandond, do you know why v1.28.7+k3s1 does not display the mirrors through crictl info?

brandond commented 6 months ago

Because we switched to using hosts.d (configPath) instead of the deprecated inline mirrors config.

dberardo-com commented 3 months ago

i have been upgrading k3s from v1.27.4+k3s1 to v1.27.12+k3s1 because the doc clearly states that mirroring using wildcard is supported starting from that release.

now i face the same issue described above, also the folder: /var/lib/rancher/k3s/agent/etc/containerd/certs.d does not exist on my system, although the config.toml file points to it.

how can i finalize the upgrade in order for the mirroring functionality to work? and how do i verify that it is actually working ? any procedure ?

brandond commented 3 months ago

@dberardo-com if you have no content in that path, then you do not have any mirrors configured. Please show the contents of your registries.yaml, and kubectl get node -o yaml | grep args

dberardo-com commented 3 months ago

/etc/rancher/k3s/registries.yaml:

mirrors:
  "*":
  docker.io:
  registry.gitlab.com:
  registry.k8s.io:
  quay.io:

k3s.io/node-args: '["server","--node-ip","1.....","--tls-san","1....","--kubelet-arg","config=/var/lib/rancher/k3s/kubelet-config-override.yml","--kubelet-arg","node-status-update-frequency=4s","--kube-controller-manager-arg","node-monitor-period=4s","--kube-controller-manager-arg","node-monitor-grace-period=16s","--disable","servicelb"]'

brandond commented 3 months ago

@dberardo-com you've configured no external mirror endpoints, and haven't enabled the embedded registry mirror with the --embedded-registry flag so the local endpoint isn't enabled... what exactly did you expect to see with no mirror endpoints?

Have you read the docs at https://docs.k3s.io/installation/registry-mirror ?

dberardo-com commented 3 months ago

sure i have read the doc. there it states:

The "*" wildcard mirror entry can be used to enable distributed mirroring of all registries.

i am in an air gapped environment, and would like to deploy images from tar files into one machine and all other machines would be able to fetch them from that one machine.

also the machine i distribute images to can be just any node of the cluster. is that possible to achieve without any external endpoint ?

brandond commented 3 months ago

The "*" wildcard mirror entry can be used to enable distributed mirroring of all registries.

... assuming you've actually enabled the embedded distributed mirror? Which you haven't.

Please re-read the docs, paying attention to the section at the top where it says

In order to enable the embedded registry mirror, server nodes must be started with the --embedded-registry flag, or with embedded-registry: true in the configuration file. This option enables the embedded mirror for use on all nodes in the cluster.

Once that is enabled you can enable mirroring of specific registries, or all registries, as described in the section you quoted.
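
The two steps described above (enable the embedded mirror on the servers, then opt registries in through registries.yaml) can be sketched as one shell function. The function name and the scratch-directory parameter are illustrative assumptions; the two settings themselves are the ones quoted from the docs in this thread.

```shell
#!/bin/sh
# Write the embedded-registry setting and a wildcard mirror entry into a
# K3s config directory. enable_embedded_mirror is a hypothetical helper;
# pass a scratch directory as $1 to try it without touching /etc/rancher/k3s.
enable_embedded_mirror() {
    conf_dir="${1:-/etc/rancher/k3s}"
    mkdir -p "$conf_dir" || return 1

    # 1. Turn on the embedded registry mirror (server nodes).
    if ! grep -qs '^embedded-registry:' "$conf_dir/config.yaml"; then
        echo 'embedded-registry: true' >> "$conf_dir/config.yaml"
    fi

    # 2. Opt registries in to mirroring; "*" covers all registries.
    #    An existing registries.yaml is left untouched.
    if [ -e "$conf_dir/registries.yaml" ]; then
        echo "$conf_dir/registries.yaml exists; add the mirror entry by hand" >&2
        return 0
    fi
    cat > "$conf_dir/registries.yaml" <<'EOF'
mirrors:
  "*":
EOF
}

# Usage (as root), followed by a restart of the k3s service:
#   enable_embedded_mirror
```

K3s has to be restarted before a hosts.toml appears under certs.d. Going by the registry-mirror docs linked above, the mirrors entry is needed on each node that should use the mirror, while the embedded-registry setting belongs on the servers.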

dberardo-com commented 3 months ago

that made it work! i still can't see mirrors in the crictl info command but i now see the hosts.toml file being created.

i also needed to add --disable-default-endpoint in order to make it work.

also noticed that the documentation has a typo as the flag should be --disable-default-registry-endpoint and not --disable-default-endpoint

perhaps the section where all these arguments are explained should be somewhat highlighted.

thanks for the support

brandond commented 3 months ago

i also needed to add --disable-default-endpoint in order to make it work

That shouldn't change anything, all that does is disable the fallback to the default registry endpoint. Other endpoints (including the embedded registry endpoint) are always tried first, before the default endpoint.

also noticed that the documentation has a typo as the flag should be --disable-default-registry-endpoint and not --disable-default-endpoint

Yes, I've been meaning to fix that! Thanks for the reminder.

dberardo-com commented 3 months ago

ok i get a new error. the registry seems to work fine for all cases, but not one in which my registry endpoint contains a port number. originally the image was pushed on "my.private-repo.com:440" (of course this is a fake name, just to give an example) and now i get this error from the pod:

Failed to pull image "my.private-repo.com:440/blabla/blabla/blabla:latest": rpc error: code = Unknown desc = failed to pull and unpack image "my.private-repo.com:440/blabla/blabla/blabla:latest": failed to resolve reference "my.private-repo.com:440/blabla/blabla/blabla:latest": unexpected status from HEAD request to https://127.0.0.1:6443/v2/blabla/blabla/blabla/latest?ns=my.private-repo.com%3A440: 500 Internal Server Error

is it important for the current host to resolve the hostname my.private-repo.com ? or can it go without resolving it? or is the problem the port number ?

brandond commented 3 months ago

That indicates that no node has the image my.private-repo.com:440/blabla/blabla/blabla:latest. Can you confirm that another node has it and is sharing it?

dberardo-com commented 3 months ago

i can confirm that pods come up only on the node where this image was originally installed. i see the image running crictl images and the mirrors are enabled

brandond commented 3 months ago

Confirm that you've enabled that repo for mirroring on both nodes, and that the nodes are able to reach each other on both the p2p port, and the registry port.

You can start the nodes with --debug for more logs from spegel, when the image is being pulled.

dberardo-com commented 3 months ago

i mean, the mirroring is working for all other images ... only these ones from the :440 private registry are not being pulled successfully.

brandond commented 3 months ago

Are you literally using the tag :latest? Have you seen the docs section on that tag? https://docs.k3s.io/installation/registry-mirror#latest-tag

dberardo-com commented 3 months ago

OMG ... didn't see that ... perhaps that is the reason ...

is it possible to pass the env K3S_P2P_ENABLE_LATEST=true as a flag to the k3s server command ?

or maybe it would suffice to run it as :

K3S_P2P_ENABLE_LATEST=true k3s server ....

will give it a go now

brandond commented 3 months ago

ideally you would put it in one of the .env files loaded by the systemd unit.
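
A minimal sketch of that suggestion follows. The helper name is hypothetical, and the default path is an assumption: it is the environment file the K3s install script normally creates next to the systemd unit, so check which EnvironmentFile entries your unit actually lists.

```shell
#!/bin/sh
# Append KEY=VALUE to an environment file unless KEY is already present.
# set_k3s_env is an illustrative helper, not part of K3s.
set_k3s_env() {
    key="$1"
    value="$2"
    # Assumed default: the env file created by the K3s install script.
    env_file="${3:-/etc/systemd/system/k3s.service.env}"
    if grep -qs "^${key}=" "$env_file"; then
        echo "$key already set in $env_file" >&2
        return 0
    fi
    echo "${key}=${value}" >> "$env_file"
}

# Usage (as root, on every node), followed by a service restart:
#   set_k3s_env K3S_P2P_ENABLE_LATEST true
#   systemctl restart k3s
```

On agent nodes installed with the script, the unit and its env file are typically named k3s-agent rather than k3s, so the third argument would need to point there.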

dberardo-com commented 3 months ago

i am trying this way, lets see if it works ... should i see anything different in the log or hosts.toml file ? https://serverfault.com/questions/413397/how-to-set-environment-variable-in-systemd-service

brandond commented 3 months ago

no, you'll just see that the latest tag is allowed to be fetched from the mirror

dberardo-com commented 3 months ago

the error persists ... i am seeing this in the log:

"k3s.io/node-env":"{\"K3S_DATA_DIR\":\"/var/lib/rancher/k3s/data/59e9576cfc98d814923f744cbb9a4f29add09b43994aec47e873f6dda3b1c50d\",\"K3S_P2P_ENABLE_LATEST\":\"true\"}",

so i guess the env var is loaded correctly, anything else i should check ? e.g. the k3s version or so ?

brandond commented 3 months ago

Did you set that on all the nodes?

dberardo-com commented 3 months ago

yes, i see that same log on every node

dberardo-com commented 3 months ago

i can confirm that the error persists even if i use a tag other than "latest", same error log

brandond commented 3 months ago

I'm not aware of any reason that spegel wouldn't be able to mirror images from a registry that includes a port in its address. Can you show the output of ctr image ls on the node that has the image, and the logs from the other node (running with --debug) when you attempt to pull the image?

dberardo-com commented 3 months ago

ok i will add the debug flag to the k3s server command. which part of the log should i "grep" and post here ?

also, the command "ctr image ls" shows no image, whereas "crictl images" lists all images that are being used in the cluster, which include also the ":440" images.

those images are imported on the node using the "k3s ctr images import" command, from .tar files exported using "docker save" command

brandond commented 3 months ago

Just attach the complete log if you can - don't paste it inline in your comment.

those images are imported on the node using the "k3s ctr images import" command, from .tar files exported using "docker save" command

Are you specifying the correct namespace (-n k8s.io) when importing and listing the images using ctr? This is documented at https://docs.k3s.io/installation/registry-mirror#pushing-images
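
For reference, an import and listing with the namespace given explicitly might look like the sketch below. The function name and the CTR override variable are assumptions added for illustration; the -n k8s.io flag is the one named in the comment above.

```shell
#!/bin/sh
# Import an image tarball into the containerd namespace used by the
# kubelet (k8s.io), then list image names in that same namespace.
# import_for_k8s and the CTR variable are illustrative, not K3s features.
import_for_k8s() {
    tarball="$1"
    # Intentionally unquoted so the default "k3s ctr" splits into two words.
    ${CTR:-k3s ctr} -n k8s.io images import "$tarball" || return 1
    ${CTR:-k3s ctr} -n k8s.io images ls -q
}

# Usage on the node that should hold the image (as root):
#   import_for_k8s /path/to/image.tar
```

An image that shows up in crictl images but not in a plain ctr image ls, as reported above, is consistent with the two commands looking at different namespaces.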

xingxing122 commented 2 months ago

I also have this problem. This is my relevant configuration (k3s version v1.29.5+k3s1):

data-dir: "data/containerd"
datastore-endpoint: etcd
kubelet-arg: "--root-dir=/data/kubelet"
token: "secret"
bind-address: 127.0.0.1
#write-kubeconfig: ~/.kube/config
#write-kubeconfig-mode: 644
node-ip: 127.0.0.1
flannel-backend: host-gw
default-local-storage-path: /data/stor
disable: "servicelb,traefik"
disable-cloud-controller: true
disable-default-endpoint: true
embedded-registry: true
mirrors:
  docker.io:
    endpoint:
      - "https://registry.my.io"
    rewrite:
      "^rancher/(.*)": "yushu/common/$1"
      "^kubesphere/(.*)": "yushu/common/$1"
      "^mirrorgooglecontainers/(.*)": "yushu/common/$1"
      "^library/(.*)": "yushu/common/$1"
      "^csiplugin/(.*)": "yushu/common/$1"

But every time it is installed, this image cannot be pulled, and the error is as follows:

Pulling image "rancher/mirrored-metrics-server:v0.7.0"
Warning Failed 17s kubelet Failed to pull image "rancher/mirrored-metrics-server:v0.7.0": failed to pull and unpack image "docker.io/rancher/mirrored-metrics-server:v0.7.0": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://127.0.0.1:6443/v2/rancher/mirrored-metrics-server/blobs/sha256:280126c0e181aba326fc843e7f17918dc9d54ddbfd917f5a3e0b346cec57fb70?ns=docker.io": dial tcp 127.0.0.1:6443: connect: connection refused
Warning Failed 17s kubelet Error: ErrImagePull
Normal Pulling 4s (x2 over 64s) kubelet Pulling image "rancher/mirrored-metrics-server:v0.7.0"

After configuring the private registry, are the images for all YAML manifests installed on the local machine pulled from the private registry?

brandond commented 2 months ago

node-ip: 127.0.0.1

Don't do that!

Please confirm that you can reproduce this issue without the bind-address and node-ip set to the loopback address.

xingxing122 commented 2 months ago

No, the real IP is used in my configuration. For certain reasons, I masked it when posting the question and rewrote it as 127.0.0.1.

xingxing122 commented 2 months ago

But in my environment, the ip is real, and the image pull still fails. How can I troubleshoot this problem?

xingxing122 commented 2 months ago

I pushed the image to my registry by default, but when installing k3s, why wasn't it pulled from the registry I specified? It's a bit strange. The other 2 images were pulled, and only this image could not be pulled.

xingxing122 commented 2 months ago

k3s ctr --debug image pull rancher/mirrored-metrics-server:v0.7.0
DEBU[0000] fetching image="rancher/mirrored-metrics-server:v0.7.0"
DEBU[0000] resolving host=rancher
DEBU[0000] do request host=rancher request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/v1.7.15-k3s1 request.method=HEAD url="https://rancher/v2/mirrored-metrics-server/manifests/v0.7.0"
INFO[0000] trying next host error="failed to do request: Head \"https://rancher/v2/mirrored-metrics-server/manifests/v0.7.0\": dial tcp: lookup rancher: no such host" host=rancher
ctr: failed to resolve reference "rancher/mirrored-metrics-server:v0.7.0": failed to do request: Head "https://rancher/v2/mirrored-metrics-server/manifests/v0.7.0": dial tcp: lookup rancher: no such host

brandond commented 2 months ago

Don't use ctr to pull images if you want to use mirrors. Containerd's mirroring is part of the CRI service, so you have to use crictl.

Check the containerd.log file for more information on why the pull is failing.
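
A side note on the ctr output quoted earlier: it shows host=rancher, meaning ctr took the first path component of the short name as a registry host. The CRI path expands short names first, as the "normalized image ref" debug line later in this thread shows. The sketch below approximates that expansion under a simplified rule (an assumption, not the exact upstream logic): the first component counts as a registry only if it contains "." or ":" or is "localhost". The function name is hypothetical.

```shell
#!/bin/sh
# Expand a short image name to a fully qualified reference using a
# simplified Docker-style rule. normalize_ref is an illustrative helper.
normalize_ref() {
    ref="$1"
    case "$ref" in
        */*) first="${ref%%/*}" ;;
        # No slash at all: an official image under docker.io/library.
        *)   echo "docker.io/library/$ref"; return ;;
    esac
    case "$first" in
        # Looks like a registry host (dot, port, or localhost): keep as is.
        *.*|*:*|localhost) echo "$ref" ;;
        # Otherwise treat the first component as a Docker Hub namespace.
        *)                 echo "docker.io/$ref" ;;
    esac
}

# Usage: pull through the CRI so that mirrors apply
#   crictl pull "$(normalize_ref rancher/mirrored-metrics-server:v0.7.0)"
```

crictl accepts the short name as well; the expanded form is mainly useful for seeing which registry directory under certs.d applies to the pull.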

brandond commented 2 months ago

time="2024-07-17T11:00:37.798110999+08:00" level=info msg="trying next host - response was http.StatusNotFound" host=persagy2021-docker.pkg.coding.net

Are you sure the images are mirrored to this repo?

brandond commented 2 months ago

Add CONTAINERD_LOG_LEVEL=debug to /etc/sysconfig/k3s and restart the k3s service, you should get more logging out of containerd.

xingxing122 commented 2 months ago

time="2024-07-18T02:17:03.546979502-04:00" level=error msg="PullImage \"rancher/mirrored-metrics-server:v0.7.0\" failed" error="rpc error: code = NotFound desc = failed to pull and unpack image \"docker.io/rancher/mirrored-metrics-server:v0.7.0\": failed to copy: httpReadSeeker: failed open: could not fetch content descriptor sha256:fe5ca62666f04366c8e7f605aa82997d71320183e99962fa76b3209fdfbb8b58 (application/vnd.docker.image.rootfs.diff.tar.gzip) from remote: not found"
time="2024-07-18T02:17:03.547009106-04:00" level=info msg="stop pulling image docker.io/rancher/mirrored-metrics-server:v0.7.0: active requests=0, bytes read=18254"
time="2024-07-18T02:17:27.384055921-04:00" level=info msg="PullImage \"rancher/mirrored-metrics-server:v0.7.0\""
time="2024-07-18T02:17:27.743625555-04:00" level=info msg="trying next host - response was http.StatusNotFound" host=persagy2021-docker.pkg.coding.net

brandond commented 2 months ago

There should be additional logs before that showing the exact request it's making? Perhaps that requires setting the level to trace, I can't recall.

xingxing122 commented 2 months ago

How do I set that up so I can locate this issue?

xingxing122 commented 2 months ago

time="2024-07-18T11:20:51.198161926-04:00" level=error msg="PullImage \"rancher/mirrored-metrics-server:v0.7.0\" failed" error="rpc error: code = NotFound desc = failed to pull and unpack image \"docker.io/rancher/mirrored-metrics-server:v0.7.0\": failed to copy: httpReadSeeker: failed open: could not fetch content descriptor sha256:fe5ca62666f04366c8e7f605aa82997d71320183e99962fa76b3209fdfbb8b58 (application/vnd.docker.image.rootfs.diff.tar.gzip) from remote: not found"
time="2024-07-18T11:20:51.198222335-04:00" level=info msg="stop pulling image docker.io/rancher/mirrored-metrics-server:v0.7.0: active requests=0, bytes read=18256"
time="2024-07-18T11:20:58.997354007-04:00" level=info msg="CreateContainer within sandbox \"1af55d69d42e4bbeb6a19a5f297bf07d347b12ca7690f3033cdccb89aaa9d340\" for container &ContainerMetadata{Name:nacos,Attempt:104,}"
time="2024-07-18T11:20:59.034638070-04:00" level=info msg="CreateContainer within sandbox \"1af55d69d42e4bbeb6a19a5f297bf07d347b12ca7690f3033cdccb89aaa9d340\" for &ContainerMetadata{Name:nacos,Attempt:104,} returns container id \"fe87f2dcacf94d799cb11b40a13f8138e0c1e9d3e7ce528ce7e55b683c2e179a\""

brandond commented 2 months ago

It sounds like you haven't enabled debug-level logging for containerd. When you do, you should see lines like this:

time="2024-07-22T20:57:23.455032392Z" level=debug msg="PullImage using normalized image ref: \"docker.io/rancher/mirrored-metrics-server:v0.7.0\""
time="2024-07-22T20:57:23.455081039Z" level=debug msg="PullImage \"docker.io/rancher/mirrored-metrics-server:v0.7.0\" with snapshotter overlayfs"
time="2024-07-22T20:57:23.457055654Z" level=debug msg="loading host directory" dir=/var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io
time="2024-07-22T20:57:23.457391732Z" level=debug msg=resolving host=172-17-0-7.sslip.io
time="2024-07-22T20:57:23.457461564Z" level=debug msg="do request" host=172-17-0-7.sslip.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/v1.7.15-k3s1 request.method=HEAD url="https://172-17-0-7.sslip.io/v2/rancher/mirrored-metrics-server/manifests/v0.7.0?ns=docker.io"

All you're seeing is error and info level messages, so you have not properly configured the log level and restarted k3s prior to checking the logs.
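
One quick way to confirm that the log level change took effect is to count debug-level lines in the containerd log after the restart. This is a sketch: the function name is hypothetical, and the default path is an assumption about where K3s normally writes the embedded containerd's log.

```shell
#!/bin/sh
# Report how many debug-level entries a containerd log contains and
# succeed only if there is at least one. has_debug_logs is illustrative.
has_debug_logs() {
    # Assumed default: the usual K3s containerd log location.
    log="${1:-/var/lib/rancher/k3s/agent/containerd/containerd.log}"
    count=$(grep -c 'level=debug' "$log" 2>/dev/null)
    # grep prints nothing if the file is missing; treat that as zero.
    count="${count:-0}"
    echo "debug lines in $log: $count"
    [ "$count" -gt 0 ]
}

# Usage after setting CONTAINERD_LOG_LEVEL=debug and restarting k3s:
#   has_debug_logs && echo "debug logging is active"
```

A count of zero after a restart suggests the variable was not picked up by the service, which matches the error-and-info-only output shown above.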