k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Second mirror registry with rewrite configuration not working properly #11191

Open flyfax opened 1 day ago

flyfax commented 1 day ago

Environmental Info:

K3s Version: k3s version v1.30.5+k3s1, go version go1.22.6

Node(s) CPU architecture, OS, and Version: Linux 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 server, 1 agent

Describe the bug: I set up two mirror registries with rewrite rules in registries.yaml on both the k3s server and the agent:

mirrors:
  icr.io:
    endpoint:
      - "https://docker-na-public.artifactory.test.com"
    rewrite:
      "cpopen": "se-next-gen-docker-local/$1"
  cp.icr.io:
    endpoint:
      - "https://docker-na-public.artifactory.test.com"
    rewrite:
      "cp/se-data-center-edge": "se-next-gen-docker-local/$1"
configs:
  docker-na-public.artifactory.test.com:
    auth:
      username: <userid>
      password: <userpwd>

The first mirror configuration works. A pod that references icr.io/cpopen/edge-operator-catalog@sha256:4f9725b23c8560eae25be0a9fac01c74c9d4a9fee8200e31aad9842f7c338433 starts, and its image is actually pulled from the mirror registry: https://docker-na-public.artifactory.test.com/se-next-gen-docker-local/edge-operator-catalog@sha256:4f9725b23c8560eae25be0a9fac01c74c9d4a9fee8200e31aad9842f7c338433

However, the second mirror configuration does not work. Another pod, which references cp.icr.io/cp/se-data-center-edge/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687, cannot pull its image from the mirror registry at https://docker-na-public.artifactory.test.com/se-next-gen-docker-local/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687.

The error is:

Warning  Failed                  3s    kubelet                  Failed to pull image "cp.icr.io/cp/se-data-center-edge/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687": failed to pull and unpack image "cp.icr.io/cp/se-data-center-edge/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687": failed to resolve reference "cp.icr.io/cp/se-data-center-edge/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687": failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://docker-na-public.artifactory.test.com/artifactory/api/docker/null/v2/token?scope=repository%3Acp%2Fse-data--center-edge%2Fmini-test%3Apull&service=docker-na-public.artifactory.test.com: 401 Unauthorized

However, I can pull that image manually from the mirror registry:

ctr images pull --user <userid>:<userpwd> docker-na-public.artifactory.test.com/se-next-gen-docker-local/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687

WARN[0000] DEPRECATION: The `configs` property of `[plugins."io.containerd.grpc.v1.cri".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.1. Use `config_path` instead.
docker-na-public.artifactory.test.com/se-next-gen-docker-local/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687: resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687:                                                                                 done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:fde0050e6f120d9f47af9acd6401c0b606c8cc1a6993c8c54f940cb6d24558be:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:6d8a6fbf9f6a54c22b8f0d81aae09ee82f797fb5443dbbfb99659184cd9bea63:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f50ab65647ec96ba313779f24c41e04bc6fde3e3ee79ee377ea8fd1901b896d5:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:82eae36a21fa93555db3ec8ca3b77e7e324264c7a5a877f19246f47805b71cc0:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8f00d7682eb5816c6994c45b79851f9a708d3c20c5c75765b394bf96fcf1fe23:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8829ee938b6487b295faf2ae62e7c650852273789afef2fcb8107653bb176b07:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1021e82d93c8d7b0cb457c78327e2a9ec3109cc8afd672963f7cd71d79b52c31:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:65596b15a29de038d9ae9b60eed4056ac8a4a8563dd34526c97f235da4e1de84:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:2e1e56a9cbfc710dac4f3c047087d1a1863d569682a9a05c90cdd51c85ade7ab:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:a0f054cc0d49337016d542527abb33472dea611f22d4d0155f7a8af2a04a12ab:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:682c9aa5525e750605c9078cc5359d711a1b38442572d690bed120563cc88409:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:3a4a7a3b1aaac402c2a5de6603b8220b09db3213f0d11b2c1973e499813fe95e:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5f69e3f397b1441dd4cd6ca12f51d10c855775415db522157cba24c6a8dacb1c:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:930e121e757df0828f0e7d582b1fb422eac393a83da2e472bb5e81177b0ed1c7:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:480e2acfac9c8e5a3d872c20b98cd2f16a8e61d974afb7a08a8ffa2afc921848:                                                                                   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:7f4b105bd855c23cb1ef1c9a4084cd219275a4d7c4716432ca64627de1f18cd5:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c3bb0e9cc4713d8f4d9fec6b912adde92254084c7e865e511ac62b16903a87c0:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:e2dccfaa2865e846135b8a2bc705630ef39b36499a2e14e7ad6b2957f02da593:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:88134de2e9e8e03ef2ffe812237ee7b4784283022f09267929400c9589265516:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c06fb0f70af454b3a4b4119caa92a54de20313c0aea0bd4b01eb6972aab6531a:                                                                                    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 7.1 s                                                                                                                                                    total:  204.4  (28.8 MiB/s)
unpacking linux/amd64 sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687...
done: 9.420890146s

Is that rewrite configuration wrong for the second mirror registry?

brandond commented 1 day ago

You're using $1 in your rewrite but do not have a capture group in the regex so this will not be filled with anything. What exactly are you trying to do? Please check the docs for examples.
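
To make the point about capture groups concrete, here is a minimal Python sketch. It assumes the rewrite behaves like a Go-style regular expression replace over the image path, where a `$1` that refers to a group the pattern does not define expands to an empty string. `rewrite_path` is a hypothetical helper written for illustration, not k3s or containerd code.

```python
import re


def rewrite_path(path, pattern, replacement):
    """Apply one rewrite rule to an image path.

    Assumption: $N in the replacement follows Go-style expansion, so a
    reference to a group that the pattern does not define becomes "".
    """
    regex = re.compile(pattern)

    def expand(match):
        def group_value(ref):
            index = int(ref.group(1))
            if index > regex.groups:
                return ""  # the pattern has no such capture group
            return match.group(index) or ""

        # Substitute every $N in the replacement string.
        return re.sub(r"\$(\d+)", group_value, replacement)

    return regex.sub(expand, path)


path = "cp/se-data-center-edge/mini-test"
replacement = "se-next-gen-docker-local/$1"

# No capture group: $1 expands to nothing and the original suffix is kept.
print(rewrite_path(path, "cp/se-data-center-edge", replacement))
# With a capture group: $1 carries the image name across.
print(rewrite_path(path, "cp/se-data-center-edge/(.*)", replacement))
```

Under that assumption, the first call yields `se-next-gen-docker-local//mini-test` (note the doubled slash) and the second yields `se-next-gen-docker-local/mini-test`.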

flyfax commented 1 day ago

I'm trying to make the pod pull its image from the mirror registry at docker-na-public.artifactory.test.com/se-next-gen-docker-local/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687 instead of the original reference cp.icr.io/cp/se-data-center-edge/mini-test@sha256:c718d3f996061aef92966a2171713af1cfdbac93cbea7a753107e3d5430c3687, using a rewrite rule to replace cp/se-data-center-edge with se-next-gen-docker-local.

I looked at the example here: https://docs.k3s.io/installation/private-registry. I also tried both of the following rewrite configurations:

    rewrite:
      "cp/se-data-center-edge": "se-next-gen-docker-local/$1"

and

    rewrite:
      "cp/se-data-center-edge/(.*)": "se-next-gen-docker-local/$1"

But I got the same error both times, so the rewrite does not seem to take effect.

brandond commented 1 day ago

Can you confirm that you're not using a custom containerd.toml.tmpl?

Also, verify the contents of /var/lib/rancher/k3s/agent/etc/containerd/certs.d/cp.icr.io/hosts.toml - do you see the rewrite in there?

You might also check the containerd logs to see if it contains any interesting errors regarding the pull.

brandond commented 1 day ago

failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://docker-na-public.artifactory.test.com/artifactory/api/docker/null/v2/token?scope=repository%3Acp%2Fse-data--center-edge%2Fmini-test%3Apull&service=docker-na-public.artifactory.test.com: 401 Unauthorized

https://docker-na-public.artifactory.test.com/artifactory/api/docker/null/v2/token

The null in this URL looks weird. Are you still getting that after fixing the regex?

The message also suggests that there is an extra hyphen coming from somewhere... the scope is repository:cp/se-data--center-edge/mini-test:pull which does not match what you said you're trying to pull. Did you perhaps typo the image in your pod spec as cp.icr.io/cp/se-data--center-edge/mini-test:latest, or add an extra hyphen in your replacement string?
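
The mismatch is easier to see once the token URL is decoded. The following Python sketch pulls the repository name out of the `scope` query parameter and compares it with the path from the image reference quoted in the issue. The helper names are hypothetical, and the `<type>:<name>:<actions>` scope layout is assumed from the registry token authentication convention.

```python
from urllib.parse import parse_qs, urlsplit

# Token URL copied from the error message in the issue.
TOKEN_URL = (
    "https://docker-na-public.artifactory.test.com/artifactory/api/docker/null/v2/token"
    "?scope=repository%3Acp%2Fse-data--center-edge%2Fmini-test%3Apull"
    "&service=docker-na-public.artifactory.test.com"
)
# Repository path from the image reference quoted in the issue.
IMAGE_PATH = "cp/se-data-center-edge/mini-test"


def scope_repository(token_url):
    """Return the repository name from the first scope parameter."""
    scope = parse_qs(urlsplit(token_url).query)["scope"][0]
    # Assumed scope layout: "<type>:<name>:<actions>"
    _type, _, rest = scope.partition(":")
    name, _, _actions = rest.rpartition(":")
    return name


def first_difference(a, b):
    """Index of the first differing character, or -1 if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


requested = scope_repository(TOKEN_URL)
print(requested)
print(first_difference(requested, IMAGE_PATH))
```

The decoded name is `cp/se-data--center-edge/mini-test`, and the first differing position is the second hyphen after `se-data`.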

flyfax commented 1 day ago

I am not using a containerd.toml.tmpl, and the rewrite is present in hosts.toml:

[root@qb-reg5-m1 containerd]# ls
certs.d  config.toml
[root@qb-reg5-m1 containerd]# pwd
/var/lib/rancher/k3s/agent/etc/containerd

[root@qb-reg5-m1 containerd]# cat certs.d/cp.icr.io/hosts.toml
# File generated by k3s. DO NOT EDIT.

server = "https://cp.icr.io/v2"
capabilities = ["pull", "resolve", "push"]

[host]
[host."https://docker-na-public.artifactory.test.com/v2"]
  capabilities = ["pull", "resolve"]
  [host."https://docker-na-public.artifactory.test.com/v2".rewrite]
    "cp/se-data-center-edge/(.*)" = "se-next-gen-docker-local/$1"

flyfax commented 1 day ago

The message also suggests that there is an extra hyphen coming from somewhere... the scope is repository:cp/se-data--center-edge/mini-test:pull which does not match what you said you're trying to pull. Did you perhaps typo the image in your pod spec as cp.icr.io/cp/se-data--center-edge/mini-test:latest, or add an extra hyphen in your replacement string?

Yes, I still get the same error after fixing the regex.

The interesting thing is that the first registry mirror works. I can pull the image from docker-na-public.artifactory.test.com/se-next-gen-docker-local/edge-operator-catalog@sha256:4f9725b23c8560eae25be0a9fac01c74c9d4a9fee8200e31aad9842f7c338433 instead of the original path icr.io/cpopen/edge-operator-catalog@sha256:4f9725b23c8560eae25be0a9fac01c74c9d4a9fee8200e31aad9842f7c338433

Could the issue be that the registry name 'cp.icr.io' contains 'cp', which is also part of the regex?

brandond commented 1 day ago

It occurs to me - you've got registries.yaml on BOTH the nodes, right? That is node-specific configuration; it is not global cluster config. You need to configure that on the agent AND the server individually.

Assuming you've done that, you might try the following on whichever node is pulling the image for the pod.

On a server:

echo CONTAINERD_LOG_LEVEL=debug >> /etc/sysconfig/k3s && systemctl restart k3s

On an agent:

echo CONTAINERD_LOG_LEVEL=debug >> /etc/sysconfig/k3s-agent && systemctl restart k3s-agent

That'll give you more info in the containerd.log

flyfax commented 1 day ago

Yes, I put registries.yaml on both the server and the agent node. Thanks for the suggestion; I will enable debug logging and see what it shows.

brandond commented 1 day ago

Just on the off chance the replacement is doing something weird, you might also try anchoring it?

  rewrite:
    "^cp/se-data-center-edge/(.+)$": "se-next-gen-docker-local/$1"
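
What anchoring changes can be shown with a short Python sketch. It uses Python's `re` module, so the replacement is written `\g<1>` where registries.yaml writes `$1`. The second path, with a leading `other/` segment, is a made-up example used only to show the difference and does not come from this issue.

```python
import re

# Python spells the group reference \g<1>; registries.yaml writes it as $1.
REPLACEMENT = r"se-next-gen-docker-local/\g<1>"

UNANCHORED = re.compile(r"cp/se-data-center-edge/(.+)")
ANCHORED = re.compile(r"^cp/se-data-center-edge/(.+)$")


def rewrite(regex, path):
    """Replace every match of regex in path with REPLACEMENT."""
    return regex.sub(REPLACEMENT, path)


expected_path = "cp/se-data-center-edge/mini-test"
# Hypothetical path with a leading segment, for illustration only.
nested_path = "other/cp/se-data-center-edge/mini-test"

for label, regex in (("unanchored", UNANCHORED), ("anchored", ANCHORED)):
    print(label, rewrite(regex, expected_path), rewrite(regex, nested_path))
```

Both patterns rewrite the expected path identically. Only the unanchored one also rewrites the nested path, because it is allowed to match in the middle of the string.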

codering commented 1 hour ago

I have a similar problem.

k3s version v1.29.4+k3s1 (94e29e2e)
go version go1.21.9

I cannot access Docker Hub, so I have placed the images on my own registry.

/etc/rancher/k3s/registries.yaml

mirrors:
  "docker.io":
    endpoint:
      - https://swr.cn-east-3.myhuaweicloud.com
    rewrite:
      "(.*)": "hmirror/$1"
configs:
  swr.cn-east-3.myhuaweicloud.com:
    auth:
      username: xx
      password: yy

Install k3s on a single node

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_SKIP_SELINUX_RPM=true K3S_KUBECONFIG_MODE="644" INSTALL_K3S_MIRROR=cn K3S_TOKEN=SECRET INSTALL_K3S_VERSION="v1.29.4+k3s1" sh -

I can see that the images were pulled normally from my registry during the k3s install.

[root@ecs-free-0001 tmp]# crictl images
IMAGE                                        TAG                    IMAGE ID            SIZE
docker.io/rancher/klipper-helm               v0.8.3-build20240228   0929b4140ada6       91.2MB
docker.io/rancher/klipper-lb                 v0.4.7                 edc812b8e25d0       4.78MB
docker.io/rancher/local-path-provisioner     v0.0.26                c54dcef6214cb       17.2MB
docker.io/rancher/mirrored-coredns-coredns   1.10.1                 ead0a4a53df89       16.2MB
docker.io/rancher/mirrored-library-traefik   2.10.7                 ee69e8120b64a       43.2MB
docker.io/rancher/mirrored-metrics-server    v0.7.0                 b9a5a1927366a       19.3MB
docker.io/rancher/mirrored-pause             3.6                    6270bb605e12e       298kB
[root@ecs-free-0001 tmp]# kubectl get po -A
NAMESPACE         NAME                                        READY   STATUS             RESTARTS   AGE
kube-system       local-path-provisioner-6c86858495-m7p9f     1/1     Running            0          15m
kube-system       svclb-traefik-839f5d4c-rkz2c                2/2     Running            0          12m
kube-system       helm-install-traefik-crd-tssm4              0/1     Completed          0          15m
kube-system       helm-install-traefik-frdwz                  0/1     Completed          1          15m
kube-system       coredns-6799fbcd5-9z2gm                     1/1     Running            0          15m
kube-system       traefik-7d5f6474df-kfzgh                    1/1     Running            0          12m
kube-system       metrics-server-54fd9b65b-fd5nn              1/1     Running            0          15m

When I pull another image directly by its full URL on my registry, it works.

[root@k8s-master ~]# crictl pull swr.cn-east-3.myhuaweicloud.com/hmirror/rabbitmqoperator/cluster-operator:2.8.0
Image is up to date for sha256:c0a9306b27689ddde5429e1333bac7b5ca9dc49cf005918a49518fbebbfd9d8b
[root@k8s-master ~]# crictl images | grep cluster-operator
swr.cn-east-3.myhuaweicloud.com/hmirror/rabbitmqoperator/cluster-operator            2.8.0                                      c0a9306b27689       26MB
[root@k8s-master ~]#

But I can't pull it through the rewrite, and I don't know why.

[root@ecs-free-0001 tmp]# crictl pull rabbitmqoperator/cluster-operator:2.8.0
E1101 17:31:20.620215   16360 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/rabbitmqoperator/cluster-operator:2.8.0\": failed to resolve reference \"docker.io/rabbitmqoperator/cluster-operator:2.8.0\": failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://swr.cn-east-3.myhuaweicloud.com/swr/auth/v2/registry/auth/?scope=repository%3Ahmirror%2Frabbitmqoperator%2Fcluster-operator%3A&scope=repository%3Arabbitmqoperator%2Fcluster-operator%3Apull&service=dockyard: 404 Not Found" image="rabbitmqoperator/cluster-operator:2.8.0"
FATA[0000] pulling image: failed to pull and unpack image "docker.io/rabbitmqoperator/cluster-operator:2.8.0": failed to resolve reference "docker.io/rabbitmqoperator/cluster-operator:2.8.0": failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://swr.cn-east-3.myhuaweicloud.com/swr/auth/v2/registry/auth/?scope=repository%3Ahmirror%2Frabbitmqoperator%2Fcluster-operator%3A&scope=repository%3Arabbitmqoperator%2Fcluster-operator%3Apull&service=dockyard: 404 Not Found

Did I make a mistake in my configuration somewhere? But why is it able to normally pull the rancher images during the k3s installation?
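
The token request in this error carries two `scope` parameters, which is easier to read once decoded. The Python sketch below splits each scope into its parts; `parse_scopes` is a hypothetical helper, and the `<type>:<name>:<actions>` layout is assumed from the registry token authentication convention. It only describes what was requested and does not explain why the registry answered 404.

```python
from urllib.parse import parse_qs, urlsplit

# Token URL copied from the crictl error above.
TOKEN_URL = (
    "https://swr.cn-east-3.myhuaweicloud.com/swr/auth/v2/registry/auth/"
    "?scope=repository%3Ahmirror%2Frabbitmqoperator%2Fcluster-operator%3A"
    "&scope=repository%3Arabbitmqoperator%2Fcluster-operator%3Apull"
    "&service=dockyard"
)


def parse_scopes(token_url):
    """Split every scope parameter into (type, name, [actions])."""
    query = parse_qs(urlsplit(token_url).query, keep_blank_values=True)
    parsed = []
    for scope in query.get("scope", []):
        # Assumed scope layout: "<type>:<name>:<actions>"
        resource_type, _, rest = scope.partition(":")
        name, _, actions = rest.rpartition(":")
        parsed.append((resource_type, name, [a for a in actions.split(",") if a]))
    return parsed


for resource_type, name, actions in parse_scopes(TOKEN_URL):
    print(resource_type, name, actions)
```

Decoded this way, the rewritten name `hmirror/rabbitmqoperator/cluster-operator` appears with an empty action list, while the original name `rabbitmqoperator/cluster-operator` carries the `pull` action.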