rancher / rke2

https://docs.rke2.io/
Apache License 2.0
Windows Node cannot pull images from private registry #7103

Open luisxkimo opened 3 days ago

luisxkimo commented 3 days ago

Environmental Info:

RKE2 Version:
rke2 version v1.30.5+rke2r1 (0c83bc82315cd61664880d0b52a7e070e9fbd623)
go version go1.22.6 X:boringcrypto

Node(s) CPU architecture, OS, and Version: Windows Server 2022 21H2 Build 20348.2700

Cluster Configuration: 2 managers and 2 workers, all of them on RHEL

Describe the bug: Windows Node cannot pull images from private registry.

Steps To Reproduce:

Expected behavior: Image is downloaded and pod is running

Actual behavior: Pod is not running with an error pulling the container image

Additional context / logs: Here is an example of the "describe" output for the pod:

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  3s    default-scheduler  Successfully assigned kube-system/csi-proxy-mzlvg to win-node.internal.ad.com
  Normal   Pulling    2s    kubelet            Pulling image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
  Warning  Failed     2s    kubelet            Failed to pull image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": failed to pull and unpack image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": failed to resolve reference "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": unexpected status from HEAD request to https://ghcr.internal-cache.com/v2/kubernetes-sigs/sig-windows/csi-proxy/manifests/v1.1.2: 403
  Warning  Failed     2s    kubelet            Error: ErrImagePull
  Normal   BackOff    2s    kubelet            Back-off pulling image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"

Here are the contents of the file C:\etc\rancher\rke2\registries.yaml:

configs:
  docker.internal-cache.com:
    auth:
      password: thPassword
      username: username2
  ghcr.internal-cache.com:
    auth:
      password: ghcrpassWord
      username: usernameGHCR

brandond commented 2 days ago

Are you sure the credentials are correct? Are you sure that the image exists on that registry?

Can you pull that image successfully if you do ctr -n k8s.io image pull --user USER:PASSWORD ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2 ?

Assuming the tag exists and the creds are correct, you might also check containerd.log to see if it contains any more useful information on why the pull is failing.
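For reference, the URL in the 403 error follows the standard registry layout `/v2/<repository>/manifests/<tag>`. The sketch below shows how that URL can be derived from the image reference and how a HEAD request carrying Basic credentials could be built. The helper names (`split_image_ref`, `manifest_url`, `build_head_request`) are hypothetical and written only for illustration. The request is constructed but not sent. Many registries use a token handshake rather than plain Basic auth, so this is a rough check under that assumption, not a reproduction of what containerd does.

```python
import base64
import urllib.request


def split_image_ref(image_ref):
    """Split 'host[:port]/repo/path:tag' into (host, repository, tag).

    Simplified on purpose: digest references (@sha256:...) are not handled.
    """
    host, _, remainder = image_ref.partition("/")
    repository, sep, tag = remainder.rpartition(":")
    if not sep:
        # No tag given, fall back to the conventional default.
        repository, tag = remainder, "latest"
    return host, repository, tag


def manifest_url(image_ref):
    """Build the manifest URL for an image reference."""
    host, repository, tag = split_image_ref(image_ref)
    return f"https://{host}/v2/{repository}/manifests/{tag}"


def build_head_request(image_ref, username, password):
    """Build (but do not send) a HEAD request with a Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        manifest_url(image_ref),
        method="HEAD",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    ref = "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
    print(manifest_url(ref))
```

Passing the request to `urllib.request.urlopen` from the Windows node would show whether the registry answers 200, 401, or 403 for those credentials, independently of kubelet and containerd.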

luisxkimo commented 2 days ago

Hi @brandond ,

Yes, pulling the image manually with that command works fine, using the same credentials as in the registries.yaml file.

I can't find any containerd.log inside C:\var\log or its subfolders. In any case, given the "403" error in the pod events, I guess the issue is related to the credentials it is trying to use.

Maybe the issue is that I don't have the right registries.yaml file, or that it is in the wrong path. Currently it is at C:\etc\rancher\rke2\registries.yaml

brandond commented 1 day ago

That should be the right path. Are there any errors in the system log regarding the contents of that file? Do you see the registries and creds in c:/var/lib/rancher/rke2/agent/etc/containerd/config.toml?

luisxkimo commented 1 day ago

Yes, I can see a section in this config.toml like:

[plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "C:\\var\\lib\\rancher\\rke2\\agent\\etc\\containerd\\certs.d"

        [plugins."io.containerd.grpc.v1.cri".registry.configs.auth."docker.internal-cache.com"]
          username = "username2"
          password = "thPassword"

        [plugins."io.containerd.grpc.v1.cri".registry.configs.auth."ghcr.internal-cache.com"]
          username = "usernameGHCR"
          password = "ghcrpassWord"

brandond commented 1 day ago

That's correct then, and all that K3s is responsible for managing. Take a look at the containerd.log (also under the rke2 agent dir) and see what that says.

brandond commented 1 day ago

Can you confirm that you're pulling images from docker.internal-cache.com? It doesn't have a port when referenced in the image name, or something else that would make a string comparison fail?
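To illustrate the string-comparison concern, here is a minimal sketch that assumes credentials are looked up by an exact match between the registry host in the image name and the keys under `configs:` in registries.yaml. The helpers `registry_host` and `find_auth_key` are hypothetical, and this is not containerd's actual matching code. It only shows why a port or any other difference in the host part would cause the lookup to miss.

```python
def registry_host(image_ref):
    """Return the part of the image reference before the first '/'."""
    return image_ref.partition("/")[0]


def find_auth_key(image_ref, configured_hosts):
    """Return the configured host that exactly matches the image's registry, or None."""
    host = registry_host(image_ref)
    return host if host in configured_hosts else None


if __name__ == "__main__":
    # Hosts taken from the registries.yaml posted in this issue.
    configured = ["docker.internal-cache.com", "ghcr.internal-cache.com"]
    image = "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
    print(find_auth_key(image, configured))
    # The port 5000 is a made-up value to show a mismatch.
    with_port = "ghcr.internal-cache.com:5000/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
    print(find_auth_key(with_port, configured))
```

Under that assumption, the image name from the pod events matches the `ghcr.internal-cache.com` entry, while the same name with a port does not match any entry.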