elliotdobson opened this issue 3 months ago
Another similar issue - kubernetes/kops#13377. And a slack thread on #kops-users.
@elliotdobson When SSH'ed into a problematic node can you run this command? Substituting the image for one that you expect to work. I'm curious if the response contains valid credentials or not.
echo '{"apiVersion":"credentialprovider.kubelet.k8s.io/v1","kind":"CredentialProviderRequest","image":"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9"}' | ecr-credential-provider get-credentials
@rifelpet it works fine. I can pass those credentials into crictl pull
and strangely it says the image is up to date...
$ echo '{"apiVersion":"credentialprovider.kubelet.k8s.io/v1","kind":"CredentialProviderRequest","image":"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9"}' | ecr-credential-provider get-credentials | jq '.auth | to_entries[].value | "\(.username):\(.password)"' | sudo xargs -I {} crictl pull --creds {} 123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9
Image is up to date for sha256:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c
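For reference, the `jq` filter in that pipeline just flattens the `auth` map of the `CredentialProviderResponse` into a `user:pass` string for `crictl pull --creds`. A synthetic sketch (placeholder values, not real credentials; `-r` is added here to emit the bare string without quotes):

```shell
# Synthetic CredentialProviderResponse in the same shape that
# ecr-credential-provider prints on stdout (values are placeholders).
response='{"apiVersion":"credentialprovider.kubelet.k8s.io/v1","kind":"CredentialProviderResponse","cacheDuration":"6h0m0s","auth":{"*.dkr.ecr.*.amazonaws.com":{"username":"AWS","password":"PLACEHOLDER_TOKEN"}}}'

# Same filter as the pipeline above: flatten the auth map into user:pass.
echo "$response" | jq -r '.auth | to_entries[].value | "\(.username):\(.password)"'
# -> AWS:PLACEHOLDER_TOKEN
```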
and the node has all the required images...
$ sudo crictl images
IMAGE TAG IMAGE ID SIZE
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/calico/cni <none> 6527a35581401 88.4MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/calico/node <none> 5c6ffd2b2a1d0 116MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/dns/k8s-dns-node-cache <none> c65d25696473d 34.8MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/ebs-csi-driver/aws-ebs-csi-driver <none> d0b811ee8b120 29.2MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/eks-distro/kubernetes-csi/livenessprobe <none> 0f33636f3e138 8.06MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/eks-distro/kubernetes-csi/node-driver-registrar <none> 1e017ee0e9e78 6.78MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/etcd <none> 0369cf4303ffd 86.7MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/etcd <none> 1b2ba9f3d2043 57.1MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/etcdadm/etcd-manager-slim <none> 98a2527bc1dbc 49.3MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kops/kops-controller <none> 5da88c7961ee1 50MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kops/kops-utils-cp <none> 3a09361fb8252 2.29MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kops/kube-apiserver-healthcheck <none> 9cd5ecc0d5313 5.5MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kube-apiserver <none> a2e0d7fa8464a 35.2MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kube-controller-manager <none> 32fe966e5c2b2 33.8MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kube-proxy <none> cc8c46cf9d741 28.6MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/kube-scheduler <none> 9cffb486021b3 18.9MB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause 3.9 e6f1816883972 322kB
123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/provider-aws/cloud-controller-manager <none> 2a3fef77df4ae 21.2MB
The node has also successfully joined the cluster and kOps is validating OK... 🤔 The same pull image error message is still in the kubelet log (for the first 10 minutes of the node's life), but it seems to eventually get past that, pull the images, and start the containers successfully. So I guess that is a red herring.
I'll roll the rest of the cluster and report back if it's working.
I rolled the second control-plane node in the cluster and it failed to join the cluster within the default kOps validation timeout (15 mins).
When I SSH into the second control-plane node it has no container images present, and the kubelet logs are filled with pull image error messages (same as the original post). However, if I pull the `pause` image manually then everything continues as expected and the node joins the cluster. Interestingly, when I pull the `pause` image manually it says the image is already up to date (even though it was not present when I first logged into the node):
$ echo '{"apiVersion":"credentialprovider.kubelet.k8s.io/v1","kind":"CredentialProviderRequest","image":"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9"}' | ecr-credential-provider get-credentials | jq '.auth | to_entries[].value | "\(.username):\(.password)"' | sudo xargs -I {} crictl pull --creds {} 123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9
Image is up to date for sha256:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c
So there is definitely some issue around the interaction between the AWS credential provider, `containerProxy`, and the `PodInfraContainerImage` (`pause` image) during node bootstrap.
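For context, the kubelet-side configuration looks roughly like this (the `matchImages` patterns and cache duration are illustrative; kOps generates the real file). The key point is that this config only applies to pulls that kubelet itself initiates, which is why containerd's own sandbox-image pull never sees these credentials:

```yaml
# Illustrative kubelet CredentialProviderConfig (field values here are
# assumptions; the kOps-generated config may differ).
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  matchImages:
  - "*.dkr.ecr.*.amazonaws.com"
  defaultCacheDuration: "12h"
```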
Out of curiosity I tried rolling a worker node and it successfully joined the cluster, but it cannot pull the `pause` image. So all containers on the node are stuck in pending state and eventually kOps validation times out.
> Out of curiosity I tried rolling a worker node and it successfully joined the cluster, but it cannot pull the `pause` image. So all containers on the node are stuck in pending state and eventually kOps validation times out.
What is the error message when pulling the pause image?
> What is the error message when pulling the pause image?
The same as reported in the original post:
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: I0820 05:41:26.005263 3267 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/etcd-manager-events-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: I0820 05:41:26.025018 3267 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/etcd-manager-main-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: I0820 05:41:26.053492 3267 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/kube-apiserver-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.077935 3267 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.078072 3267 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/etcd-manager-events-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.078174 3267 kuberuntime_manager.go:1182] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/etcd-manager-events-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.078784 3267 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-manager-events-i-0b9bd5da0497cb428_kube-system(404a8513399b8f50110f57caceb4dff4)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-manager-events-i-0b9bd5da0497cb428_kube-system(404a8513399b8f50110f57caceb4dff4)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull and unpack image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to resolve reference \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\"" pod="kube-system/etcd-manager-events-i-0b9bd5da0497cb428" podUID="404a8513399b8f50110f57caceb4dff4"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.084358 3267 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.084580 3267 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/etcd-manager-main-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.084647 3267 kuberuntime_manager.go:1182] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/etcd-manager-main-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.084885 3267 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-manager-main-i-0b9bd5da0497cb428_kube-system(e6690dd45219e272fbb9992ff07f37b9)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-manager-main-i-0b9bd5da0497cb428_kube-system(e6690dd45219e272fbb9992ff07f37b9)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull and unpack image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to resolve reference \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\"" pod="kube-system/etcd-manager-main-i-0b9bd5da0497cb428" podUID="e6690dd45219e272fbb9992ff07f37b9"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.086957 3267 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.087058 3267 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/kube-apiserver-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.087396 3267 kuberuntime_manager.go:1182] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to pull and unpack image \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": failed to resolve reference \"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials" pod="kube-system/kube-apiserver-i-0b9bd5da0497cb428"
Aug 20 05:41:26 i-0b9bd5da0497cb428 kubelet[3267]: E0820 05:41:26.087529 3267 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-apiserver-i-0b9bd5da0497cb428_kube-system(bce2dccacc4b8ea6a460dcd7760b01ed)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-apiserver-i-0b9bd5da0497cb428_kube-system(bce2dccacc4b8ea6a460dcd7760b01ed)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to pull and unpack image \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": failed to resolve reference \\\"123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/k8s-example/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\\\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\"" pod="kube-system/kube-apiserver-i-0b9bd5da0497cb428" podUID="bce2dccacc4b8ea6a460dcd7760b01ed"
And similar access-denied errors appear in the containerd logs too (presumably the kubelet log is just a superset of the containerd logs).
What about `echo ... | ecr-credential-provider` on that node? Any additional logs that could be relevant?
> What about `echo ... | ecr-credential-provider` on that node?
Works fine, I get valid credentials back that I can then use to pull images from private ECR.
> any additional logs that could be relevant?
Not that I know of. Do you have any to suggest?
In the Slack thread, @olemarkus had a few comments that seem to point to the issue:

Specifically, when containerd tries to pull the `pause` image it does not receive ECR credentials from kubelet, since the reason for the pull is not a Pod? Could that be the underlying issue?
The issue is indeed the same. When kubelet initiates an image pull, it will use the ECR plugin and pass the credentials on to containerd. Containerd has no knowledge of either the ECR plugin or even kubelet. When it pulls an image itself, it will use what is configured in the containerd config as-is.
The easiest workaround is to call the credential helper directly and pass the credentials to crictl and pull the image, as suggested here: https://github.com/containerd/containerd/issues/6637#issuecomment-1061965828
Bottlerocket does the same: https://github.com/bottlerocket-os/bottlerocket/pull/382
You might be able to do this with additionalUserData but need to ensure it runs after containerd is running, so it may take adding a systemd service that depends on containerd.
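A sketch of that shape (the unit name, script path, and image reference are assumptions based on this thread, not something kOps ships). The script body would essentially be the `ecr-credential-provider | jq | crictl pull` pipeline shown earlier:

```
# /etc/systemd/system/pull-sandbox-image.service (hypothetical unit name)
[Unit]
Description=Pre-pull the pause (sandbox) image with ECR credentials
# Order after containerd so crictl has a CRI socket to talk to,
# and before kubelet so pods never hit the anonymous sandbox pull.
After=containerd.service
Requires=containerd.service
Before=kubelet.service

[Service]
Type=oneshot
# Hypothetical script containing the credential-helper + crictl pipeline.
ExecStart=/opt/pull-sandbox-image.sh

[Install]
WantedBy=multi-user.target
```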
A long-term solution would be for kops' nodeup to pull the `podInfraContainerImage` explicitly whenever `containerProxy` is set.
OK, so the root issue is that containerd pulls the sandbox image anonymously, so the ultimate fix would need to land in containerd to enable authenticated sandbox-image pulls. (I have commented on the containerd issue that you linked.)
In the meantime though...
I like your idea about using `additionalUserData`, however I think a more scalable option is to pull the sandbox image manually via kOps `hooks`, which can depend on systemd services etc. Though this option requires us to remember to update the sandbox image version in the hook when it changes in kOps.
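In the cluster spec that could look something like this (the hook name and manifest body are a sketch; the hypothetical `/opt/pull-sandbox-image.sh` script would run the credential-helper + `crictl pull` pipeline from earlier in this thread, and the image reference inside it has to be kept in sync with the kOps-managed sandbox image):

```yaml
# Sketch of a kOps hook that pre-pulls the sandbox image after
# containerd is up but before kubelet starts.
spec:
  hooks:
  - name: pull-sandbox-image.service
    requires:
    - containerd.service
    before:
    - kubelet.service
    manifest: |
      Type=oneshot
      ExecStart=/opt/pull-sandbox-image.sh
```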
A long-term workaround (like you say) would be for kOps nodeup to pull the sandbox image (`podInfraContainerImage`) explicitly whenever `containerProxy` or `containerRegistry` is set.
Perhaps another alternative to the kOps container image asset repository would be a containerd registry mirror, as per kubernetes/kops#16593. What do you think?
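If that route were taken, the containerd side would be per-host registry config under `/etc/containerd/certs.d` (a rough sketch; the paths and capabilities would need verifying, and ECR authentication for the mirror would still have to come from somewhere):

```toml
# /etc/containerd/certs.d/registry.k8s.io/hosts.toml (illustrative)
server = "https://registry.k8s.io"

# Try the private ECR mirror first, falling back to the upstream server.
[host."https://123456789101.dkr.ecr.ap-southeast-2.amazonaws.com"]
  capabilities = ["pull", "resolve"]
  # override_path may be needed if the mirror's repository layout
  # differs from the upstream registry's.
```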
/kind bug
**1. What `kops` version are you running? The command `kops version`, will display this information.**

Client version: 1.29.2 (git-v1.29.2)

**2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.**

Server Version: v1.29.7

**3. What cloud provider are you using?**

AWS

**4. What commands did you run? What is the simplest way to reproduce this issue?**

We are configuring a local image asset repository, however we are running into an issue when trying to update the cluster. We have configured all the ECR private repositories as required.

- Set `assets.containerProxy` in the Cluster spec
- `kops get assets --copy`
- `kops update cluster`
- `kops rolling-update`

**5. What happened after the commands executed?**

The new node fails to join the cluster and cluster validation fails. Upon SSH'ing into the new node and checking the logs via `journalctl -u kubelet.service` we see that kubelet is unable to pull images from ECR:

**6. What did you expect to happen?**

kubelet is able to pull images successfully from ECR.

**7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.**

**8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.**

**9. Anything else do we need to know?**

The kubelet log shows that the image credential provider flags are being passed:

The ecr-credential-provider binary exists at the location passed to kubelet:

The credential provider config exists at the location passed to kubelet (and looks valid):

Seems like a similar issue as kubernetes/kops#13494, however there was no clear resolution in that issue (and we are not using the AWS China partition).