siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Error pulling pause image from private repository #9594

Closed mottetm closed 2 weeks ago

mottetm commented 3 weeks ago

Bug Report

Description

When a custom pause image is specified as described here and that image is located in a private registry, kubelet fails to pull it with a 401 Unauthorized error, even though credentials for the private registry are configured in the machine config (registries.config.<private.registry>.auth).

This was previously working in v1.6.8.

Logs

10.0.2.136: {"ts":1730192466923.7874,"caller":"internal/log.go:32","msg":"RunPodSandbox from runtime service failed","err":"rpc error: code = Unknown desc = failed to start sandbox \"d7dd8a5fdcbfa0d3a2370269216a5824c2125223c8d86f582856138c7c73d85a\": failed to get sandbox image \"<private.registry>/pause:3.8\": failed to pull image \"<private.registry>/pause:3.8\": failed to pull and unpack image \"<private.registry>/pause:3.8\": failed to resolve reference \"<private.registry>/pause:3.8\": unexpected status from HEAD request to https://registry.hydra.anywhere.navify.com/v2/navify-anywhere-edge/navify-anywhere-edge-staging/navify-anywhere/pause/manifests/3.8: 401 Unauthorized"}
10.0.2.136: {"ts":1730192466924.1497,"caller":"kuberuntime/kuberuntime_sandbox.go:72","msg":"Failed to create sandbox for pod","pod":{"name":"kube-scheduler-talos-uj9-jpe","namespace":"kube-system"},"err":"rpc error: code = Unknown desc = failed to start sandbox \"d7dd8a5fdcbfa0d3a2370269216a5824c2125223c8d86f582856138c7c73d85a\": failed to get sandbox image \"<private.registry>/pause:3.8\": failed to pull image \"<private.registry>/pause:3.8\": failed to pull and unpack image \"<private.registry>/pause:3.8\": failed to resolve reference \"<private.registry>/pause:3.8\": unexpected status from HEAD request to https://registry.hydra.anywhere.navify.com/v2/navify-anywhere-edge/navify-anywhere-edge-staging/navify-anywhere/pause/manifests/3.8: 401 Unauthorized"}
10.0.2.136: {"ts":1730192466924.1914,"caller":"kuberuntime/kuberuntime_manager.go:1170","msg":"CreatePodSandbox for pod failed","pod":{"name":"kube-scheduler-talos-uj9-jpe","namespace":"kube-system"},"err":"rpc error: code = Unknown desc = failed to start sandbox \"d7dd8a5fdcbfa0d3a2370269216a5824c2125223c8d86f582856138c7c73d85a\": failed to get sandbox image \"<private.registry>/pause:3.8\": failed to pull image \"<private.registry>/pause:3.8\": failed to pull and unpack image \"<private.registry>/pause:3.8\": failed to resolve reference \"<private.registry>/pause:3.8\": unexpected status from HEAD request to https://registry.hydra.anywhere.navify.com/v2/navify-anywhere-edge/navify-anywhere-edge-staging/navify-anywhere/pause/manifests/3.8: 401 Unauthorized"}
10.0.2.136: {"ts":1730192466924.4077,"caller":"kubelet/pod_workers.go:1301","msg":"Error syncing pod, skipping","pod":{"name":"kube-scheduler-talos-uj9-jpe","namespace":"kube-system"},"podUID":"28f3e425f44f09de092fdccf37b4450f","err":"failed to \"CreatePodSandbox\" for \"kube-scheduler-talos-uj9-jpe_kube-system(28f3e425f44f09de092fdccf37b4450f)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-talos-uj9-jpe_kube-system(28f3e425f44f09de092fdccf37b4450f)\\\": rpc error: code = Unknown desc = failed to start sandbox \\\"d7dd8a5fdcbfa0d3a2370269216a5824c2125223c8d86f582856138c7c73d85a\\\": failed to get sandbox image \\\"<private.registry>/pause:3.8\\\": failed to pull image \\\"<private.registry>/pause:3.8\\\": failed to pull and unpack image \\\"<private.registry>/pause:3.8\\\": failed to resolve reference \\\"<private.registry>/pause:3.8\\\": unexpected status from HEAD request to https://registry.hydra.anywhere.navify.com/v2/navify-anywhere-edge/navify-anywhere-edge-staging/navify-anywhere/pause/manifests/3.8: 401 Unauthorized\"","errCauses":[{"error":"failed to \"CreatePodSandbox\" for \"kube-scheduler-talos-uj9-jpe_kube-system(28f3e425f44f09de092fdccf37b4450f)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-talos-uj9-jpe_kube-system(28f3e425f44f09de092fdccf37b4450f)\\\": rpc error: code = Unknown desc = failed to start sandbox \\\"d7dd8a5fdcbfa0d3a2370269216a5824c2125223c8d86f582856138c7c73d85a\\\": failed to get sandbox image \\\"<private.registry>/pause:3.8\\\": failed to pull image \\\"<private.registry>/pause:3.8\\\": failed to pull and unpack image \\\"<private.registry>/pause:3.8\\\": failed to resolve reference \\\"<private.registry>/pause:3.8\\\": unexpected status from HEAD request to https://registry.hydra.anywhere.navify.com/v2/navify-anywhere-edge/navify-anywhere-edge-staging/navify-anywhere/pause/manifests/3.8: 401 Unauthorized\""}]}
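The log lines above repeat the same nested error several times, so the actual cause (image reference plus HTTP status) is easy to miss. A minimal sketch for pulling those fields out of one line, assuming each line has the form `<node>: <JSON object>` with an `err` field as shown above; the function name `summarize_sandbox_error` and the returned keys are hypothetical and not part of Talos or kubelet:

```python
import json
import re


def summarize_sandbox_error(line: str) -> dict:
    """Extract node, image reference, and HTTP status from one log line.

    Assumes the "<node>: <json>" layout seen in the logs above.
    """
    # The node address precedes the first ": "; the JSON payload follows it.
    node, _, payload = line.partition(": ")
    record = json.loads(payload)
    err = record.get("err", "")

    # The image reference is quoted, sometimes with a leftover backslash
    # when the error has been nested inside another quoted error.
    image = re.search(r'failed to pull image \\?"([^"\\]+)', err)
    # The registry response appears as ": <3-digit code> <reason>".
    status = re.search(r": (\d{3}) ([A-Za-z ]+)", err)

    return {
        "node": node,
        "msg": record.get("msg"),
        "image": image.group(1) if image else None,
        "status": int(status.group(1)) if status else None,
        "reason": status.group(2).strip() if status else None,
    }
```

Feeding each line of the output above through this function should reduce it to one image reference and one status per line, which makes it easier to confirm that only the sandbox (pause) image pull is being rejected.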

Environment

smira commented 3 weeks ago

I can confirm it's a bug, but I'm not sure whether the cause is containerd itself or the containerd config that Talos generates.

smira commented 3 weeks ago

It's a containerd bug: https://github.com/containerd/containerd/issues/10916. Depending on the response from upstream, we will either wait for the next release/patch or patch it ourselves.

MarkLFT commented 2 weeks ago

Can someone please confirm a version that does not have this issue? I have tried three different releases, and I cannot pull images from a private Nexus Docker repository. I am pretty sure I have the secrets set correctly, as the same images pull correctly on Ubuntu.

If I can find a known good release that does not have this issue, I can determine whether the problem is something I am doing or the containerd bug.

Thanks.

frezbo commented 2 weeks ago

Talos 1.7 versions will work, since they use an older containerd.

MarkLFT commented 2 weeks ago

@frezbo, thanks for the info. Then I must be doing something wrong, as I tried version 1.7.7 and still received an authentication error when trying to pull from a private registry.

smira commented 2 weeks ago

This issue is only about Talos 1.8+, and specifically about the pause image (pod sandbox).