k3s-io / k3s

k3s uses HTTPS to connect to insecure HTTP container registry and fails #11340

Closed. javiertury closed this 1 week ago

javiertury commented 1 week ago

Environmental Info:

K3s Version:

k3s version v1.30.6+k3s1 (1829eaae)
go version go1.22.8

Node(s) CPU architecture, OS, and Version:

Linux server1 5.14.0-427.42.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 1 14:58:02 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 server, 1 agent

Describe the bug:

I configured an insecure private HTTP registry by creating the file /etc/rancher/k3s/registries.yaml with the following contents:

mirrors:
  "registry.domain":
    endpoint:
      - "http://registry.domain"

configs:
  "registry.domain":
    auth:
      username: user
      password: password

The insecure registry runs on port 80.

Until this summer, I had another cluster with a similar configuration (older k3s version, registry running on port 5000), and insecure HTTP registries worked fine there. However, this new k3s cluster appears to use only HTTPS and fails, as can be seen in the (redacted) logs:

Nov 18 21:20:24 server k3s[163970]: E1118 21:20:24.257888  163970 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"web\" with ErrImagePull: \"failed to pull and unpack image \\\"registry.domain/user/web@sha256:XXX\\\": failed to resolve reference \\\"registry.domain/user/web@sha256:XXX\\\": failed to do request: Head \\\"https://registry.domain/v2/user/web/manifests/sha256:XXX\\\": http: server gave HTTP response to HTTPS client\"" pod="namespace/web-XXX" podUID="XXX"

Steps To Reproduce:

Expected behavior:

k3s should be able to access insecure/plain HTTP registries

Actual behavior:

k3s always uses HTTPS on the plain HTTP registry

Additional context / logs:

The generated file /var/lib/rancher/k3s/agent/etc/containerd/certs.d/registry.domain/hosts.toml contains an https reference (why?):

# File generated by k3s. DO NOT EDIT.

server = "https://registry.domain/v2"
capabilities = ["pull", "resolve", "push"]

[host]
[host."http://registry.domain/v2"]
  capabilities = ["pull", "resolve"]

and the generated file /var/lib/rancher/k3s/agent/etc/containerd/config.toml doesn't include mirrors

# File generated by k3s. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/cni"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/var/lib/rancher/k3s/agent/etc/containerd/certs.d"

[plugins."io.containerd.grpc.v1.cri".registry.configs."registry.domain".auth]
  username = "user"
  password = "password"

brandond commented 1 week ago

> The generated file /var/lib/rancher/k3s/agent/etc/containerd/certs.d/registry.domain/hosts.toml contains a https reference (why?)

That's the default endpoint. From the docs:

https://docs.k3s.io/installation/private-registry#default-endpoint-fallback

Containerd has an implicit "default endpoint" for all registries. The default endpoint is always tried as a last resort, even if there are other endpoints listed for that registry in registries.yaml.

  • The default endpoint for docker.io is https://index.docker.io/v2.
  • The default endpoint for all other registries is https://<REGISTRY>/v2, where <REGISTRY> is the registry hostname and optional port.
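The fallback rule quoted above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not k3s or containerd code: the function names are hypothetical, and appending /v2 to each mirror endpoint is an assumption based on the generated hosts.toml shown earlier in this issue.

```python
def default_endpoint(registry: str) -> str:
    """Implicit default endpoint, per the docs quoted above."""
    if registry == "docker.io":
        return "https://index.docker.io/v2"
    # <REGISTRY> is the registry hostname and optional port.
    return f"https://{registry}/v2"


def endpoints_tried(registry: str, mirror_endpoints: list[str]) -> list[str]:
    """Mirror endpoints in listed order, then the default endpoint last.

    Assumption: each mirror endpoint gets a /v2 suffix, as seen in the
    hosts.toml generated for this issue.
    """
    normalized = []
    for endpoint in mirror_endpoints:
        endpoint = endpoint.rstrip("/")
        if not endpoint.endswith("/v2"):
            endpoint += "/v2"
        normalized.append(endpoint)
    return normalized + [default_endpoint(registry)]
```

For the registries.yaml in this issue, endpoints_tried("registry.domain", ["http://registry.domain"]) lists the HTTP mirror first and the HTTPS default endpoint second, which matches the two entries in the generated hosts.toml.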

If it's falling back to that, it means the pull from your http endpoint failed first. Check the containerd logs and figure out why.

javiertury commented 1 week ago

I tried adding --disable-default-registry-endpoint to /etc/systemd/system/k3s.service, running systemctl daemon-reload, and restarting k3s, but /var/lib/rancher/k3s/agent/etc/containerd/certs.d/registry.domain/hosts.toml still has the HTTPS URL in it.

> If it's falling back to that, it means the pull from your http endpoint failed first. Check the containerd logs and figure out why.

I checked the containerd logs, and the requests always come in pairs. Is the first line of each pair the insecure HTTP one? Do you know what else I should check? As a sanity check, the podman CLI pushes to and pulls from the registry fine with --tls-verify=false.

time="2024-11-18T23:52:52.744088994+01:00" level=info msg="trying next host - response was http.StatusNotFound" host=registry.domain
time="2024-11-18T23:52:52.746469113+01:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry.domain/v2/user/web/blobs/sha256:XXX\": http: server gave HTTP response to HTTPS client" host=registry.domain
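When reading pairs like the two lines above, it can help to separate "an endpoint answered, but with 404" from "the HTTPS request hit a plain HTTP server". The following is a minimal Python sketch for that, assuming the key=value layout of the two log lines shown here; the helper names and category labels are hypothetical, and other containerd messages will simply fall into "other".

```python
import re

# key=value pairs; a value is either a bare token or a double-quoted
# string that may contain backslash-escaped quotes.
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line: str) -> dict[str, str]:
    """Split one log line into its key=value fields."""
    fields = {}
    for key, raw in _FIELD.findall(line):
        if raw.startswith('"'):
            # Strip the surrounding quotes and unescape inner quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def classify(line: str) -> str:
    """Label a line using the two messages seen in this issue."""
    fields = parse_fields(line)
    if "http.StatusNotFound" in fields.get("msg", ""):
        return "not-found"      # an endpoint responded, image/blob missing
    if "server gave HTTP response to HTTPS client" in fields.get("error", ""):
        return "https-to-http"  # HTTPS request reached a plain HTTP server
    return "other"
```

Applied to the pair above, the first line classifies as "not-found" and the second as "https-to-http", so the first request did get an answer from the registry before the HTTPS attempt failed.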

javiertury commented 1 week ago

I finally found the origin of this. k3s was connecting to the private registry over insecure HTTP just fine, but since the image version was missing (deleted due to age), it then tried the HTTPS endpoint, which does not exist. kubectl only printed messages about the HTTPS error, omitting the fact that it had connected to the first endpoint and that the image was missing there. That omission was driving me crazy.

Thanks a lot @brandond !
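
The behavior described in this thread (endpoints tried in order, with only the final failure surfacing in the pod event) can be modeled with a short Python sketch. This is a simplified illustration under that assumption, not containerd's actual resolver; the Attempt type and resolve function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    endpoint: str
    outcome: str  # "ok", or a short description of the failure


def resolve(attempts: list[Attempt]) -> tuple[Optional[str], Optional[str]]:
    """Walk the endpoints in order.

    Returns (endpoint, None) on the first success, or (None, last_error)
    if every endpoint fails. Earlier failures are overwritten, which is
    why the 404 from the HTTP mirror never appears in the final message.
    """
    last_error = None
    for attempt in attempts:
        if attempt.outcome == "ok":
            return attempt.endpoint, None
        last_error = f"{attempt.endpoint}: {attempt.outcome}"
    return None, last_error


attempts = [
    Attempt("http://registry.domain/v2", "404 Not Found"),
    Attempt("https://registry.domain/v2",
            "server gave HTTP response to HTTPS client"),
]
endpoint, error = resolve(attempts)
print(endpoint, error)
```

In this model the reported error mentions only the HTTPS default endpoint, so the containerd logs (where the earlier 404 is visible) are the place to look when the pod event seems to contradict registries.yaml.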