Quick start fails with download docker image

StefanSa commented 2 years ago

Hi there, I am trying to install an RKE2 cluster using the "Quick start" instructions. With rke2 server --debug I see the following error:

INFO[0000] Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1
W0208 16:55:08.830481   19011 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0208 16:55:08.833665   19011 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
FATA[0001] chmod /var/lib/rancher/rke2/data/v1.22.6-rke2r1-e6c1502b55cd/bin: no such file or directory

It looks to me as if it is not downloading the Docker image, and therefore the install fails. What am I doing wrong here?

brandond commented 2 years ago

Can you post the complete RKE2 logs, not just the bit at the end? Unless you've omitted some messages, it doesn't look like the pull is failing. If it were, you would see something like this:

INFO[0000] Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9
WARN[0000] Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9: not found in any file in /var/lib/rancher/rke2/agent/images: image not found
INFO[0000] Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9
WARN[0000] Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9: not found in any file in /var/lib/rancher/rke2/agent/images: image not found
INFO[0000] Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9
WARN[0001] Failed to get image from endpoint: GET https://index.docker.io/v2/rancher/rke2-runtime/manifests/v1.22.6-rke2r9: MANIFEST_UNKNOWN: manifest unknown; unknown tag=v1.22.6-rke2r9
FATA[0001] failed to get runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r9: all endpoints failed: GET https://index.docker.io/v2/rancher/rke2-runtime/manifests/v1.22.6-rke2r9: MANIFEST_UNKNOWN: manifest unknown; unknown tag=v1.22.6-rke2r9

I suspect there's something else going on with your host or environment. Are you running RKE2 as root? Is there something else that would prevent RKE2 from creating files under /var/lib/rancher?
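The two questions above (root, and whether files can be created under /var/lib/rancher) can be checked roughly from a shell. This is a minimal sketch assuming a POSIX sh; the helper names are hypothetical and not part of RKE2, and a passing result does not rule out SELinux, immutable attributes, or similar restrictions.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not shipped with RKE2.

# Succeeds only when the effective UID is 0.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Succeeds when the given directory, or its nearest existing parent,
# is a directory the current user may write to.
can_write_under() {
    dir=$1
    # Walk up the path until something exists.
    while [ ! -e "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        dir=$(dirname "$dir")
    done
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Prints a two-line report; the default path is the one named in this thread.
check_rke2_prereqs() {
    data_dir=${1:-/var/lib/rancher}
    if is_root; then
        echo "running as root: yes"
    else
        echo "running as root: no"
    fi
    if can_write_under "$data_dir"; then
        echo "$data_dir: writable"
    else
        echo "$data_dir: NOT writable"
    fi
}
```

Run it as the same user that starts rke2, for example `check_rke2_prereqs /var/lib/rancher`.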

StefanSa commented 2 years ago

@brandond Sure, not a problem.

rke2 server --debug
WARN[0000] not running in CIS mode
INFO[0000] Starting rke2 v1.22.6+rke2r1 (2dcedaf55c49ef9b24849c92702bdc12ea589d7a)
INFO[0000] Managed etcd cluster initializing
W0208 17:52:30.146758   19433 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
INFO[0000] Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,rke2 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --cert-dir=/var/lib/rancher/rke2/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction,PodSecurityPolicy --encryption-provider-config=/var/lib/rancher/rke2/server/cred/encryption-config.json --etcd-cafile=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/rke2/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/rke2/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=JobTrackingWithFinalizers=true --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key --profiling=false --proxy-client-cert-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/rke2/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.key
W0208 17:52:30.149657   19433 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
INFO[0000] Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259
INFO[0000] Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/rke2/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --configure-cloud-routes=false --controllers=*,-service,-route,-cloud-node-lifecycle --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/rke2/server/tls/service.key --use-service-account-credentials=true
INFO[0000] Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-provider=rke2 --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --node-status-update-frequency=1m0s --port=0 --profiling=false
INFO[0000] Node token is available at /var/lib/rancher/rke2/server/token
INFO[0000] To join node to cluster: rke2 agent -s https://172.16.34.94:9345 -t ${NODE_TOKEN}
INFO[0000] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0000] Run: rke2 kubectl
INFO[0000] Cluster-Http-Server 2022/02/08 17:52:30 http: TLS handshake error from 127.0.0.1:34634: remote error: tls: bad certificate
INFO[0000] Cluster-Http-Server 2022/02/08 17:52:30 http: TLS handshake error from 127.0.0.1:34640: remote error: tls: bad certificate
DEBU[0000] password verified locally for node 'r2-node01'
INFO[0000] certificate CN=r2-node01 signed by CN=rke2-server-ca@1644334360: notBefore=2022-02-08 15:32:40 +0000 UTC notAfter=2023-02-08 16:52:30 +0000 UTC
DEBU[0000] password verified locally for node 'r2-node01'
INFO[0000] certificate CN=system:node:r2-node01,O=system:nodes signed by CN=rke2-client-ca@1644334360: notBefore=2022-02-08 15:32:40 +0000 UTC notAfter=2023-02-08 16:52:30 +0000 UTC
INFO[0000] Module overlay was already loaded
INFO[0000] Module nf_conntrack was already loaded
INFO[0000] Module br_netfilter was already loaded
INFO[0000] Module iptable_nat was already loaded
DEBU[0000] getConntrackMax: using conntrack-min
INFO[0000] Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1
WARN[0000] Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1: not found in any file in /var/lib/rancher/rke2/agent/images: image not found
INFO[0000] Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1
WARN[0000] Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1: not found in any file in /var/lib/rancher/rke2/agent/images: image not found
DEBU[0000] Kubelet image credential provider bin directory check failed: stat /var/lib/rancher/credentialprovider/bin: no such file or directory
INFO[0000] Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1
W0208 17:52:31.148264   19433 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0208 17:52:31.152052   19433 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
FATA[0001] chmod /var/lib/rancher/rke2/data/v1.22.6-rke2r1-e6c1502b55cd/bin: no such file or directory

> Are you running RKE2 as root?

Yes. Whether started from the service or by running rke2 directly, /var/lib/rancher is created. OS = openSUSE Leap 15.3.

brandond commented 2 years ago

I can't reproduce this. What Linux distribution are you using? Is there a proxy or firewall between your host and docker.io that might be intercepting image pulls from Docker Hub?

StefanSa commented 2 years ago

OS = openSUSE Leap 15.3

brandond commented 2 years ago

I can't reproduce this. Is there anything unique about your network or host configuration?

opensuse01:~ # cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.3"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.3"
PRETTY_NAME="openSUSE Leap 15.3"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.3"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"

opensuse01:~ # curl -sfL https://get.rke2.io | sh -
[INFO]  finding release for channel stable
[INFO]  using v1.22.6+rke2r1 as release
[INFO]  downloading checksums at https://github.com/rancher/rke2/releases/download/v1.22.6+rke2r1/sha256sum-amd64.txt
[INFO]  downloading tarball at https://github.com/rancher/rke2/releases/download/v1.22.6+rke2r1/rke2.linux-amd64.tar.gz
[INFO]  verifying tarball
[INFO]  unpacking tarball file to /usr/local

opensuse01:~ # systemctl start rke2-server

opensuse01:~ # export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin

opensuse01:~ # kubectl get nodes -o wide
NAME         STATUS   ROLES                       AGE     VERSION          INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION                CONTAINER-RUNTIME
opensuse01   Ready    control-plane,etcd,master   4m33s   v1.22.6+rke2r1   10.0.1.170    <none>        openSUSE Leap 15.3   5.3.18-150300.59.46-default   containerd://1.5.9-k3s1

StefanSa commented 2 years ago

@brandond

What is the complete command behind this?

Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1

brandond commented 2 years ago

I'm not sure what you mean by "complete command". I showed you all the commands I ran to successfully complete an install on a fresh openSUSE Leap host. The runtime image is pulled internally by RKE2, and the binaries and manifests it contains are written to disk.

StefanSa commented 2 years ago

OK, let me ask the other way round: how is this image downloaded? With wget, curl, etc.? I want to test this to see whether there really are problems with the network.

brandond commented 2 years ago

It is pulled down from Docker Hub directly by the RKE2 process, using code compiled into the binary. Since it is a Docker image you can't just curl it.
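A plain curl of the image reference will not work, but basic reachability of Docker Hub can still be probed by hand through its HTTP registry API. The following is only a sketch under assumptions, not what RKE2 does internally: the function names are hypothetical, it requests an anonymous pull token from auth.docker.io and then asks registry-1.docker.io for the manifest. It checks the manifest endpoint only. Image layers are fetched from a separate CDN host (the later log in this thread shows production.cloudflare.docker.com), so a 200 here does not prove that a full pull will succeed.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; the hosts are Docker Hub's
# public token service and registry endpoint.

# $1 = repository, e.g. rancher/rke2-runtime
token_url() {
    printf 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:%s:pull\n' "$1"
}

# $1 = repository, $2 = tag
manifest_url() {
    printf 'https://registry-1.docker.io/v2/%s/manifests/%s\n' "$1" "$2"
}

# Prints the HTTP status code returned for the manifest request.
probe_manifest() {
    repo=$1
    tag=$2
    # Anonymous pull token, extracted with sed so that jq is not required.
    token=$(curl -fsS "$(token_url "$repo")" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
    if [ -z "$token" ]; then
        echo "no token received" >&2
        return 1
    fi
    curl -sS -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $token" \
        -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
        -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
        -H 'Accept: application/vnd.oci.image.index.v1+json' \
        "$(manifest_url "$repo" "$tag")"
}
```

For the image in this issue, the call would be `probe_manifest rancher/rke2-runtime v1.22.6-rke2r1`.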

StefanSa commented 2 years ago

@brandond This is very strange. A docker pull index.docker.io/rancher/rke2-runtime:v1.22.6-rke2r1 on the same server succeeds, but rke2 has problems downloading the image. Despite the debug option, no connection or download error is logged. Any ideas?

StefanSa commented 2 years ago

@brandond OK, the RPM version of rke2 is more talkative:

INFO[0010] Failed to test data store connection: context deadline exceeded
ERRO[0014] Failed to pull docker.io/rancher/hardened-kubernetes:v1.18.20-rke2r1: rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/rancher/hardened-kubernetes:v1.18.20-rke2r1": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/25/25d27a70b83aa01e38d17536812a80cdc1369bc43c3ac732630d4166d06ed6cc/data?verify=1644397662-dHno1RfAwydWS89YEYcox8W%2Blao%3D": read tcp 172.16.34.94:52116->104.18.122.25:443: read: connection reset by peer
INFO[0014] Pulling image docker.io/rancher/hardened-kubernetes:v1.18.20-rke2r1...

StefanSa commented 2 years ago

@brandond Thanks for your help, problem found. It was the WAF in the edge firewall.

yashasjindal commented 7 months ago

Hi, I have been having the same issue: the quick start commands keep failing for me (tried on a fresh install of Raspbian on a Raspberry Pi 5 (ARM), as well as on a fresh install of Ubuntu 22.04 LTS on an x86 machine):

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ curl -sfL https://get.rke2.io | sudo sh -
[INFO]  finding release for channel stable
[INFO]  using v1.27.12+rke2r1 as release
[INFO]  downloading checksums at https://github.com/rancher/rke2/releases/download/v1.27.12+rke2r1/sha256sum-arm64.txt
[INFO]  downloading tarball at https://github.com/rancher/rke2/releases/download/v1.27.12+rke2r1/rke2.linux-arm64.tar.gz
[INFO]  verifying tarball
[INFO]  unpacking tarball file to /usr/local

$ sudo systemctl enable rke2-server.service
Created symlink /etc/systemd/system/multi-user.target.wants/rke2-server.service → /usr/local/lib/systemd/system/rke2-server.service.

$ sudo systemctl start rke2-server.service

Then that terminal hangs. I run journalctl -u rke2-server -f in a new terminal window within VS Code and get these logs (I have filtered them to mostly include the parts that contain errors):

Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Module iptable_nat was already loaded"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Module iptable_filter was already loaded"
Apr 15 23:34:55 master rke2[10588]: W0415 23:34:55.007107   10588 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=warning msg="Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1: not found in any file in /var/lib/rancher/rke2/agent/images: image not found"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=warning msg="Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1: not found in any file in /var/lib/rancher/rke2/agent/images: image not found"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.27.12-rke2r1"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Creating directory /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin"
Apr 15 23:34:55 master rke2[10588]: time="2024-04-15T23:34:55-04:00" level=info msg="Extracting file bin/containerd to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/containerd"
Apr 15 23:34:59 master rke2[10588]: time="2024-04-15T23:34:59-04:00" level=info msg="Extracting file bin/containerd-shim to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/containerd-shim"
Apr 15 23:34:59 master rke2[10588]: time="2024-04-15T23:34:59-04:00" level=info msg="Extracting file bin/containerd-shim-runc-v1 to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/containerd-shim-runc-v1"
Apr 15 23:35:00 master rke2[10588]: time="2024-04-15T23:35:00-04:00" level=info msg="Extracting file bin/containerd-shim-runc-v2 to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/containerd-shim-runc-v2"
Apr 15 23:35:01 master rke2[10588]: time="2024-04-15T23:35:01-04:00" level=info msg="Extracting file bin/crictl to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/crictl"
Apr 15 23:35:03 master rke2[10588]: time="2024-04-15T23:35:03-04:00" level=info msg="Extracting file bin/ctr to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/ctr"
Apr 15 23:35:05 master rke2[10588]: time="2024-04-15T23:35:05-04:00" level=info msg="Extracting file bin/kubectl to /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin/kubectl"
Apr 15 23:35:08 master rke2[10588]: {"level":"warn","ts":"2024-04-15T23:35:08.045885-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x40007a6540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Apr 15 23:35:08 master rke2[10588]: {"level":"info","ts":"2024-04-15T23:35:08.045988-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
...
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=error msg="Error encountered while importing /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: image \"index.docker.io/rancher/rke2-cloud-provider:v1.28.2-build20231016\": not found"
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt"
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=info msg="Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802 has already been pulled"
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=error msg="Error encountered while importing /var/lib/rancher/rke2/agent/images/etcd-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/etcd-image.txt: image \"index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802\": not found"
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt"
Apr 15 23:35:16 master rke2[10588]: time="2024-04-15T23:35:16-04:00" level=info msg="Image index.docker.io/rancher/hardened-kubernetes:v1.27.12-rke2r1-build20240315 has already been pulled"

Until eventually I get stuck in this loop:

Apr 15 23:37:53 master rke2[10588]: time="2024-04-15T23:37:53-04:00" level=info msg="Waiting for etcd server to become available"
Apr 15 23:37:53 master rke2[10588]: time="2024-04-15T23:37:53-04:00" level=info msg="Waiting for API server to become available"
Apr 15 23:37:53 master rke2[10588]: time="2024-04-15T23:37:53-04:00" level=info msg="Pod for etcd not synced (pod sandbox not found), retrying"
Apr 15 23:37:56 master rke2[10588]: time="2024-04-15T23:37:56-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
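The kind of filtering described above, keeping mostly the problem lines from the rke2-server journal, can be sketched as a small stdin filter. This is an illustration only: the function name is hypothetical, and the patterns are guessed from the three log formats visible in this comment (logrus `level=...`, JSON `"level":"..."`, and klog lines starting with W/E/F), so other formats would slip through.

```shell
#!/bin/sh
# Hypothetical filter: reads journal lines on stdin and keeps only those that
# look like warnings, errors, or fatals in the formats seen in this thread.
filter_rke2_problems() {
    grep -E 'level=(warning|error|fatal)|"level":"(warn|error)"|\]: [WEF][0-9]{4} '
}
```

A typical invocation would be `journalctl -u rke2-server --no-pager | filter_rke2_problems`.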

Edit: I forgot to add that I usually give up after a few hours of the same loop and uninstall, which gives this output (included in case it provides any relevant info):

$ sudo /usr/local/bin/rke2-uninstall.sh
+ id -u
+ [ ! 0 -eq 0 ]
+ . /etc/os-release
+ PRETTY_NAME=Debian GNU/Linux 12 (bookworm)
+ NAME=Debian GNU/Linux
+ VERSION_ID=12
+ VERSION=12 (bookworm)
+ VERSION_CODENAME=bookworm
+ ID=debian
+ HOME_URL=https://www.debian.org/
+ SUPPORT_URL=https://www.debian.org/support
+ BUG_REPORT_URL=https://bugs.debian.org/
+ [ -r /etc/redhat-release ]
+ [ -r /etc/centos-release ]
+ [ -r /etc/oracle-release ]
+ [  = suse ]
+ : /usr/local
+ uninstall_killall
+ dirname /usr/local/bin/rke2-uninstall.sh
+ _killall=/usr/local/bin/rke2-killall.sh
+ [ -e /usr/local/bin/rke2-killall.sh ]
+ eval /usr/local/bin/rke2-killall.sh
+ /usr/local/bin/rke2-killall.sh
+ systemctl stop rke2-server.service
+ systemctl stop rke2-agent.service
+ killtree
+ kill -9
+ do_unmount_and_remove /run/k3s
+ do_unmount_and_remove /var/lib/rancher/rke2
+ do_unmount_and_remove /var/lib/kubelet/pods
+ do_unmount_and_remove /run/netns/cni-
+ ip link show
+ grep master cni0
+ read ignore iface ignore
+ ip link delete cni0
Cannot find device "cni0"
+ ip link delete flannel.1
Cannot find device "flannel.1"
+ ip link delete flannel.4096
Cannot find device "flannel.4096"
+ ip link delete flannel-v6.1
Cannot find device "flannel-v6.1"
+ ip link delete flannel-wg
Cannot find device "flannel-wg"
+ ip link delete flannel-wg-v6
Cannot find device "flannel-wg-v6"
+ ip link delete vxlan.calico
Cannot find device "vxlan.calico"
+ ip link delete vxlan-v6.calico
Cannot find device "vxlan-v6.calico"
+ ip link delete cilium_vxlan
Cannot find device "cilium_vxlan"
+ ip link delete cilium_net
Cannot find device "cilium_net"
+ ip link delete cilium_wg0
Cannot find device "cilium_wg0"
+ ip link delete kube-ipvs0
Cannot find device "kube-ipvs0"
+ [ -d /sys/class/net/nodelocaldns ]
+ rm -rf /var/lib/cni/ /var/log/pods/ /var/log/containers
+ POD_MANIFESTS_DIR=/var/lib/rancher/rke2/agent/pod-manifests
+ rm -f /var/lib/rancher/rke2/agent/pod-manifests/etcd.yaml /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml /var/lib/rancher/rke2/agent/pod-manifests/kube-controller-manager.yaml /var/lib/rancher/rke2/agent/pod-manifests/cloud-controller-manager.yaml /var/lib/rancher/rke2/agent/pod-manifests/kube-scheduler.yaml /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
+ iptables-save
/usr/local/bin/rke2-killall.sh: 107: iptables-save: not found
+ grep -v KUBE-
+ grep -v CNI-
+ grep -v+ grep -v cali-
+ grep -v CILIUM_
 cali:
+ grep -v flannel
+ iptables-restore
/usr/local/bin/rke2-killall.sh: 107: iptables-restore: not found
+ ip6tables-save
/usr/local/bin/rke2-killall.sh: 108: + ip6tables-save: not foundgrep -v
 KUBE-
+ grep -v cali-
+ grep -v CNI-
+ grep -v CILIUM_
+ ip6tables-restore
/usr/local/bin/rke2-killall.sh: 108: ip6tables-restore: not found
+ grep -v flannel
+ grep -v cali:
+ set +x
If this cluster was upgraded from an older release of the Canal CNI, you may need to manually remove some flannel iptables rules:
-e      export cluster_cidr=YOUR-CLUSTER-CIDR
-e      iptables -D POSTROUTING -s $cluster_cidr -j MASQUERADE --random-fully
-e      iptables -D POSTROUTING ! -s $cluster_cidr -d  -j MASQUERADE --random-fully
+ trap uninstall_remove_self EXIT
+ uninstall_disable_services
+ command -v systemctl
+ systemctl disable rke2-server
Removed "/etc/systemd/system/multi-user.target.wants/rke2-server.service".
+ systemctl disable rke2-agent
+ systemctl reset-failed rke2-server
+ systemctl reset-failed rke2-agent
Failed to reset failed state of unit rke2-agent.service: Unit rke2-agent.service not loaded.
+ true
+ systemctl daemon-reload
+ uninstall_remove_files
+ [ -r /etc/redhat-release ]
+ [ -r /etc/centos-release ]
+ [ -r /etc/oracle-release ]
+ [  = suse ]
+ find /usr/local/lib/systemd/system -name rke2-*.service -type f -delete
+ find /usr/local/lib/systemd/system -name rke2-*.env -type f -delete
+ find /etc/systemd/system -name rke2-*.service -type f -delete
+ rm -f /usr/local/bin/rke2
+ rm -f /usr/local/bin/rke2-killall.sh
+ rm -rf /usr/local/share/rke2
+ rm -rf /etc/rancher/rke2
+ rm -rf /etc/rancher/node
+ rm -d /etc/rancher
+ rm -rf /etc/cni
+ rm -rf /opt/cni/bin
+ rm -rf /var/lib/kubelet
+ rm -rf /var/lib/rancher/rke2
+ rm -d /var/lib/rancher
+ type fapolicyd
+ uninstall_remove_policy
+ semodule -r rke2
/usr/local/bin/rke2-uninstall.sh: 119: semodule: not found
+ true
+ uninstall_remove_self
+ rm -f /usr/local/bin/rke2-uninstall.sh

rancher / rke2

Quick start fails with download docker image #2443