Cannot install rke2 in airgap using tarball method
Steps To Reproduce:
Create airgap VM
Copy tar file from release artifacts to VM
Copy rke2 binary from release artifacts to VM
Attempt to run rke2: sudo rke2 server --write-kubeconfig-mode 644 --debug
Expected behavior:
rke2 should be running successfully and the cluster should be up and running
Actual behavior:
rke2 fails to run, giving a fatal error (see below)
Additional context / logs:
Log taken from a node that was attempting to run with calico also, so had multiple tar files present, but saw the same behavior in the minimal reproduction steps shown above.
$ sudo rke2 server --write-kubeconfig-mode 644 --cni=calico --debug
WARN[0000] not running in CIS mode
INFO[0000] Starting rke2 v1.24.2-rc1+rke2r1 (c943ed52e7fb92fb2da27608bb5848e34a807ac9)
INFO[0000] Managed etcd cluster initializing
INFO[0000] Starting etcd for new cluster
INFO[0000] Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,rke2 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --cert-dir=/var/lib/rancher/rke2/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --egress-selector-config-file=/var/lib/rancher/rke2/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction,PodSecurityPolicy --enable-aggregator-routing=true --encryption-provider-config=/var/lib/rancher/rke2/server/cred/encryption-config.json --etcd-cafile=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/rke2/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/rke2/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key --profiling=false --proxy-client-cert-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/rke2/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.key
INFO[0000] Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259
INFO[0000] Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/rke2/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/rke2/server/tls/server-ca.key --configure-cloud-routes=false --controllers=*,-service,-route,-cloud-node-lifecycle --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/rke2/server/tls/service.key --use-service-account-credentials=true
INFO[0000] Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-provider=rke2 --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --node-status-update-frequency=1m0s --profiling=false
INFO[0000] Node token is available at /var/lib/rancher/rke2/server/token
W0629 19:34:26.669529 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
INFO[0000] To join node to cluster: rke2 agent -s https://172.31.13.29:9345 -t ${NODE_TOKEN}
INFO[0000] Tunnel server egress proxy mode: disabled
INFO[0000] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0000] Run: rke2 kubectl
W0629 19:34:26.673214 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:26.673876 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/run/k3s/containerd/containerd.sock /run/k3s/containerd/containerd.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory". Reconnecting...
INFO[0000] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory"
INFO[0000] Cluster-Http-Server 2022/06/29 19:34:26 http: TLS handshake error from 127.0.0.1:44422: remote error: tls: bad certificate
INFO[0000] Cluster-Http-Server 2022/06/29 19:34:26 http: TLS handshake error from 127.0.0.1:44428: remote error: tls: bad certificate
DEBU[0000] Password verified locally for node 'ip-172-31-13-29'
INFO[0000] certificate CN=ip-172-31-13-29 signed by CN=rke2-server-ca@1656527873: notBefore=2022-06-29 18:37:53 +0000 UTC notAfter=2023-06-29 19:34:26 +0000 UTC
INFO[0000] certificate CN=system:node:ip-172-31-13-29,O=system:nodes signed by CN=rke2-client-ca@1656527873: notBefore=2022-06-29 18:37:53 +0000 UTC notAfter=2023-06-29 19:34:26 +0000 UTC
INFO[0000] Module overlay was already loaded
INFO[0000] Module nf_conntrack was already loaded
INFO[0000] Module br_netfilter was already loaded
INFO[0000] Module iptable_nat was already loaded
DEBU[0000] getConntrackMax: using conntrack-min
INFO[0000] Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.24.2-rc1-rke2r1
W0629 19:34:27.675628 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:27.675886 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:27.736286 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/run/k3s/containerd/containerd.sock /run/k3s/containerd/containerd.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory". Reconnecting...
INFO[0001] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory"
W0629 19:34:29.308837 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:29.500468 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:29.780059 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/run/k3s/containerd/containerd.sock /run/k3s/containerd/containerd.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory". Reconnecting...
INFO[0003] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory"
INFO[0004] Cluster-Http-Server 2022/06/29 19:34:31 http: TLS handshake error from 172.31.11.21:55128: remote error: tls: bad certificate
ERRO[0005] runtime core not ready
W0629 19:34:31.854530 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:32.489901 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:34.123563 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/run/k3s/containerd/containerd.sock /run/k3s/containerd/containerd.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory". Reconnecting...
INFO[0007] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory"
W0629 19:34:36.104690 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
INFO[0009] Cluster-Http-Server 2022/06/29 19:34:36 http: TLS handshake error from 172.31.11.21:55156: remote error: tls: bad certificate
ERRO[0010] runtime core not ready
W0629 19:34:36.502778 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
{"level":"warn","ts":"2022-06-29T19:34:36.669Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b52a80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[0010] Failed to test data store connection: context deadline exceeded
INFO[0014] Cluster-Http-Server 2022/06/29 19:34:41 http: TLS handshake error from 172.31.11.21:55180: remote error: tls: bad certificate
ERRO[0015] runtime core not ready
W0629 19:34:41.673055 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0629 19:34:42.288397 4959 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/run/k3s/containerd/containerd.sock /run/k3s/containerd/containerd.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory". Reconnecting...
INFO[0015] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory"
FATA[0015] failed to setup cri connection: timed out waiting for the condition
$ sudo ls /var/lib/rancher/rke2/agent/images/
rke2-airgap-images-calico.tar.gz rke2-airgap-images.tar.gz
This is a backport issue for https://github.com/rancher/rke2/issues/3113, automatically created via rancherbot by @rancher-max
Original issue description:
Environmental Info: RKE2 Version:
v1.24.2-rc1+rke2r1
Node(s) CPU architecture, OS, and Version:
ubuntu 20.04 LTS
Cluster Configuration:
1 server 1 agent
Describe the bug:
Cannot install rke2 in airgap using tarball method
Steps To Reproduce:
sudo rke2 server --write-kubeconfig-mode 644 --debug
Expected behavior:
rke2 should be running successfully and the cluster should be up and running
Actual behavior:
rke2 fails to run, giving a fatal error (see below)
Additional context / logs:
Log taken from a node that was attempting to run with calico also, so had multiple tar files present, but saw the same behavior in the minimal reproduction steps shown above.