Closed echo-devnull closed 2 years ago
It looks like etcd is failing to start. Can you attach the rke2-server logs from journald, the etcd pod logs from /var/log/pods/kube-system_etcd-*/*, and the output of CONTAINER_RUNTIME_SOCKET=/var/run/k3s/containerd/containerd.sock /var/lib/rancher/rke2/bin/crictl ps?
Good morning! Thanks for reading and looking into this with me.
Output of the command:
root@excelsior /var/log/pods # CONTAINER_RUNTIME_SOCKET=/var/run/k3s/containerd/containerd.sock /var/lib/rancher/rke2/bin/crictl ps
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
ERRO[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: no such file or directory"
ERRO[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/crio/crio.sock: connect: no such file or directory"
ERRO[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory"
FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory"
Those pod logs do not seem to exist:
root@excelsior /var/log/pods # ls -ltrR kube-system_etcd-excelsior_*
kube-system_etcd-excelsior_f3a30bd4b4f10029e180c53180190a88:
total 4
drwxr-xr-x 2 root root 4096 Sep 11 08:08 etcd
kube-system_etcd-excelsior_f3a30bd4b4f10029e180c53180190a88/etcd:
total 0
kube-system_etcd-excelsior_add8d6fd3b8f7f02d9525a5bfd28943d:
total 4
drwxr-xr-x 2 root root 4096 Sep 11 11:59 etcd
kube-system_etcd-excelsior_add8d6fd3b8f7f02d9525a5bfd28943d/etcd:
total 0
And the logs are here: https://nextcloud.maas-martin.nl/s/QqzPgaWDoy2GQGH
Sorry, I was typing that command from memory - try CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/k3s/containerd/containerd.sock /var/lib/rancher/rke2/bin/crictl ps
Can you also grab the logs at /var/lib/rancher/rke2/agent/containerd/containerd.log and /var/lib/rancher/rke2/agent/logs/kubelet.log?
I was wondering about that command ;-) Silly that I did not realize what you were actually asking ;-)
root@excelsior ~ # CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/k3s/containerd/containerd.sock /var/lib/rancher/rke2/bin/crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
788b20c6acc62 2d41fbfc20342 43 hours ago Running kube-proxy 9 1ae71c7500b80 kube-proxy-excelsior
And the requested logs: https://nextcloud.maas-martin.nl/s/dZ8GnZimoci8sM4
From kubelet.log: E0913 07:12:52.147100 169039 remote_runtime.go:421] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd container: get apparmor_parser version: exec: \"apparmor_parser\": executable file not found in $PATH" podSandboxID="c9b5003de37a3913f20fde2f738524fa6474fc8881e7dfffa6acf172bdee78e2"
This appears to be a duplicate of https://github.com/rancher/rke2/issues/1806 - you need to install the apparmor-parser package, which is required by newer releases of containerd when apparmor is enabled.
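For anyone hitting a similar failure, the kubelet log can be scanned for this kind of error directly. The following is only a rough sketch: the function name missing_executables is made up for illustration, and it assumes the log lines follow the same "exec: ... executable file not found" shape as the one quoted above.

```shell
# Hypothetical helper: list executables that the container runtime
# reported as missing from $PATH, based on the error format quoted above.
missing_executables() {
    # $1: path to a kubelet log file
    grep -o 'exec: [^:]*: executable file not found' "$1" \
        | sed -e 's/^exec: //' -e 's/: executable file not found$//' \
        | tr -d '\\"' \
        | sort -u
}
```

On the node in this thread it could be run as missing_executables /var/lib/rancher/rke2/agent/logs/kubelet.log. An empty result only means no line matched this pattern, not that the node is healthy.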
Good morning!
Welp, that indeed fixed the issue! Thank you so much! Did I miss that in the documentation? This was/is a clean Debian 11 install and did not come with that by default.
Solved by:
sudo apt install apparmor
Thanks!
We did add it to the docs a while back: https://docs.rke2.io/install/quickstart/#prerequisites
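A quick way to check this prerequisite on a node, before or after installing the package, might look like the sketch below. The function names are made up for illustration. It assumes the kernel reports AppArmor status in /sys/module/apparmor/parameters/enabled, and it relies on the statement above that the parser is only required when AppArmor is enabled.

```shell
# Returns success if the kernel reports AppArmor as enabled ("Y").
apparmor_enabled() {
    [ -r /sys/module/apparmor/parameters/enabled ] &&
        [ "$(cat /sys/module/apparmor/parameters/enabled)" = "Y" ]
}

# Returns success if the named command can be found in $PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print a one-line summary of the prerequisite state.
check_apparmor_parser() {
    if ! apparmor_enabled; then
        echo "apparmor not enabled; parser not required"
    elif have_command apparmor_parser; then
        echo "apparmor_parser found"
    else
        echo "apparmor enabled but apparmor_parser missing"
    fi
}
```

Calling check_apparmor_parser on the affected Debian 11 node before the fix would be expected to print the third message. Note that command -v only searches the $PATH of the current shell, which may differ from the one the rke2-server service runs with.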
While trying to upgrade from v1.23.9+rke2r1 to v1.24.3+rke2r1, I followed the automated upgrade procedure described at https://docs.rke2.io/upgrade/automated_upgrade/.
But after applying the plan, it got stuck restarting the server. Logs: https://pastebin.com/raw/Ankj2hdt
Environmental Info: RKE2 Version: from v1.23.9+rke2r1 to v1.24.3+rke2r1
Node(s) CPU architecture, OS, and Version:
Linux excelsior 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux
Cluster Configuration: Single node server
Describe the bug: Upgrading seems to fail
Expected behavior: Using the automated upgrade path, I hoped the server would come back cleanly after restarting.
Actual behavior: After an rke2-server restart, the service does not actually start up.
My guess is that this is because of the single-node nature of my setup. Is it trying to reach other etcd nodes to attach itself to?