vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum
Apache License 2.0

etcd data dir not empty #330

Closed: robincron closed this issue 3 months ago

robincron commented 3 months ago

I don't know exactly why or how, but the etcd data directory is not empty when a control plane node is built with etcd_volume_size=10 and etcd_volume_type=encrypted-volumes in our deployment. For example, here is a control plane node with the lost+found folder inside /var/lib/etcd: [image]

It looks (to me) like there is some sort of timing problem, where cloud-init fails because the folder is not empty before the preKubeadmCommands even run?
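To make the failure mode concrete, here is a minimal Python sketch of the kind of check that kubeadm's DirAvailable preflight performs: the directory passes if it does not exist or is empty. The helper names `dir_available` and `simulate_fresh_mount` are hypothetical and only for illustration, and the sketch assumes the etcd volume is formatted with an ext-family filesystem, which creates a lost+found directory at its root. This is not kubeadm's actual code.

```python
import os
import tempfile


def dir_available(path: str) -> bool:
    """Return True if path does not exist or is an empty directory.

    Hypothetical helper mirroring the idea behind kubeadm's
    DirAvailable preflight check; not kubeadm's implementation.
    """
    if not os.path.exists(path):
        return True
    return len(os.listdir(path)) == 0


def simulate_fresh_mount(path: str) -> None:
    """Hypothetical stand-in for mounting a newly formatted ext4 volume.

    Assumption: the new filesystem contains only lost+found at its root.
    """
    os.makedirs(os.path.join(path, "lost+found"), exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        etcd_dir = os.path.join(root, "etcd")
        os.makedirs(etcd_dir)
        # Empty mount point: the check passes.
        print("before mount:", dir_available(etcd_dir))
        simulate_fresh_mount(etcd_dir)
        # lost+found alone is enough to make the check fail.
        print("after mount:", dir_available(etcd_dir))
```

Under these assumptions, the check would fail as soon as the volume is mounted at /var/lib/etcd, regardless of when the pre-kubeadm commands run, so the lost+found entry may be a more likely trigger than a race.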

Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 audit: BPF prog-id=20 op=UNLOAD
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:23] Cloud-init v. 23.1.2-0ubuntu0~22.04.1 running 'modules:final' at Tue, 19 Mar 2024 11:22:22 +0000. Up 21.20 sec>
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [init] Using Kubernetes version: v1.27.3
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [preflight] Running pre-flight checks
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] error execution phase preflight: [preflight] Some fatal errors occurred:
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] To see the stack trace of this error execute with --v=5 or higher
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] 2024-03-19 11:22:24,453 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/>
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] 2024-03-19 11:22:24,454 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_>
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] Cloud-init v. 23.1.2-0ubuntu0~22.04.1 finished at Tue, 19 Mar 2024 11:22:24 +0000. Datasource DataSourceOpenSt>
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 systemd[1]: dmesg.service: Deactivated successfully.
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostn>
Mar 19 11:22:27 kube-bxbqe-xx6sh-cndp2 chronyd[818]: Selected source 158.101.188.125 (2.ubuntu.pool.ntp.org)
Mar 19 11:22:27 kube-bxbqe-xx6sh-cndp2 chronyd[818]: System clock wrong by -278.022585 seconds
Mar 19 11:22:00 kube-bxbqe-xx6sh-cndp2 kubelet[1144]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox im>
Mar 19 11:22:00 kube-bxbqe-xx6sh-cndp2 kubelet[1144]: I0319 11:22:00.003195 1144 server.go:199] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet>
Mar 19 11:22:00 kube-bxbqe-xx6sh-cndp2 kubelet[1144]: E0319 11:22:00.003494 1144 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet confi>
Mar 19 11:22:00 kube-bxbqe-xx6sh-cndp2 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 19 11:22:00 kube-bxbqe-xx6sh-cndp2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
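For anyone triaging similar nodes, a small Python sketch like the following could pull the fatal kubeadm preflight checks out of a journal dump. It assumes one journal entry per line and the `[ERROR <check>]: <message>` shape seen in the log above; the function name `preflight_errors` and the `SAMPLE` text are illustrative only.

```python
import re

# Matches kubeadm preflight errors such as:
#   [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
PREFLIGHT_ERROR = re.compile(r"\[ERROR (?P<check>[^\]]+)\]: (?P<message>.*)$")


def preflight_errors(journal_text: str) -> list[tuple[str, str]]:
    """Return (check name, message) pairs found in journal output.

    Assumes one journal entry per line (hypothetical helper).
    """
    errors = []
    for line in journal_text.splitlines():
        match = PREFLIGHT_ERROR.search(line)
        if match:
            errors.append((match.group("check"), match.group("message").strip()))
    return errors


# Lines copied from the log in this issue, one entry per line.
SAMPLE = """\
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [preflight] Running pre-flight checks
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] error execution phase preflight: [preflight] Some fatal errors occurred:
Mar 19 11:22:24 kube-bxbqe-xx6sh-cndp2 cloud-init[901]: [2024-03-19 11:22:24] [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
"""

if __name__ == "__main__":
    for check, message in preflight_errors(SAMPLE):
        print(f"{check}: {message}")
```

The extracted check name (here DirAvailable--var-lib-etcd) is the value that kubeadm's --ignore-preflight-errors flag, mentioned in the log, takes; whether ignoring the check is the right fix for this driver is a separate question.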

We are running: [image]

Originally posted by @robincron in https://github.com/vexxhost/magnum-cluster-api/issues/305#issuecomment-2007090355