Open w9n opened 6 years ago
`ctr t delete kubelet && ctr t start kubelet` starts it successfully. The risk that kubelet starts before `/var/run/cri-containerd.sock` exists seems much higher without kubeadm after the first boot.
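That manual workaround could be sketched as a small guard script; the socket path is from the report above, but wrapping the restart in an existence check is my assumption, not something LinuxKit ships:

```shell
#!/bin/sh
# Only restart the kubelet task once the CRI socket actually exists;
# otherwise kubelet would just fail again at startup.
SOCK=/var/run/cri-containerd.sock
if [ -S "$SOCK" ]; then
    ctr t delete kubelet && ctr t start kubelet
else
    echo "CRI socket $SOCK not present yet, not restarting kubelet"
fi
```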
Does this still happen with Kube 1.9 (just merged via #33)? I think I saw some improvements in this area.
If not, then please open an issue against kubernetes or cri-containerd (whichever seems more appropriate, and assuming there isn't already one), since they should be more robust at startup time. My gut feeling is that it is Kubernetes that should be robust to waiting for the CRI, rather than vice versa, so a kube issue would seem the way to go.
I just built and booted current master (d39e6ba85fa46a19e77668e436c9e02da8c03850), ran kubeadm-init.sh, and waited for the pods to all come up, then powered off with `poweroff`. I then booted again with the same persistent disk and again waited for the pods (they came up). I repeated the `poweroff`, boot, and check cycle 5 times and was successful each time (compared with your previous <10% success rate). I also did one iteration with `poweroff -f` and one with `reboot` for good measure, still no failures.
So either this is fixed in 1.9.0, or the odds of hitting the problem have changed dramatically, or something is different in your environment (perhaps just timings).
I tried restarting the master 3 times with no problems, but could reproduce it after setting up some nodes and pods :/. From what I know, systemd usually handles socket activation.
LinuxKit doesn't use systemd.
As I said in https://github.com/linuxkit/kubernetes/issues/26#issuecomment-352443648, I think this is an upstream issue.
Sure, but the upstream implementations use systemd, which does socket activation. I will check whether I can solve it by manually waiting until the socket is there.
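A minimal sketch of such a wait loop (the 60-second default timeout is an arbitrary choice of mine; the socket path is the one from the report):

```shell
#!/bin/sh
# Poll for the CRI socket before starting kubelet, instead of relying
# on systemd-style socket activation (which LinuxKit does not have).
wait_for_socket() {
    path=$1
    timeout=${2:-60}
    while [ "$timeout" -gt 0 ]; do
        # -S is true only once the path exists and is a socket
        [ -S "$path" ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "timed out waiting for $path" >&2
    return 1
}

# Usage (hypothetical): gate the kubelet start on the socket appearing.
# wait_for_socket /var/run/cri-containerd.sock 60 && ctr t start kubelet
```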
Description

Steps to reproduce the issue: start the master, wait until all pods are started, then poweroff and restart. In under ~10% of attempts it reboots successfully.

Describe the results you received: kubelet log, cri-containerd log

Describe the results you expected: a running kubelet

Additional information you deem important (e.g. issue happens only occasionally): I could reproduce this back to at least f9a2a31.