Ran into this issue during an OS upgrade. `kuma-cni` and several other pods could not start and were hanging in the unknown state after a node reboot.
Containerd was spamming the following message (example for the cert-exporter):
Jun 13 11:21:29 myworker(MASKED) containerd[777]: time="2024-06-13T11:21:29.717025442Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:cert-exporter-sd6ln,Uid:61df9ed3-6c50-4bf7-80ea-035be14ef1a0,Namespace:monitoring,Attempt:1,} failed, error" error="failed to setup network for sandbox \"92042607aafde23c6fd44a7901d957df089b7ff2b2980dcd133cd38b2826be6f\": plugin type=\"kuma-cni\" name=\"kuma-cni\" failed (add): more than one process '/install-cni' running on a node, this should not happen"
Didn't find the `install-cni` process with `ps ax | grep -i cni` on the node.
The issue is very rare: it happened only twice across about 80 nodes.
The issue is mitigated by removing kuma-cni from the CNI config `/etc/cni/net.d/10-kuberouter.conflist` and pasting the original kube-router conf there instead. After the `kuma-cni` pod starts, it updates the CNI config with normal values.
My guess is that `!=` should be changed to `>` in the following line:
https://github.com/kumahq/kuma/blob/413bddfb40f22f3c2f2a52deef78d13be3c6a3c0/app/cni/pkg/cni/main.go#L114-L116
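To illustrate the guess, here is a hedged sketch (the `countMatching` helper and the sample cmdline list are hypothetical, not the actual kuma code): with `!=`, a check that finds zero `install-cni` processes — e.g. right after a node reboot, before the DaemonSet pod has been rescheduled — fails just like a check that finds two; with `>`, only genuine duplicates fail.

```go
package main

import (
	"fmt"
	"strings"
)

// countMatching counts command lines that contain name.
// Hypothetical stand-in for the process scan at the linked main.go lines.
func countMatching(cmdlines []string, name string) int {
	n := 0
	for _, c := range cmdlines {
		if strings.Contains(c, name) {
			n++
		}
	}
	return n
}

func main() {
	// After a reboot, no install-cni process may be running yet.
	procs := []string{"/usr/bin/containerd", "/usr/local/bin/kube-router"}
	n := countMatching(procs, "/install-cni")

	// Check as reported: rejects n == 0 as well as n >= 2.
	fmt.Println("!= 1 rejects:", n != 1) // prints "!= 1 rejects: true"

	// Suggested check: rejects only when there really is more than one.
	fmt.Println("> 1 rejects:", n > 1) // prints "> 1 rejects: false"
}
```

Under this reading, the sandbox setup error above is the zero-process case being misreported as "more than one process '/install-cni' running on a node", which matches the empty `ps` output observed on the node.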
Installation type: On prem, kubespray
K8s: 1.29.5
OS: Ubuntu 22.04
Network plugin: kube-router 2.0.1