ryanhockstad opened this issue 7 months ago
Bumping this. Any info would be very helpful. Seeing this a lot.
In my case, Pods are not initializing due to a FailedMount event. When I connected to the node and checked /var/lib/plugins_registry/, the "efs.csi.aws.com-reg.sock" file was not there, yet the logs of the CSI Driver Registrar look normal. For context on the EKS cluster: I have 1 static worker node, a 2nd node is created dynamically, and the efs-csi-node daemonset does the required setup on it.
Also, if all the workloads on the static worker node are removed and a new node is then created dynamically, the "efs.csi.aws.com-reg.sock" file is created properly and the volume mounts successfully.
I have the same setup in a different cluster, and it works fine there.
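For reference, this is roughly how I checked (a sketch; the pod name is a placeholder for the efs-csi-node pod scheduled on the affected node, and on a default kubelet setup the socket is registered under /var/lib/kubelet/plugins_registry/):

# On the affected node: is the registrar socket there?
ls -l /var/lib/kubelet/plugins_registry/

# Sidecar logs for the efs-csi-node pod on that node
kubectl logs -n kube-system <efs-csi-node-pod> -c csi-driver-registrar

# Kubelet's view of plugin registration on the node
journalctl -u kubelet | grep -i plugin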
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/kind bug
What happened? When deploying the aws-efs-csi-driver helm chart, as the efs-csi-node daemonset spins up, certain pods get stuck in a CrashLoopBackOff state. The logs for the efs-plugin container look normal, but the logs for the csi-driver-registrar container show only:
/usr/bin/csi-node-driver-registrar: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
Likewise, the logs for the liveness-probe are just:
/usr/bin/livenessprobe: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
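For completeness, each sidecar's logs can be pulled straight from the daemonset pods, e.g. (a sketch assuming the chart's default app=efs-csi-node label and a kube-system install; adjust to your values):

# One call per container; --prefix tags each line with its pod name
kubectl logs -n kube-system -l app=efs-csi-node -c efs-plugin --prefix
kubectl logs -n kube-system -l app=efs-csi-node -c csi-driver-registrar --prefix
kubectl logs -n kube-system -l app=efs-csi-node -c liveness-probe --prefix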
Looking at the nodes the failing pods are running on, I've discovered that they do not have the
/var/lib/kubelet/plugins_registry/efs.csi.aws.com-reg.sock
file. The pods in the daemonset that do spin up properly do have this file. I'm unsure why it is missing on some nodes, and I don't know how to configure the helm chart to ensure that it gets created.
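One way to see which nodes are missing the socket, without SSH access, is a node debug pod; kubectl debug mounts the node's root filesystem under /host (node name is a placeholder):

# List the plugin registry on a given node from a throwaway debug pod
kubectl debug node/<node-name> -it --image=busybox -- ls -l /host/var/lib/kubelet/plugins_registry/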
What you expected to happen? I expect all of the pods in the efs-csi-node daemonset to spin up properly.
How to reproduce it (as minimally and precisely as possible)? This is unpredictable. I can fix the issue by destroying a node, and when a new node spins up, the
/var/lib/kubelet/plugins_registry/efs.csi.aws.com-reg.sock
file exists and the pods work as expected.
Anything else we need to know?:
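For reference, recycling a node looks roughly like this (a sketch; the node name and instance id are placeholders, and the terminate call assumes the node is an EC2 instance that will be replaced, e.g. by an auto scaling group):

# Gracefully evict workloads, then replace the instance
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
aws ec2 terminate-instances --instance-ids <instance-id>

# After the replacement node joins, confirm the daemonset pods are healthy
kubectl get pods -n kube-system -l app=efs-csi-node -o wide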
Environment
Kubernetes version (use kubectl version): v1.27.7-eks-4f4795d