kubernetes-csi / csi-driver-nfs

This driver allows Kubernetes to access NFS server on Linux node.
Apache License 2.0

csi-nfs-controller pod fails #783

Open bitchecker opened 3 weeks ago

bitchecker commented 3 weeks ago

What happened: I'm running a Kubernetes cluster on AWS EKS, with spot instances for the node groups. Intermittently, and not on every cluster, one of the pods that runs the CSI NFS controller goes into CrashLoopBackOff and reports these logs:

csi-snapshotter E1029 09:35:37.115611       1 leaderelection.go:340] Failed to update lock optimitically: Operation cannot be fulfilled on leases.coordination.k8s.io "external-snapshotter-leader-nfs-csi-k8s-io": the object has been modified; please apply your changes to the latest version and try again, falling back to slow path
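For readers unfamiliar with this message: "the object has been modified" is the API server rejecting a write because the Lease changed after the writer last read it (optimistic concurrency keyed on the object's resource version). The following is a minimal toy sketch of that pattern, not the actual leader-election code; `LeaseStore`, `Conflict`, and `renew` are hypothetical names, and real leader election also takes lease expiry into account before giving up or taking over.

```python
class Conflict(Exception):
    """Stands in for the API server's HTTP 409 Conflict response."""


class LeaseStore:
    """Toy in-memory stand-in for the API server holding a single Lease."""

    def __init__(self):
        self.holder = None
        self.resource_version = 0

    def get(self):
        return self.holder, self.resource_version

    def update(self, holder, expected_version):
        # Reject the write if the object changed since the caller read it.
        if expected_version != self.resource_version:
            raise Conflict("the object has been modified")
        self.holder = holder
        self.resource_version += 1
        return self.resource_version


def renew(store, identity, cached_version):
    """Try the cached version first; on conflict, re-read and decide again."""
    try:
        return "fast path", store.update(identity, cached_version)
    except Conflict:
        holder, fresh_version = store.get()
        if holder != identity:
            # Someone else holds the lease (expiry is ignored in this sketch).
            return "lost lease", fresh_version
        return "slow path", store.update(identity, fresh_version)


store = LeaseStore()
v1 = store.update("pod-a", 0)         # pod-a acquires the lease
store.update("pod-a", v1)             # a write the cached copy did not see
result = renew(store, "pod-a", v1)    # cached version is now stale
print(result)
```

In this sketch a single conflict is recovered by re-reading and retrying, which is why the log line on its own is not necessarily the cause of the crash loop.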

If I delete the pod, everything starts without any issue:

nfs Compiler: gc
nfs Driver Name: nfs.csi.k8s.io
nfs Driver Version: v4.9.0
nfs Git Commit: ""
nfs Go Version: go1.22.3
nfs Platform: linux/amd64

It seems that every time (or nearly every time) an EC2 instance is retired and replaced with another one, csi-nfs-controller gets stuck on some lock that can only be cleared by forcibly deleting the pod.

What you expected to happen: No CrashLoopBackOff status on the controller pod.

How to reproduce it: Deploy a cluster with spot instances, install the NFS CSI driver (csi-nfs-controller), and watch whether and when the controller pod starts failing.

Anything else we need to know?:

Environment: