Closed: swamibluedata closed this issue 7 months ago
Can mounter.IsMountPoint be leveraged so that NodeUnpublishVolume returns gracefully when the target is no longer mounted?
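A minimal sketch of that idea, assuming the driver uses a k8s.io/mount-utils mounter; the function and variable names below are illustrative, not the driver's actual code:

package driver

import (
    "os"

    mount "k8s.io/mount-utils"
)

// nodeUnpublishSketch unmounts targetPath only if it is still a mount
// point, and treats an already-missing target as success so that
// NodeUnpublishVolume stays idempotent across node reboots.
func nodeUnpublishSketch(mounter mount.Interface, targetPath string) error {
    isMnt, err := mounter.IsMountPoint(targetPath)
    if err != nil {
        if os.IsNotExist(err) {
            // The target is already gone (e.g. the node rebooted and
            // the mount did not survive): nothing left to tear down.
            return nil
        }
        return err
    }
    if isMnt {
        if err := mounter.Unmount(targetPath); err != nil {
            return err
        }
    }
    // os.RemoveAll returns nil for a non-existent path, keeping the
    // cleanup idempotent as the CSI spec expects.
    return os.RemoveAll(targetPath)
}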
That sounds reasonable.
Relevant errors from the kubelet:
Jan 17 12:56:22 m2-lr1-dev-vm209146 kubelet[6550]: E0117 12:56:22.310038 6550 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba-workload-socket podName:0e1b170d-ba55-4e36-80cf-e9ecba13c9ba nodeName:}" failed. No retries permitted until 2024-01-17 12:58:24.310019476 -0800 PST m=+37266.946744927 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "workload-socket" (UniqueName: "kubernetes.io/csi/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba-workload-socket") pod "0e1b170d-ba55-4e36-80cf-e9ecba13c9ba" (UID: "0e1b170d-ba55-4e36-80cf-e9ecba13c9ba") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = unable to unmount "/var/lib/kubelet/pods/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba/volumes/kubernetes.io~csi/workload-socket/mount": invalid argument
Jan 17 12:58:24 m2-lr1-dev-vm209146 kubelet[6550]: E0117 12:58:24.358434 6550 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba-workload-socket podName:0e1b170d-ba55-4e36-80cf-e9ecba13c9ba nodeName:}" failed. No retries permitted until 2024-01-17 13:00:26.358417155 -0800 PST m=+37388.995142606 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "workload-socket" (UniqueName: "kubernetes.io/csi/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba-workload-socket") pod "0e1b170d-ba55-4e36-80cf-e9ecba13c9ba" (UID: "0e1b170d-ba55-4e36-80cf-e9ecba13c9ba") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = unable to unmount "/var/lib/kubelet/pods/0e1b170d-ba55-4e36-80cf-e9ecba13c9ba/volumes/kubernetes.io~csi/workload-socket/mount": invalid argument
I can create a PR
Fixed by #161
When a node is rebooted, workloads that were using the SPIFFE CSI driver get stuck in the Terminating state. Upon further investigation, the kubelet is trying to unmount a volume whose mount no longer exists on the node after the reboot. Since an error is returned to the caller, Kubernetes is not able to schedule a replacement pod. The only way to get out of this is to force-terminate the pod (--force and --grace-period=0).
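For completeness, the force-termination workaround looks like this (pod name and namespace are placeholders):

kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force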