openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes, integrated with a backend LVM2 data storage stack.

LVM pvc infinitely unmounting #303

Open tarelda opened 1 month ago

tarelda commented 1 month ago

What steps did you take and what happened: I created a simple registry deployment with one claim. Unfortunately, for some reason the volume is not getting mounted, and after I deleted the deployment and the PVC the kubelet logs still show that it is trying to unmount it. The pods from the deployment also had to be deleted manually, because they were stuck in the Terminating state, probably because they had written to the mountpoint (this was in the logs earlier, but I manually cleaned up the mountpoint directory). A sketch of the setup and the repeating kubelet log lines are below.
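A minimal sketch of the kind of PVC and Deployment involved (all names, the image, and the StorageClass openebs-lvmpv are assumptions, not copied from my actual manifests):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: registry-data                 # assumed name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: openebs-lvmpv     # assumed LVM-backed StorageClass
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry                      # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
        - name: registry
          image: registry:2
          volumeMounts:
            - name: data
              mountPath: /var/lib/registry
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: registry-data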

May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: I0511 10:16:46.824372    1295 reconciler_common.go:161] "operationExecutor.UnmountVolume started for volume \"pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65\" (UniqueName: \"kubernetes.io/csi/local.csi.openebs.io^pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65\") pod \"df79bf0a-3ebf-4732-9fe4-9d020a283c67\" (UID: \"df79bf0a-3ebf-4732-9fe4-9d020a283c67\") "
May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: E0511 10:16:46.828678    1295 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/local.csi.openebs.io^pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65 podName:df79bf0a-3ebf-4732-9fe4-9d020a283c67 nodeName:}" failed. No retries permitted until 2024-05-11 10:18:48.828659569 +0000 UTC m=+7489.885207719 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65" (UniqueName: "kubernetes.io/csi/local.csi.openebs.io^pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65") pod "df79bf0a-3ebf-4732-9fe4-9d020a283c67" (UID: "df79bf0a-3ebf-4732-9fe4-9d020a283c67") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = not able to get the LVMVolume pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65 err : lvmvolumes.local.openebs.io "pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65" not found
May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: I0511 10:16:46.924796    1295 reconciler_common.go:161] "operationExecutor.UnmountVolume started for volume \"pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627\" (UniqueName: \"kubernetes.io/csi/local.csi.openebs.io^pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627\") pod \"eff435e1-5335-4def-a677-2b89cd5f6b65\" (UID: \"eff435e1-5335-4def-a677-2b89cd5f6b65\") "
May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: I0511 10:16:46.924862    1295 reconciler_common.go:161] "operationExecutor.UnmountVolume started for volume \"pvc-031e4187-5410-4044-8ec1-ae313cf47329\" (UniqueName: \"kubernetes.io/csi/local.csi.openebs.io^pvc-031e4187-5410-4044-8ec1-ae313cf47329\") pod \"f746b1c2-babc-49c2-8aeb-177e3d58f61c\" (UID: \"f746b1c2-babc-49c2-8aeb-177e3d58f61c\") "
May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: E0511 10:16:46.927559    1295 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/local.csi.openebs.io^pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627 podName:eff435e1-5335-4def-a677-2b89cd5f6b65 nodeName:}" failed. No retries permitted until 2024-05-11 10:18:48.92753728 +0000 UTC m=+7489.984085430 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627" (UniqueName: "kubernetes.io/csi/local.csi.openebs.io^pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627") pod "eff435e1-5335-4def-a677-2b89cd5f6b65" (UID: "eff435e1-5335-4def-a677-2b89cd5f6b65") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = not able to get the LVMVolume pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627 err : lvmvolumes.local.openebs.io "pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627" not found
May 11 10:16:46 hetz0 microk8s.daemon-kubelite[1295]: E0511 10:16:46.927631    1295 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/local.csi.openebs.io^pvc-031e4187-5410-4044-8ec1-ae313cf47329 podName:f746b1c2-babc-49c2-8aeb-177e3d58f61c nodeName:}" failed. No retries permitted until 2024-05-11 10:18:48.927613708 +0000 UTC m=+7489.984161858 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "pvc-031e4187-5410-4044-8ec1-ae313cf47329" (UniqueName: "kubernetes.io/csi/local.csi.openebs.io^pvc-031e4187-5410-4044-8ec1-ae313cf47329") pod "f746b1c2-babc-49c2-8aeb-177e3d58f61c" (UID: "f746b1c2-babc-49c2-8aeb-177e3d58f61c") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = not able to get the LVMVolume pvc-031e4187-5410-4044-8ec1-ae313cf47329 err : lvmvolumes.local.openebs.io "pvc-031e4187-5410-4044-8ec1-ae313cf47329" not found

What did you expect to happen: I expected to have a clean environment to start over with. I don't know why it is still trying to unmount nonexistent volumes.

The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other Pastebin is fine.)

I0511 08:15:42.812351       1 main.go:149] LVM Driver Version :- 1.5.0 - commit :- d8b2b9aad93ad4c0447b7de385724e76be23d269
I0511 08:15:42.812390       1 main.go:150] DriverName: local.csi.openebs.io Plugin: controller EndPoint: unix:///var/lib/csi/sockets/pluginproxy/csi.sock NodeID:  SetIOLimits: false ContainerRuntime: containerd RIopsPerGB: [] WIopsPerGB: [] RBpsPerGB: [] WBpsPerGB: []
I0511 08:15:42.812733       1 driver.go:49] enabling volume access mode: SINGLE_NODE_WRITER
I0511 08:15:42.816993       1 controller.go:234] waiting for k8s & lvm node informer caches to be synced
I0511 08:15:42.917122       1 controller.go:238] synced k8s & lvm node informer caches
I0511 08:15:42.917141       1 controller.go:240] initializing csi provisioning leak protection controller
I0511 08:15:42.932214       1 leak_protection.go:134] "starting up csi pvc controller"
I0511 08:15:42.932255       1 shared_informer.go:311] Waiting for caches to sync for CSI Provisioner
I0511 08:15:42.932269       1 shared_informer.go:318] Caches are synced for CSI Provisioner
I0511 08:15:42.932372       1 grpc.go:190] Listening for connections on address: &net.UnixAddr{Name:"//var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
I0511 08:15:43.609794       1 grpc.go:72] GRPC call: /csi.v1.Identity/Probe requests {}
I0511 08:15:43.610907       1 grpc.go:81] GRPC response: {}
I0511 08:15:43.611460       1 grpc.go:72] GRPC call: /csi.v1.Identity/GetPluginInfo requests {}
I0511 08:15:43.611491       1 grpc.go:81] GRPC response: {"name":"local.csi.openebs.io","vendor_version":"1.5.0"}
I0511 08:15:43.611940       1 grpc.go:72] GRPC call: /csi.v1.Identity/GetPluginCapabilities requests {}
I0511 08:15:43.611978       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0511 08:15:43.612654       1 grpc.go:72] GRPC call: /csi.v1.Controller/ControllerGetCapabilities requests {}
I0511 08:15:43.612705       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":4}}}]}
I0511 08:15:43.613277       1 grpc.go:72] GRPC call: /csi.v1.Controller/ControllerGetCapabilities requests {}
I0511 08:15:43.613302       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":4}}}]}
I0511 08:15:43.662839       1 grpc.go:72] GRPC call: /csi.v1.Identity/GetPluginInfo requests {}
I0511 08:15:43.662884       1 grpc.go:81] GRPC response: {"name":"local.csi.openebs.io","vendor_version":"1.5.0"}
I0511 08:15:43.663350       1 grpc.go:72] GRPC call: /csi.v1.Identity/Probe requests {}
I0511 08:15:43.663384       1 grpc.go:81] GRPC response: {}
I0511 08:15:43.663595       1 grpc.go:72] GRPC call: /csi.v1.Controller/ControllerGetCapabilities requests {}
I0511 08:15:43.663618       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":4}}}]}
I0511 08:15:43.801982       1 grpc.go:72] GRPC call: /csi.v1.Identity/Probe requests {}
I0511 08:15:43.802018       1 grpc.go:81] GRPC response: {}
I0511 08:15:43.802384       1 grpc.go:72] GRPC call: /csi.v1.Identity/GetPluginInfo requests {}
I0511 08:15:43.802426       1 grpc.go:81] GRPC response: {"name":"local.csi.openebs.io","vendor_version":"1.5.0"}
I0511 08:15:43.802734       1 grpc.go:72] GRPC call: /csi.v1.Identity/GetPluginCapabilities requests {}
I0511 08:15:43.802756       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0511 08:15:43.803207       1 grpc.go:72] GRPC call: /csi.v1.Controller/ControllerGetCapabilities requests {}
I0511 08:15:43.803250       1 grpc.go:81] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":4}}}]}
I0511 08:19:51.234266       1 grpc.go:72] GRPC call: /csi.v1.Controller/DeleteVolume requests {"volume_id":"pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627"}
I0511 08:19:51.237036       1 controller.go:412] received request to delete volume "pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627"
I0511 08:19:51.247299       1 volume.go:104] deprovisioned volume pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627
I0511 08:19:52.251595       1 grpc.go:81] GRPC response: {}
I0511 10:22:52.896831       1 grpc.go:72] GRPC call: /csi.v1.Node/NodeUnpublishVolume requests {"target_path":"/var/snap/microk8s/common/var/lib/kubelet/pods/df79bf0a-3ebf-4732-9fe4-9d020a283c67/volumes/kubernetes.io~csi/pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65/mount","volume_id":"pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65"}
E0511 10:22:52.899634       1 grpc.go:79] GRPC error: rpc error: code = Internal desc = not able to get the LVMVolume pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65 err : lvmvolumes.local.openebs.io "pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65" not found
I0511 10:22:52.997718       1 grpc.go:72] GRPC call: /csi.v1.Node/NodeUnpublishVolume requests {"target_path":"/var/snap/microk8s/common/var/lib/kubelet/pods/eff435e1-5335-4def-a677-2b89cd5f6b65/volumes/kubernetes.io~csi/pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627/mount","volume_id":"pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627"}
I0511 10:22:52.997749       1 grpc.go:72] GRPC call: /csi.v1.Node/NodeUnpublishVolume requests {"target_path":"/var/snap/microk8s/common/var/lib/kubelet/pods/f746b1c2-babc-49c2-8aeb-177e3d58f61c/volumes/kubernetes.io~csi/pvc-031e4187-5410-4044-8ec1-ae313cf47329/mount","volume_id":"pvc-031e4187-5410-4044-8ec1-ae313cf47329"}
E0511 10:22:52.999953       1 grpc.go:79] GRPC error: rpc error: code = Internal desc = not able to get the LVMVolume pvc-031e4187-5410-4044-8ec1-ae313cf47329 err : lvmvolumes.local.openebs.io "pvc-031e4187-5410-4044-8ec1-ae313cf47329" not found
E0511 10:22:53.000025       1 grpc.go:79] GRPC error: rpc error: code = Internal desc = not able to get the LVMVolume pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627 err : lvmvolumes.local.openebs.io "pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627" not found
I0511 10:23:52.861476       1 lvmnode.go:306] Successfully synced 'openebs/hetz0'

I included only the repeating part here, but the full log is here.

NAME                                              READY   STATUS    RESTARTS       AGE
openebs-localpv-provisioner-7cd9f85f8f-brjjf      1/1     Running   1 (130m ago)   16h
openebs-lvm-localpv-controller-64946b785c-45kcg   5/5     Running   5 (130m ago)   16h
openebs-lvm-localpv-node-cgzs2                    2/2     Running   2 (130m ago)   16h
openebs-zfs-localpv-controller-7fdcd7f65-fptsd    5/5     Running   5 (130m ago)   16h
openebs-zfs-localpv-node-skckx                    2/2     Running   3 (130m ago)   16h
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
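
The outputs above were gathered with commands along these lines (the openebs namespace and the lvmvolumes.local.openebs.io resource name match the ones in the logs; the empty List is from listing the lvmvolume CRs):

kubectl get pods -n openebs
kubectl get lvmvolumes.local.openebs.io -n openebs -o yaml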

Anything else you would like to add: I installed OpenEBS directly through Helm to get version 4.0.1 instead of the microk8s default 3.10 that is installed through the addon.

Environment:

abhilashshetty04 commented 1 month ago

Hi @tarelda, I do see one lvmvolume being present when the LVM node plugin started. Here is the add event for the existing/already provisioned volume. Unpublish was also received for the other lvmvolumes, for which no similar add events were there, meaning those lvmvolume CRs themselves did not exist.

I0511 08:15:52.611443 1 volume.go:55] Getting lvmvol object name:pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627, ns:openebs from cache

But we do see a delete event for the same volume later in the log.

I0511 08:19:51.248012 1 volume.go:103] Got update event for deleted Vol pvc-9ad5e4fb-5dc0-4c94-80e9-7f25a3c57627, Deletion timestamp 2024-05-11 08:19:51 +0000 UTC

A few questions:

  1. You mentioned that pods from the deployment were in the Terminating state. Which deployment do you mean? Were they referencing the volumes for which we are getting the unmount calls?
  2. Can you send us the output of kubectl get pvc -oyaml and kubectl get pv -oyaml for the LVM-specific PVCs/PVs?
  3. Were any lvmvolume CRs manually deleted? I'm curious about pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65 and pvc-031e4187-5410-4044-8ec1-ae313cf47329.
tarelda commented 1 month ago
  1. You mentioned that pods from the deployment were in the Terminating state. Which deployment do you mean? Were they referencing the volumes for which we are getting the unmount calls?
  2. Can you send us the output of kubectl get pvc -oyaml and kubectl get pv -oyaml for the LVM-specific PVCs/PVs?
  3. Were any lvmvolume CRs manually deleted? I'm curious about pvc-243beed7-5ed6-439b-b7ac-fc60ac131f65 and pvc-031e4187-5410-4044-8ec1-ae313cf47329.

I made a deployment for an app that requested a persistent volume through a PVC with a StorageClass handled by the LVM plugin. Then I deleted it, because the mounts weren't being made in the pods. After that the pods were stuck in the Terminating state and the volumes were not deleted, so I went to town and deleted everything manually (including the PVCs). I did this a few times, hence the multiple instances of volumes to be unmounted in the logs. As I recall, this behaviour persisted even through an OpenEBS redeployment with Helm.

Small clarification: by manual delete of a PVC I mean deleting it through kubectl delete pvc and then also deleting the mount dir, for example /var/snap/microk8s/common/var/lib/kubelet/pods/f746b1c2-babc-49c2-8aeb-177e3d58f61c/volumes/kubernetes.io~csi/pvc-031e4187-5410-4044-8ec1-ae313cf47329. This was obviously done after deleting the pod f746b1c2-babc-49c2-8aeb-177e3d58f61c.
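In shell terms, the manual cleanup was roughly the following (the pod and PVC names are just the ones from the logs above; whether the forced pod deletion was strictly needed may differ):

kubectl delete pvc <pvc-name>
kubectl delete pod <pod-name> --grace-period=0 --force
# on the node, remove the leftover CSI mount directory, e.g.:
sudo rm -rf /var/snap/microk8s/common/var/lib/kubelet/pods/f746b1c2-babc-49c2-8aeb-177e3d58f61c/volumes/kubernetes.io~csi/pvc-031e4187-5410-4044-8ec1-ae313cf47329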

What is strange: a few days later I finally figured out that when I installed OpenEBS I hadn't corrected the kubelet dir paths in values.yml to match microK8s. Since then the logs have finally cleaned up and volumes started to be mounted correctly. But I don't understand why, in that case, the paths in the openebs-lvm-localpv-node pod logs pointed at the correct kubelet directory.
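For anyone else on microk8s: the fix boils down to pointing the chart at microk8s's kubelet directory. A sketch, assuming the OpenEBS 4.x umbrella chart exposes the lvm-localpv subchart's lvmNode.kubeletDir value (check the chart's values.yaml for the exact key in your version):

helm upgrade --install openebs openebs/openebs -n openebs \
  --set lvm-localpv.lvmNode.kubeletDir="/var/snap/microk8s/common/var/lib/kubelet/"

The path itself matches the target_path prefix visible in the NodeUnpublishVolume requests above.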

abhilashshetty04 commented 2 weeks ago

@tarelda, happy to know that everything's fine now.

Without setting the correct kubelet mount path for microk8s, the path never got mounted in the pod.

I'm guessing that in the unmount workflow the kubelet knows it is on the microk8s platform, so it supplies the correct path in the NodeUnpublishVolumeRequest when the pod is stuck in the Terminating state.

Question: NodePublishVolume never succeeded, right? Do we have logs specific to this in the issue? I wanted to check the target_path in the request.
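Something like this should surface any publish attempts and their target_path in the node plugin logs (the container name openebs-lvm-plugin is an assumption based on the default chart; adjust to your daemonset):

kubectl logs -n openebs openebs-lvm-localpv-node-cgzs2 -c openebs-lvm-plugin | grep -E 'NodePublishVolume|target_path'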