kubernetes-sigs / azuredisk-csi-driver

Azure Disk CSI Driver

Manual Volume detach case not handled #2537

Open CoreyCook8 opened 2 months ago

CoreyCook8 commented 2 months ago

What happened:

After a volume was manually detached from a VM, two pods ended up mounting the same volume.

What you expected to happen:

In an AWS cluster, this same issue gives this error message:

  Warning  FailedMount  14s (x6 over 30s)  kubelet            MountVolume.MountDevice failed for volume "pvc-XXXXX" : rpc error: code = Internal desc = Failed to find device path /dev/xvdaa. refusing to mount /dev/nvme3n1 because it claims to be volX but should be volY

I would expect this to be handled in a similar manner.

How to reproduce it:

  1. Create a pod that mounts a PVC.
  2. After the pod is running, manually detach the disk using the Azure portal. (The pod will still show as Running.)
  3. Create another pod that mounts a PVC and assign it to the same node.
  4. Both pods should be running at this point.
  5. Delete and recreate the first pod.
  6. The pod goes back into a Running state without the volume actually attaching.
  7. At this point, both pods are using the same volume.
  8. To verify, exec into both pods: create a file in the mounted directory of one and confirm it shows up in the other (see the sketch after this list).
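
For step 8, a quick way to confirm the two pods really share one block device is to compare the device ID behind the mount point from inside each pod: matching major:minor pairs mean both pods are reading and writing the same disk. A minimal sketch (the /mnt/data default and running it via kubectl exec are assumptions; use whatever mountPath your pod spec declares):

```go
// checkdev.go: print the device ID backing a mount point. Run it in
// each pod (e.g. via kubectl exec, or copy in a static binary);
// matching major:minor pairs mean both pods share one block device.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	path := "/mnt/data" // assumed mountPath; match your pod spec
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	var st unix.Stat_t
	if err := unix.Stat(path, &st); err != nil {
		fmt.Fprintf(os.Stderr, "stat %s: %v\n", path, err)
		os.Exit(1)
	}
	// st.Dev identifies the block device backing the filesystem.
	fmt.Printf("%s is backed by device %d:%d\n",
		path, unix.Major(st.Dev), unix.Minor(st.Dev))
}
```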

Anything else we need to know?:

Environment:

andyzhangx commented 1 month ago

This is expected: on an Azure VM, the device name is not bound to the disk name. For example, disk1 is attached as /dev/sdc; when disk1 is manually detached and disk2 is attached to the VM, disk2 gets /dev/sdc. If you then delete and recreate the first pod with the disk1 volume, the CSI driver still thinks disk1 is attached to the VM, so it just reuses the previously recorded device name (/dev/sdc), which now belongs to disk2.
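
For context: on Azure Linux VMs with the standard Azure udev rules, the stable handle for a data disk is its LUN, exposed as a symlink under /dev/disk/azure/scsi1/, while /dev/sdX is just whatever name the kernel handed out at attach time. A minimal sketch, assuming those udev symlinks are present on the node, that prints which kernel device currently sits behind each LUN (run it before and after the detach/attach cycle to watch /dev/sdc change owners):

```go
// lunmap.go: list the Azure data-disk LUN symlinks and the kernel
// device each one currently resolves to.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Symlinks maintained by the Azure udev rules on most Linux images.
	links, _ := filepath.Glob("/dev/disk/azure/scsi1/lun*")
	if len(links) == 0 {
		fmt.Fprintln(os.Stderr, "no Azure LUN symlinks found")
		os.Exit(1)
	}
	for _, link := range links {
		dev, err := filepath.EvalSymlinks(link)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", link, err)
			continue
		}
		fmt.Printf("%s -> %s\n", link, dev) // e.g. .../lun0 -> /dev/sdc
	}
}
```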

BTW, manual volume detach is not a supported CSI driver scenario; it is outside the CSI driver's control.

CoreyCook8 commented 1 month ago

I understand that manual detach is outside the CSI driver's control. But I would expect the CSI driver to ensure that a new pod is using the volume it requested and not another pod's volume. If the pod is deleted and the new pod is attached to the same VM, I would expect the CSI driver to check the device and make sure the expected volume matches the actual volume.

Or, when attaching the second disk at the same device name as the first, it could realize that a disk is already supposed to be there / that the first disk is no longer attached. (A sketch of such a check follows.)
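
A hypothetical version of that check is sketched below. The function name, the LUN parameter, and the cached device path are illustrative only, not the driver's actual API, and the symlink path assumes the Azure udev rules are installed on the node:

```go
package nodecheck

import (
	"fmt"
	"path/filepath"
)

// verifyDiskAtLUN is a hypothetical guard: before reusing a cached
// device path for a volume, confirm that the device behind the
// volume's expected LUN is still the cached one. After a manual
// detach (and a later attach of a different disk), the LUN symlink
// resolves to a different device or is gone, so the mismatch fails
// loudly here instead of two pods silently sharing one disk.
func verifyDiskAtLUN(lun int32, cachedDevice string) error {
	link := fmt.Sprintf("/dev/disk/azure/scsi1/lun%d", lun)
	actual, err := filepath.EvalSymlinks(link)
	if err != nil {
		return fmt.Errorf("expected a disk at LUN %d, found none: %w", lun, err)
	}
	if actual != cachedDevice {
		return fmt.Errorf("refusing to reuse %s: LUN %d now resolves to %s",
			cachedDevice, lun, actual)
	}
	return nil
}
```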

andyzhangx commented 1 month ago

Due to the manual detach, kubelet thinks disk1 is still attached to the node, so the CSI driver is never called (there is no NodeStageVolume call) to verify the device.

When attaching disk2 to the VM, reusing the same device name (/dev/sdc) is actually fine; that is also outside the CSI driver's control and is decided by the Linux kernel's disk driver. I think the main problem is that after a manual detach you should reschedule the first pod to another node; that would work. Otherwise we don't have a solution for making this work, since it's outside the CSI driver's control.
When attaching disk2 to the VM, using the same device name(/dev/sdc) is actually ok (this is also out of CSI driver control, it's controlled by linux disk kernel driver), I think the main problem is that when you do the manual detach, you should reschedule the first pod to other node, that would work. Otherwise we don't have a solution how to make this work since it's out of CSI driver control