yandex-cloud / k8s-csi-s3

GeeseFS-based CSI for mounting S3 buckets as PersistentVolumes
Other
540 stars 95 forks source link

[Bug] A new pod attaches to the dying systemd unit #134

Closed sm4ll-3gg closed 1 month ago

sm4ll-3gg commented 1 month ago

Hi! There's a bug handling systemd units which breaks storage access with the following error: "Transport endpoint is not connected".

Steps to reproduce

Working with two pods using the same PVC on the same Node:

  1. Create first pod and wait unit it's available
  2. Delete created pod
  3. Create second pod before the systemd unit (which was created for the first pod) is terminated 4.The pod attaches to the same unit and fails to access storage when it's finally stopped

Root cause

NodeUnstageVolume initiates the unmount by stopping the systemd unit and returns without waiting for the operation to complete. So when a new pod is started, NodeStageVolume and NodePublishVolume check that the stage mount point already exists (but it's under unmouning at the same time) and attach to it. After a while, the systemd unit dies, and the new pod loses access to the FS.

We often see the behavior because systemd unit is stuck on termination (for some reason) until timeout and is being killed.

Possible solutions

Wait until mount point is unmounted before NodeUnstageVolume exits:

resCh := make(chan string)
_, err = conn.StopUnit(unitName, "replace", resCh)
if err != nil {
    return false, err
}

res := <-resCh // wait until unit is stopped
glog.Infof("Unit is stopped with result: %s")

Do unmount explicitly after stopping systemd unit: Replace this code fragment ExecStopPost for systemd unit (https://github.com/yandex-cloud/k8s-csi-s3/blob/80dcfa4c021f726d67fca3e4df78e2a49a5c3e6b/pkg/mounter/geesefs.go#L203-L216) with explicit Unmount call:

_, err = conn.StopUnit(unitName, "replace", nil)
if err != nil {
    return false, err
}

err = Unmount(stagingTargetPath)
if err != nil {
    glog.Errorf("Failed to unmount staging target %v: %v", unitName, err)
    return false, err
}

Not sure which one is better, so let's discuss)