hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

CSI ControllerUnpublishVolume called twice #24130

Open ygersie opened 1 month ago

ygersie commented 1 month ago

Nomad version

1.8.2+ent

Problem description

While looking into an issue that still sometimes leads to stuck CSI volumes, I ran into the following scenario. When I stop an allocation and it is rescheduled onto the same node, I see events on two CSI controller plugins instead of just one. It looks like the ControllerUnpublishVolume RPC is incorrectly called a second time. I'm not sure whether this will ever cause problems, but it is at least somewhat unexpected.

Logs

client plugin

I1004 14:31:49.090273       1 node.go:461] "NodeUnpublishVolume: called" args={"volume_id":"vol-025ef963301234567","target_path":"/local/csi/per-alloc/f10806cb-1dce-6fd1-1700-f49c4869442d/ygersie-test[0]/rw-file-system-single-node-writer"}
I1004 14:31:49.090423       1 node.go:479] "NodeUnpublishVolume: unmounting" target="/local/csi/per-alloc/f10806cb-1dce-6fd1-1700-f49c4869442d/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:31:49.091389       1 mount_helper_common.go:93] unmounting "/local/csi/per-alloc/f10806cb-1dce-6fd1-1700-f49c4869442d/ygersie-test[0]/rw-file-system-single-node-writer" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
I1004 14:31:49.091411       1 mount_linux.go:362] Unmounting /local/csi/per-alloc/f10806cb-1dce-6fd1-1700-f49c4869442d/ygersie-test[0]/rw-file-system-single-node-writer
I1004 14:31:49.095315       1 mount_helper_common.go:150] Warning: deleting path "/local/csi/per-alloc/f10806cb-1dce-6fd1-1700-f49c4869442d/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:31:49.095409       1 node.go:475] "NodeUnPublishVolume: volume operation finished" volumeId="vol-025ef963301234567"
I1004 14:31:49.095427       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"
I1004 14:31:49.096428       1 node.go:289] "NodeUnstageVolume: called" args={"volume_id":"vol-025ef963301234567","staging_target_path":"/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer"}
I1004 14:31:49.096694       1 node.go:329] "NodeUnstageVolume: unmounting" target="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:31:49.096715       1 mount_helper_common.go:93] unmounting "/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
I1004 14:31:49.096726       1 mount_linux.go:362] Unmounting /local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer
I1004 14:31:49.102049       1 mount_helper_common.go:150] Warning: deleting path "/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:31:49.102506       1 node.go:334] "NodeUnStageVolume: successfully unstaged volume" volumeID="vol-025ef963301234567" target="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:31:49.103043       1 node.go:304] "NodeUnStageVolume: volume operation finished" volumeID="vol-025ef963301234567"
I1004 14:31:49.103063       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"
I1004 14:32:09.950816       1 node.go:552] "NodeGetCapabilities: called" args={}
I1004 14:32:10.068667       1 node.go:113] "NodeStageVolume: called" args={"volume_id":"vol-025ef963301234567","publish_context":{"devicePath":"/dev/xvdaa"},"staging_target_path":"/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}}
I1004 14:32:10.068830       1 node_linux.go:82] "[Debug] Falling back to nvme volume ID lookup" devicePath="/dev/xvdaa"
I1004 14:32:10.068887       1 node_linux.go:95] "[Debug] successfully resolved" nvmeName="nvme-Amazon_Elastic_Block_Store_vol025ef963301234567" nvmeDevicePath="/dev/nvme1n1"
I1004 14:32:10.068912       1 node.go:207] "NodeStageVolume: find device path" devicePath="/dev/xvdaa" source="/dev/nvme1n1"
I1004 14:32:10.069209       1 node.go:235] "NodeStageVolume: checking if volume is already staged" device="" source="/dev/nvme1n1" target="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:32:10.069237       1 node.go:242] "NodeStageVolume: staging volume" source="/dev/nvme1n1" volumeID="vol-025ef963301234567" target="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer" fstype="ext4"
I1004 14:32:10.069251       1 mount_linux.go:579] Attempting to determine if disk "/dev/nvme1n1" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/nvme1n1])
I1004 14:32:10.093990       1 mount_linux.go:582] Output: "DEVNAME=/dev/nvme1n1\nTYPE=ext4\n"
I1004 14:32:10.094016       1 mount_linux.go:441] Checking for issues with fsck on disk: /dev/nvme1n1
I1004 14:32:10.122820       1 mount_linux.go:546] Attempting to mount disk /dev/nvme1n1 in ext4 format at /local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer
I1004 14:32:10.122851       1 mount_linux.go:220] Mounting cmd (mount) with arguments (-t ext4 -o defaults /dev/nvme1n1 /local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer)
I1004 14:32:10.137170       1 mount_linux.go:579] Attempting to determine if disk "/dev/nvme1n1" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/nvme1n1])
I1004 14:32:10.148963       1 mount_linux.go:582] Output: "DEVNAME=/dev/nvme1n1\nTYPE=ext4\n"
I1004 14:32:10.149041       1 resizefs_linux.go:139] ResizeFs.needResize - checking mounted volume /dev/nvme1n1
I1004 14:32:10.152965       1 resizefs_linux.go:143] Ext size: filesystem size=1073741824, block size=4096
I1004 14:32:10.152982       1 resizefs_linux.go:158] Volume /dev/nvme1n1: device size=1073741824, filesystem size=1073741824, block size=4096
I1004 14:32:10.153017       1 node.go:284] "NodeStageVolume: successfully staged volume" source="/dev/nvme1n1" volumeID="vol-025ef963301234567" target="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer" fstype="ext4"
I1004 14:32:10.153036       1 node.go:184] "NodeStageVolume: volume operation finished" volumeID="vol-025ef963301234567"
I1004 14:32:10.153049       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"
I1004 14:32:10.154175       1 node.go:408] "NodePublishVolume: called" args={"volume_id":"vol-025ef963301234567","publish_context":{"devicePath":"/dev/xvdaa"},"staging_target_path":"/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer","target_path":"/local/csi/per-alloc/892e2aba-61fb-f2d3-1e29-a07ce47d715d/ygersie-test[0]/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}}
I1004 14:32:10.154677       1 node_linux.go:155] "NodePublishVolume: creating dir" target="/local/csi/per-alloc/892e2aba-61fb-f2d3-1e29-a07ce47d715d/ygersie-test[0]/rw-file-system-single-node-writer"
I1004 14:32:10.155798       1 node.go:744] "NodePublishVolume: mounting" source="/local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer" target="/local/csi/per-alloc/892e2aba-61fb-f2d3-1e29-a07ce47d715d/ygersie-test[0]/rw-file-system-single-node-writer" mountOptions=["bind"] fsType="ext4"
I1004 14:32:10.156953       1 mount_linux.go:220] Mounting cmd (mount) with arguments (-t ext4 -o bind /local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer /local/csi/per-alloc/892e2aba-61fb-f2d3-1e29-a07ce47d715d/ygersie-test[0]/rw-file-system-single-node-writer)
I1004 14:32:10.164868       1 mount_linux.go:220] Mounting cmd (mount) with arguments (-t ext4 -o bind,remount /local/csi/staging/cie-orch/ygersie-test[0]/rw-file-system-single-node-writer /local/csi/per-alloc/892e2aba-61fb-f2d3-1e29-a07ce47d715d/ygersie-test[0]/rw-file-system-single-node-writer)
I1004 14:32:10.169105       1 node.go:437] "NodePublishVolume: volume operation finished" volumeId="vol-025ef963301234567"
I1004 14:32:10.169168       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"

csi-ebs-controller plugin logs 1

I1004 14:31:49.128796       1 controller.go:431] "ControllerUnpublishVolume: called" args={"volume_id":"vol-025ef963301234567","node_id":"i-0b2cad6b68d0f7cb2"}
I1004 14:31:49.128815       1 controller.go:444] "ControllerUnpublishVolume: detaching" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2"
I1004 14:31:49.644706       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="detaching" desired="detached"
I1004 14:31:50.747221       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="detaching" desired="detached"
I1004 14:31:52.602655       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="detaching" desired="detached"
I1004 14:31:55.945694       1 controller.go:452] "ControllerUnpublishVolume: detached" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2"
I1004 14:31:55.945720       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"
I1004 14:31:55.958235       1 controller.go:431] "ControllerUnpublishVolume: called" args={"volume_id":"vol-025ef963301234567","node_id":"i-0b2cad6b68d0f7cb2"}
I1004 14:31:55.958263       1 controller.go:444] "ControllerUnpublishVolume: detaching" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2"
I1004 14:31:56.035361       1 cloud.go:620] "DetachDisk: called on non-attached volume" volumeID="vol-025ef963301234567"
I1004 14:31:56.250962       1 controller.go:447] "ControllerUnpublishVolume: attachment not found" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2"
I1004 14:31:56.250989       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"

csi-ebs-controller plugin logs 2

I1004 14:32:03.171015       1 controller.go:379] "ControllerPublishVolume: called" args={"volume_id":"vol-025ef963301234567","node_id":"i-0b2cad6b68d0f7cb2","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}}
I1004 14:32:03.171346       1 controller.go:392] "ControllerPublishVolume: attaching" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2"
I1004 14:32:03.639316       1 cloud.go:574] "[Debug] AttachVolume" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2" resp=<
    {
      AttachTime: 2024-10-04 14:32:03.625 +0000 UTC,
      Device: "/dev/xvdaa",
      InstanceId: "i-0b2cad6b68d0f7cb2",
      State: "attaching",
      VolumeId: "vol-025ef963301234567"
    }
 >
I1004 14:32:03.766956       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="attaching" desired="attached"
I1004 14:32:04.838842       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="attaching" desired="attached"
I1004 14:32:05.966122       1 controller.go:470] "ControllerGetCapabilities: called" args={}
I1004 14:32:06.717816       1 cloud.go:746] "Waiting for volume state" volumeID="vol-025ef963301234567" actual="attaching" desired="attached"
I1004 14:32:10.065279       1 manager.go:189] "[Debug] Releasing in-process" attachment entry="/dev/xvdaa" volume="vol-025ef963301234567"
I1004 14:32:10.065307       1 controller.go:400] "ControllerPublishVolume: attached" volumeID="vol-025ef963301234567" nodeID="i-0b2cad6b68d0f7cb2" devicePath="/dev/xvdaa"
I1004 14:32:10.065317       1 inflight.go:74] "Node Service: volume operation finished" key="vol-025ef963301234567"

nomad client logs

    2024-10-04T14:31:49.069Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f10806cb-1dce-6fd1-1700-f49c4869442d task=ygersie type=Killed msg="Task successfully killed" failed=false
    2024-10-04T14:31:49.079Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=f10806cb-1dce-6fd1-1700-f49c4869442d task=ygersie plugin=/usr/sbin/nomad id=1306
    2024-10-04T14:31:49.087Z [INFO]  client.gc: marking allocation for GC: alloc_id=f10806cb-1dce-6fd1-1700-f49c4869442d
    2024-10-04T14:31:49.101Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.219:4647
    2024-10-04T14:31:49.101Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.219:4647
    2024-10-04T14:31:51.112Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.181:4647
    2024-10-04T14:31:51.112Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.181:4647
    2024-10-04T14:31:55.121Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.140:4647
    2024-10-04T14:31:55.121Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.140:4647

If the allocation moves to a different node I don't see the duplicate ControllerUnpublishVolume; however, I do always see these errors on the node that is going to run the replacement allocation:

    2024-10-04T14:47:30.266Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=87fc7373-8c51-16ba-2bde-fa763370ea2d task=ygersie type=Received msg="Task received by client" failed=false
    2024-10-04T14:47:35.918Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.219:4647
    2024-10-04T14:47:35.918Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.219:4647
    2024-10-04T14:47:37.928Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.140:4647
    2024-10-04T14:47:37.928Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.140:4647
    2024-10-04T14:47:41.970Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.181:4647
    2024-10-04T14:47:41.970Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: rpc error: volume max claims reached" rpc=CSIVolume.Claim server=10.125.56.181:4647
    2024-10-04T14:47:51.828Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=87fc7373-8c51-16ba-2bde-fa763370ea2d task=ygersie type="Task Setup" msg="Building Task Directory" failed=false
    2024-10-04T14:47:51.890Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=87fc7373-8c51-16ba-2bde-fa763370ea2d task=ygersie type=Driver msg="Downloading image" failed=false
    2024-10-04T14:47:53.858Z [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=d935cc0940bb82b81e92facc2f8e9611427fc5d375afc14b8261bf1a63f1d95f
    2024-10-04T14:47:54.147Z [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=d935cc0940bb82b81e92facc2f8e9611427fc5d375afc14b8261bf1a63f1d95f
    2024-10-04T14:47:54.183Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=87fc7373-8c51-16ba-2bde-fa763370ea2d task=ygersie type=Started msg="Task started by client" failed=false

I guess that, although logged as an error, this isn't really a problem: the client keeps retrying to gain a claim while the volume hasn't been completely unpublished yet.
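
As a rough illustration of that retry pattern (a generic sketch only, not Nomad's actual client code; the `isMaxClaimsErr` and `claimFn` names are made up for this example), a claim attempt that fails because the previous claim hasn't been fully released yet is simply retried with backoff until it succeeds:

```go
// Generic sketch of retrying a volume claim while the old claim is still
// being released. Helper names are illustrative, not Nomad internals.
package sketch

import (
	"context"
	"errors"
	"strings"
	"time"
)

// isMaxClaimsErr mirrors the "volume max claims reached" error text seen in
// the client logs above.
func isMaxClaimsErr(err error) bool {
	return err != nil && strings.Contains(err.Error(), "volume max claims reached")
}

// claimFn stands in for the RPC that claims a volume for an allocation.
type claimFn func(ctx context.Context, volumeID string) error

// claimWithRetry keeps retrying while the claim is rejected because the
// previous claim has not been fully unpublished yet.
func claimWithRetry(ctx context.Context, claim claimFn, volumeID string) error {
	backoff := time.Second
	for {
		err := claim(ctx, volumeID)
		if err == nil || !isMaxClaimsErr(err) {
			return err // success, or a non-retryable failure
		}
		select {
		case <-ctx.Done():
			return errors.Join(err, ctx.Err())
		case <-time.After(backoff):
		}
		if backoff < 8*time.Second {
			backoff *= 2
		}
	}
}
```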

tgross commented 1 month ago

Hi @ygersie! Yeah, that's not totally unexpected. The ControllerUnpublishVolume RPC is required by the specification to be idempotent, so this should be safe. I have some thoughts around using "sagas" to tighten up this behavior, but so far all the draft designs I've considered end up pushing a lot more Raft logs, so there are tradeoffs there.

But I'll mark this for further examination in the meantime.
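
For reference, here is a minimal sketch of what that idempotency looks like on the plugin side, assuming the CSI Go bindings from github.com/container-storage-interface/spec/lib/go/csi (this is not the aws-ebs-csi-driver's actual code; the `detacher` interface and the error value are illustrative). A duplicate ControllerUnpublishVolume finds no attachment and still returns success, which matches the "attachment not found" line in the controller logs above:

```go
// Sketch of an idempotent ControllerUnpublishVolume handler. The cloud
// abstraction and error value are placeholders, not real driver code.
package sketch

import (
	"context"
	"errors"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

// errAttachmentNotFound stands in for whatever "already detached" error the
// cloud SDK surfaces; the name is illustrative only.
var errAttachmentNotFound = errors.New("attachment not found")

// detacher is a hypothetical abstraction over the cloud's detach API.
type detacher interface {
	DetachDisk(ctx context.Context, volumeID, nodeID string) error
}

type controller struct {
	cloud detacher
}

func (c *controller) ControllerUnpublishVolume(ctx context.Context, req *csi.ControllerUnpublishVolumeRequest) (*csi.ControllerUnpublishVolumeResponse, error) {
	err := c.cloud.DetachDisk(ctx, req.GetVolumeId(), req.GetNodeId())
	if errors.Is(err, errAttachmentNotFound) {
		// Duplicate call: the volume was already detached by the first call,
		// so report success instead of an error -- this is what makes the
		// RPC safe to invoke twice.
		return &csi.ControllerUnpublishVolumeResponse{}, nil
	}
	if err != nil {
		return nil, err
	}
	return &csi.ControllerUnpublishVolumeResponse{}, nil
}
```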

ygersie commented 1 month ago

@tgross thanks! I wasn't sure whether this was an issue, so I wanted to post it. If this is an idempotent call, feel free to close this issue 👍