longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] MapVolume.SetUpDevice failed for volume ... no such device or address #9426

Open rajeesh-sdk opened 2 months ago

rajeesh-sdk commented 2 months ago

Describe the bug

While redeploying as a StatefulSet, the first Pod is unable to attach the volume and fails with the following error message:

0s Warning FailedMapVolume pod/aerospike-0 MapVolume.SetUpDevice failed for volume "pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250" : rpc error: code = Internal desc = failed to create file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250: open /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250: no such device or address

The volumes are in the attached state and are not showing any errors.

To Reproduce

Expected behavior

Support bundle for troubleshooting

Environment

KubeAdm 1.27

Additional context

Workaround and Mitigation

PhanLe1010 commented 2 months ago

Is it a PVC in block volume mode?

Can you scale down the StatefulSet, restart the kubelets, and scale the StatefulSet back up to see if it changes anything?

If that still doesn't fix it, can you reproduce the issue and send the kubelet logs and a Longhorn support bundle to longhorn-support-bundle@suse.com?
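
For reference, the checks suggested above could be scripted roughly as follows. This is only a sketch that builds the command lines and prints them; it does not run anything. The StatefulSet name `aerospike` is inferred from the pod name `aerospike-0`, while the namespace `default` and the replica count `3` are assumptions that do not appear in the report. The kubelet restart assumes kubelet runs as a systemd unit and must be run on each affected node.

```python
import shlex


def volume_mode_command(pv_name):
    """Command that prints the volumeMode (Block or Filesystem) of a PV."""
    return ["kubectl", "get", "pv", pv_name, "-o", "jsonpath={.spec.volumeMode}"]


def recovery_commands(statefulset, namespace, replicas):
    """Build (not run) the scale-down / restart kubelet / scale-up sequence."""
    target = f"statefulset/{statefulset}"
    return [
        ["kubectl", "-n", namespace, "scale", target, "--replicas=0"],
        # Assumption: kubelet is managed by systemd; run this on each node.
        ["systemctl", "restart", "kubelet"],
        ["kubectl", "-n", namespace, "scale", target, f"--replicas={replicas}"],
    ]


if __name__ == "__main__":
    # PV name taken from the event above; namespace and replicas are placeholders.
    pv = "pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250"
    for cmd in [volume_mode_command(pv)] + recovery_commands("aerospike", "default", 3):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running each one by hand on the right machine.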

phoenix-frozen commented 1 week ago

I... think I might be running into a similar problem?

Dynamically provisioned volume on block-mode storage using the v2 engine.

 MapVolume.MapPodDevice failed for volume "pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c" : rpc error: code = Internal desc = failed to bind mount "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289": mount failed: exit status 32 Mounting command: mount Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289 Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289: special device /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c does not exist. dmesg(1) may have more information after failed mount system call. 

Nothing relevant in dmesg. Some cd-ing around suggests that the volume is actually mounted, and that the file indicated truly doesn't exist. Restarting the node didn't help.
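
A small script can make this kind of manual check repeatable. The sketch below reports whether a staging path is missing, a dangling symlink, a block device, or something else. The path layout is copied from the error messages in this thread; the helper names `staging_device_path` and `classify_path` are made up for illustration and are not part of Longhorn or Kubernetes.

```python
import os
import stat

# Layout taken from the kubelet error messages quoted in this thread.
STAGING_ROOT = "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging"


def staging_device_path(pv_name):
    """Return the staging path kubelet reports for a block-mode volume."""
    return os.path.join(STAGING_ROOT, pv_name, pv_name)


def classify_path(path):
    """Classify a path as missing, dangling-symlink, block-device, directory, or other."""
    try:
        st = os.stat(path)  # follows symlinks
    except FileNotFoundError:
        # A symlink whose target is gone still exists as a link.
        return "dangling-symlink" if os.path.islink(path) else "missing"
    if stat.S_ISBLK(st.st_mode):
        return "block-device"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    path = staging_device_path("pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c")
    print(path, "->", classify_path(path))
```

Run it on the node that hosts the failing pod. A result of `missing` would match the "special device ... does not exist" message from `mount`.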

derekbit commented 1 week ago

@phoenix-frozen Could you provide a support bundle and provide more information for your case?

phoenix-frozen commented 1 week ago

> @phoenix-frozen Could you provide a support bundle and provide more information for your case?

Support bundle, I cannot -- I just had to tear down the cluster and am currently rebuilding it.

What kind of information do you need? Happy to describe whatever you'd like; disk setup, cluster components, other services in the cluster, volume information...

innobead commented 1 week ago

Is this possibly related to https://github.com/longhorn/longhorn/issues/8009, or could it benefit from it? @derekbit @PhanLe1010

phoenix-frozen commented 1 week ago

Happening again on my freshly installed cluster.

#8009 does not appear to be relevant. That issue happened on a node that took an outage, whereas my problem is happening on a fresh cluster with a newly instantiated VM.

I am currently generating a support bundle.

phoenix-frozen commented 1 week ago

I... guess the support bundle has been generated?

derekbit commented 1 week ago

> I... guess the support bundle has been generated?

Can you upload it here?

phoenix-frozen commented 1 week ago

> > I... guess the support bundle has been generated?
>
> Can you upload it here?

I'd love to, but... I didn't get an opportunity to download it anywhere. The modal on the webui just disappeared. Where am I supposed to find it?

derekbit commented 1 week ago

> > > I... guess the support bundle has been generated?
> >
> > Can you upload it here?
>
> I'd love to, but... I didn't get an opportunity to download it anywhere. The modal on the webui just disappeared. Where am I supposed to find it?

You can try this: https://longhorn.io/kb/troubleshooting-create-support-bundle-with-curl/

phoenix-frozen commented 1 week ago

Oh, sorry, I have it after all. It's also 700MiB in size, so I can't upload it here.

derekbit commented 1 week ago

> Oh, sorry, I have it after all. It's also 700MiB in size, so I can't upload it here.

You can upload it somewhere else and share the link.

phoenix-frozen commented 1 week ago

https://drive.google.com/file/d/1J3Ja-q26WCR7jV06vwBSl-DGFUaYDil5/view?usp=drive_link