longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] MapVolume.SetUpDevice failed for volume ... no such device or address #9426

Open rajeesh-sdk opened 2 months ago

rajeesh-sdk commented 2 months ago

Describe the bug

While redeploying as a StatefulSet, the first Pod is unable to attach the volume and fails with the following error message:

0s Warning FailedMapVolume pod/aerospike-0 MapVolume.SetUpDevice failed for volume "pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250" : rpc error: code = Internal desc = failed to create file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250: open /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250/pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250: no such device or address

The volumes are in the attached state and are not showing any errors.

To Reproduce

Expected behavior

Support bundle for troubleshooting

Environment

KubeAdm 1.27

Additional context

Workaround and Mitigation

PhanLe1010 commented 2 months ago

Is it a block volume mode PVC?

Can you scale down the StatefulSet, restart the kubelets, and scale the StatefulSet back up to see if it changes anything?

If that still doesn't fix it, can you reproduce the issue and send the kubelet logs and a Longhorn support bundle to longhorn-support-bundle@suse.com?
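The checks and the scale-down/scale-up cycle suggested above could look roughly like the sketch below. It assumes the StatefulSet is named `aerospike` (inferred from the pod name `aerospike-0`), lives in the current namespace, and that the kubelet is managed by systemd, as is typical for kubeadm clusters. The helper names and the `REPLICAS` variable are hypothetical.

```shell
# Hypothetical helper names. KUBECTL can be overridden, e.g. for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Print the volumeMode ("Block" or "Filesystem") of a PersistentVolume.
pv_volume_mode() {
  "$KUBECTL" get pv "$1" -o jsonpath='{.spec.volumeMode}'
}

# Scale a StatefulSet to the given replica count.
scale_sts() {
  "$KUBECTL" scale statefulset "$1" --replicas="$2"
}

# Suggested sequence (the kubelet restart runs on each affected node):
#   pv_volume_mode pvc-e9e4a78d-82b5-4fdf-9e1b-c3af35b25250
#   scale_sts aerospike 0
#   sudo systemctl restart kubelet
#   scale_sts aerospike "$REPLICAS"   # REPLICAS: the original replica count
```

If the workload is in another namespace, add `-n <namespace>` to the `kubectl scale` call.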

phoenix-frozen commented 4 weeks ago

I... think I might be running into a similar problem?

Dynamically provisioned volume on block-mode storage using the v2 engine.

MapVolume.MapPodDevice failed for volume "pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c" : rpc error: code = Internal desc = failed to bind mount "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/284dfa73-7bc7-4833-ac96-43275795e289: special device /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c/pvc-e53cfc8c-fdf5-4588-b212-f77856e7581c does not exist.
dmesg(1) may have more information after failed mount system call.

Nothing relevant in dmesg. Some cd-ing around suggests that the volume is actually mounted, and that the file indicated truly doesn't exist. Restarting the node didn't help.
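For anyone wanting to repeat that check on a node, here is a minimal sketch that reports what actually sits at a staging path. It assumes the staging path should resolve to a block device once the CSI bind mount is in place, which is what the "special device ... does not exist" message implies. The function name `classify_path` is hypothetical.

```shell
# Classify what sits at a CSI block-volume staging path.
# Prints one of: missing, block-device, regular-file, other
classify_path() {
  p="$1"
  if [ ! -e "$p" ]; then
    echo "missing"        # nothing there (or a dangling symlink)
  elif [ -b "$p" ]; then
    echo "block-device"   # what a staged block volume is assumed to look like
  elif [ -f "$p" ]; then
    echo "regular-file"   # placeholder file with no device bound over it
  else
    echo "other"          # directory, socket, character device, ...
  fi
}
```

Run it as root on the affected node against the staging path from the error message. A result of `missing` matches the failure reported above.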

derekbit commented 4 weeks ago

@phoenix-frozen Could you provide a support bundle and provide more information for your case?

phoenix-frozen commented 3 weeks ago

> @phoenix-frozen Could you provide a support bundle and provide more information for your case?

Support bundle, I cannot -- I just had to tear down the cluster and am currently rebuilding it.

What kind of information do you need? Happy to describe whatever you'd like; disk setup, cluster components, other services in the cluster, volume information...

innobead commented 3 weeks ago

Is this possibly related to https://github.com/longhorn/longhorn/issues/8009, or could it benefit from it? @derekbit @PhanLe1010

phoenix-frozen commented 3 weeks ago

Happening again on my freshly installed cluster.

#8009 does not appear to be relevant. That issue happened on a node that took an outage, whereas my problem is happening on a fresh cluster with a newly-instantiated VM.

I am currently generating a support bundle.

phoenix-frozen commented 3 weeks ago

I... guess the support bundle has been generated?

derekbit commented 3 weeks ago

> I... guess the support bundle has been generated?

Can you upload it here?

phoenix-frozen commented 3 weeks ago

> > I... guess the support bundle has been generated?
>
> Can you upload it here?

I'd love to, but... I didn't get an opportunity to download it anywhere. The modal on the webui just disappeared. Where am I supposed to find it?

derekbit commented 3 weeks ago

> > > I... guess the support bundle has been generated?
> >
> > Can you upload it here?
>
> I'd love to, but... I didn't get an opportunity to download it anywhere. The modal on the webui just disappeared. Where am I supposed to find it?

You can try this: https://longhorn.io/kb/troubleshooting-create-support-bundle-with-curl/

phoenix-frozen commented 3 weeks ago

Oh, sorry, I have it after all. It's also 700MiB in size, so I can't upload it here.

derekbit commented 3 weeks ago

> Oh, sorry, I have it after all. It's also 700MiB in size, so I can't upload it here.

You can upload it somewhere and share the link.

phoenix-frozen commented 3 weeks ago

https://drive.google.com/file/d/1J3Ja-q26WCR7jV06vwBSl-DGFUaYDil5/view?usp=drive_link

phoenix-frozen commented 1 week ago

...

derekbit commented 1 week ago

...

@phoenix-frozen This is a private share link.