kubernetes-csi / csi-driver-smb

This driver allows Kubernetes to access SMB Server on both Linux and Windows nodes.
Apache License 2.0

Failed Mount Error (exit status 32) When Creating Pod with PVC using the csi-driver-smb #852

Open hajedkh opened 2 months ago

hajedkh commented 2 months ago

What happened: Pod creation error with event:

Warning FailedMount 28s (x8 over 92s) kubelet MountVolume.MountDevice failed for volume "pvc-f03018a6-a450-41a9-b4f7-0609a57120e7" : rpc error: code = Internal desc = volume(viaps012-int.lia.int/archives#pvc-f03018a6-a450-41a9-b4f7-0609a57120e7##) mount "//<HOST>/archives" on "/var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount" failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o dir_mode=0777,file_mode=0777,uid=1001,gid=1001,<masked> //viaps012-int.lia.int/archives /var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

What you expected to happen: Volume provisioned and pod created.

How to reproduce it: It happens randomly. After multiple volume mounts, some of them fail and the pod stays blocked in ContainerCreationError.

Anything else we need to know?: When we reschedule the pod onto another node in the cluster, it works fine (the failure happens on all worker nodes).

Environment:

andyzhangx commented 1 month ago

That's a Permission denied error. Does a manual mount on the node work?
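For readers triaging similar events: the number in `mount error(13)` is a standard errno value, and it can be decoded independently of the message text. The following is a minimal sketch, not part of the driver; the function name `decode_mount_error` and the sample string are illustrative assumptions, and only the `mount error(N)` fragment is taken from the event quoted in this issue.

```python
import errno
import os
import re

# Matches the "mount error(N)" fragment that appears in the mount output.
MOUNT_ERROR_RE = re.compile(r"mount error\((\d+)\)")


def decode_mount_error(output):
    """Return (code, symbolic name, description) for the first mount error, or None."""
    match = MOUNT_ERROR_RE.search(output)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as "EACCES".
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Sample text shortened from the event in this issue.
sample = "Output: mount error(13): Permission denied Refer to the mount.cifs(8) manual page"
print(decode_mount_error(sample))
```

The separate `exit status 32` is the generic "mount failure" return code of `mount` itself, so the errno value is usually the more informative of the two.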

hajedkh commented 1 month ago

The exact same issue occurs when using the mount command directly on the node: randomly, sometimes it succeeds and sometimes it doesn't.

andyzhangx commented 1 month ago

Then it's not a CSI driver issue.

pcking999 commented 1 month ago

@hajedkh Do the shares you are connecting to happen to be DFS shares? I just had this exact same issue with the same error. Permissions on the shares were fine and hadn't changed, but when I remoted into my worker node and ran journalctl -xe, I noticed this error repeated multiple times: "the device mount path ... is still mounted by other references".

It appears that when our main file server went down for patching, the DFS shares resolved to our backup file server. I could see that shares were still mounted to the backup server on the worker host by running "cat /proc/mounts" and looking at the IP address. I think that once the main file server was back online, the system tried to mount the shares against the main file server when pods were brought up, but it couldn't because it already had a connection to the backup file server. Hence the "mounted by other references" error.

I ended up changing everything to point to the server shares directly, not through DFS, but it would be nice if DFS worked seamlessly.
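The "cat /proc/mounts" check described above can be scripted. This is a rough sketch under stated assumptions: it relies on cifs entries in /proc/mounts carrying an `addr=` option with the resolved server IP, and the share name, mount points, and addresses in `SAMPLE` are made up for illustration. The helper names are hypothetical.

```python
from collections import defaultdict


def cifs_mounts(mounts_text):
    """Yield (source, mount_point, server_addr) for each cifs line in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: source, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        addr = None
        for opt in fields[3].split(","):
            if opt.startswith("addr="):
                addr = opt[len("addr="):]
        yield fields[0], fields[1], addr


def servers_by_share(mounts_text):
    """Map each share to the set of server addresses it is currently mounted from."""
    result = defaultdict(set)
    for source, _mount_point, addr in cifs_mounts(mounts_text):
        result[source].add(addr)
    return dict(result)


# Hypothetical sample: one DFS share resolved to two different servers.
SAMPLE = """\
//dfs.example.com/archives /mnt/a cifs rw,relatime,vers=3.0,addr=10.0.0.1,uid=1001 0 0
//dfs.example.com/archives /mnt/b cifs rw,relatime,vers=3.0,addr=10.0.0.2,uid=1001 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

for share, addrs in servers_by_share(SAMPLE).items():
    if len(addrs) > 1:
        print(f"{share} is mounted from multiple servers: {sorted(addrs)}")
```

On a worker node, passing the contents of /proc/mounts instead of `SAMPLE` would show whether a share is still attached to a server other than the one currently being resolved.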

kxs-jnadeau commented 1 week ago

You should also look at the CIFS driver that ships with your worker node kernel. We've seen many instabilities in the CIFS driver in Linux kernels before upstream version 6.5, causing similar issues.

hajedkh commented 1 week ago

@kxs-jnadeau Could you please specify which versions of cifs you recommend? I am using RHEL CoreOS 9.2 and the cifs module version is 2.37.

kxs-jnadeau commented 1 week ago

We have seen stability with the CIFS driver as shipped by AKS on Ubuntu 22.04, but they appear to be backporting it from kernel 6.5 onto the 5.15 Linux baseline, at version 2.44.
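To compare a node against the versions mentioned in this thread, the loaded module's version can be read and compared numerically. This is only a sketch: it assumes the cifs module is loaded and exposes its version at /sys/module/cifs/version as a dotted numeric string, and it treats 2.44 as a reference point solely because that is the version reported as stable above, not as an official minimum.

```python
from pathlib import Path


def parse_version(text):
    """Turn a dotted version string such as '2.37' into a comparable tuple (2, 37)."""
    return tuple(int(part) for part in text.strip().split("."))


def loaded_cifs_version(path="/sys/module/cifs/version"):
    """Return the loaded cifs module version as a tuple, or None if the file is absent."""
    version_file = Path(path)
    if not version_file.exists():
        return None
    return parse_version(version_file.read_text())


# Versions quoted in this thread: 2.37 (reported node) and 2.44 (reported stable on AKS).
REFERENCE = parse_version("2.44")
reported = parse_version("2.37")
print("older than reference:", reported < REFERENCE)
```

Comparing tuples rather than strings avoids mistakes such as treating "2.9" as newer than "2.44".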