Open hajedkh opened 2 months ago
That's a Permission denied error. Does a manual mount on the node work?
It is the exact same issue when using the mount command on the node: it fails at random, sometimes passing and sometimes not.
Then it's not a CSI driver issue.
@hajedkh do the shares you are connecting to happen to be DFS shares? I just had this exact issue with the same error. Permissions on the shares were fine and hadn't changed, but when I remoted into my worker node and ran journalctl -xe, I noticed this error repeated multiple times: "the device mount path ... is still mounted by other references". It appears that when our main file server went down for patching, the DFS shares resolved to our backup file server. I could see that shares were still mounted to the backup server on the worker host by running "cat /proc/mounts" and looking at the IP address. I think that once the main file server was back online, the system tried to mount the shares against the main file server when pods were brought up, but it couldn't because it already had a connection to the backup file server, hence the "mounted by other references" error. I ended up changing everything to point to the server shares directly, not through DFS, but it would be nice if DFS worked seamlessly.
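The /proc/mounts check described above can be scripted. The following is a minimal sketch, assuming CIFS entries in /proc/mounts carry an addr= option with the resolved server address. The function names and the sample hosts, paths, and addresses are hypothetical and are not taken from this thread.

```python
from collections import defaultdict

# Hypothetical sample in /proc/mounts format; hosts, paths and addresses are made up.
SAMPLE = """\
//fs.example.int/archives /var/lib/kubelet/plugins/a/globalmount cifs rw,relatime,vers=3.1.1,addr=10.0.0.11,file_mode=0777 0 0
//fs.example.int/archives /var/lib/kubelet/pods/b/mount cifs rw,relatime,vers=3.1.1,addr=10.0.0.12,file_mode=0777 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""


def cifs_addrs_by_share(mounts_text):
    """Map each CIFS share to {server address: [mount points]}."""
    shares = defaultdict(lambda: defaultdict(list))
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mount point, filesystem type, options, dump, pass
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        source, mount_point, _, options = fields[:4]
        addr = None
        for opt in options.split(","):
            if opt.startswith("addr="):
                addr = opt[len("addr="):]
        shares[source][addr].append(mount_point)
    return {share: dict(addrs) for share, addrs in shares.items()}


def split_shares(shares):
    """Return shares whose mounts resolve to more than one server address."""
    return sorted(s for s, addrs in shares.items() if len(addrs) > 1)


report = cifs_addrs_by_share(SAMPLE)
for share in split_shares(report):
    print(share, "is mounted against", sorted(report[share]))
```

On a worker node, the same functions could be fed the contents of /proc/mounts instead of SAMPLE. A share that shows up under more than one address would be consistent with the main/backup file server situation described above, though it does not prove it.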
Also consider the CIFS driver that ships with your worker node kernel. We've seen many instabilities in the CIFS driver in Linux kernels before upstream version 6.5, and they cause similar issues.
@kxs-jnadeau Could you please specify which versions of cifs you recommend? I am using RHEL CoreOS 9.2 and the cifs module version is 2.37.
We have seen stability with the CIFS driver as shipped by AKS on Ubuntu 22.04, but they seem to be backporting it from kernel 6.5 onto the 5.15 Linux baseline, at version 2.44.
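To compare a node's cifs module version against the 2.44 mentioned above, something like the following sketch may help. It assumes the loaded module exposes its version at /sys/module/cifs/version (modinfo cifs is an alternative). The threshold is only the version reported as stable in this thread, not an official minimum, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: 2.44 is the version reported as stable in this thread,
# not a documented minimum for the cifs module.
REPORTED_STABLE = "2.44"


def parse_version(text):
    """Turn '2.37' into (2, 37) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older_than(version, reference=REPORTED_STABLE):
    """True if version sorts before reference, component by component."""
    return parse_version(version) < parse_version(reference)


def read_cifs_module_version(path="/sys/module/cifs/version"):
    """Return the loaded module's version string, or None if the file is absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None


if __name__ == "__main__":
    found = read_cifs_module_version()
    if found is None:
        print("cifs module version not found (module not loaded?)")
    else:
        print(found, "older than", REPORTED_STABLE, "->", is_older_than(found))
```

Comparing tuples of integers rather than raw strings avoids treating 2.9 as newer than 2.44.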
What happened: Pod creation fails with this event:
Warning FailedMount 28s (x8 over 92s) kubelet MountVolume.MountDevice failed for volume "pvc-f03018a6-a450-41a9-b4f7-0609a57120e7" : rpc error: code = Internal desc = volume(viaps012-int.lia.int/archives#pvc-f03018a6-a450-41a9-b4f7-0609a57120e7##) mount "//<HOST>/archives" on "/var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount" failed with mount failed: exit status 32 Mounting command: mount Mounting arguments: -t cifs -o dir_mode=0777,file_mode=0777,uid=1001,gid=1001,<masked> //viaps012-int.lia.int/archives /var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount Output: mount error(13): Permission denied Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
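When many such events need triage, the numeric code in "mount error(13)" can be pulled out and mapped to its errno name. This is a small sketch, assuming the number printed by mount.cifs is a standard errno value. The function name is hypothetical.

```python
import errno
import re

# Matches the "mount error(<number>)" fragment printed by mount.cifs.
MOUNT_ERROR_RE = re.compile(r"mount error\((\d+)\)")


def mount_errno(message):
    """Return (code, errno name) from a mount failure message, or None if absent."""
    match = MOUNT_ERROR_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code, "UNKNOWN")


event = "Output: mount error(13): Permission denied Refer to the mount.cifs(8) manual page"
print(mount_errno(event))
```

For the event above this yields code 13 with the name EACCES, which matches the Permission denied text. The kernel log (dmesg) referenced in the message is still the place to look for the underlying cause.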
What you expected to happen: Volume provisioned and pod created.
How to reproduce it: It happens at random. After multiple volume mounts, the mount fails for some volumes and the pod stays blocked in ContainerCreationError.
Anything else we need to know?: When we reschedule the pod onto another node in the cluster, it works fine (the failure has occurred on every worker node).
Environment:
Kubernetes version (use kubectl version): v1.27.13+e709aa5