kubernetes-csi / csi-driver-smb

This driver allows Kubernetes to access SMB Server on both Linux and Windows nodes.
Apache License 2.0

can't see any content in mnt #818

Closed. pbs-jyu closed this issue 1 month ago.

pbs-jyu commented 1 month ago

What happened: I installed the SMB CSI driver in two clusters, using the same secret and the same manifests to create the StorageClass, PVC, and a test pod. In one cluster I can see all the content under /mnt/share inside the pod, but in the other cluster I see nothing. I can reach the SMB server; ping works from the problematic cluster's nodes.

What you expected to happen: The contents definitely exist on the share. After `kubectl exec -it` into the pod and running `ls`, I should see my content.

How to reproduce it: Run the same manifests over and over again. I even tried a different share from the same SMB server; as long as I test in the problematic cluster A, I can't see anything.
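In case it helps to make the reproduction concrete, the steps boil down to something like the sketch below; the secret name, namespace, credentials, and manifest file names are placeholders, not the exact ones used here:

    # create the SMB credentials secret referenced by the StorageClass
    # (secret name, namespace, and credentials are placeholders)
    kubectl create secret generic smbcreds -n kube-system \
      --from-literal=username=myaccount \
      --from-literal=password='mypassword'

    # apply the same StorageClass/PVC/test-pod manifests in both clusters,
    # then compare what the test pod sees (file names are placeholders)
    kubectl apply -f storageclass.yaml -f pvc.yaml -f test-pod.yaml
    kubectl exec -it test-pod -- ls /mnt/share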

Anything else we need to know?:

Environment:

- Image: registry.k8s.io/sig-storage/smbplugin:v1.15.0
- CSI Driver version: 1.15.0
- Kubernetes version (use `kubectl version`): v1.29.7
- OS (e.g. from /etc/os-release): PRETTY_NAME="Ubuntu 22.04.4 LTS", NAME="Ubuntu", VERSION_ID="22.04", VERSION="22.04.4 LTS (Jammy Jellyfish)", VERSION_CODENAME=jammy
- Kernel (e.g. `uname -a`): 127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: MicroK8s ([my codes.txt](https://github.com/user-attachments/files/16567475/my.codes.txt))
- Others:
andyzhangx commented 1 month ago

What does `df -h` or `mount | grep cifs` return in the problematic pod? Is there any cifs mount on the node in question? You could ssh to the agent node and run `df -h` or `mount | grep cifs` to verify. @pbs-jyu
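For example (pod, namespace, and node names are placeholders):

    # inside the pod in question
    kubectl exec -it <pod> -n <namespace> -- df -h
    kubectl exec -it <pod> -n <namespace> -- sh -c 'mount | grep cifs'

    # on the agent node the pod is scheduled on
    ssh <node>
    df -h
    mount | grep cifs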

pbs-jyu commented 1 month ago

hi, Andy

Thanks for your response.
    k exec -it test-pfs-lab-jyu -n devops -- sh
    # cd /mnt/pfs-lab
    # ls
    # df -h
    Filesystem                 Size  Used Avail Use% Mounted on
    overlay                    145G   33G  106G  24% /
    tmpfs                       64M     0   64M   0% /dev
    /dev/mapper/vgubuntu-root  145G   33G  106G  24% /etc/hosts
    shm                         64M     0   64M   0% /dev/shm
    tmpfs                      7.7G   12K  7.7G   1% /run/secrets/kubernetes.io/serviceaccount
    tmpfs                      3.9G     0  3.9G   0% /proc/acpi
    tmpfs                      3.9G     0  3.9G   0% /proc/scsi
    tmpfs                      3.9G     0  3.9G   0% /sys/firmware
    # mount | grep cifs
    # exit

From the node where the pod test-pfs-lab-jyu is running:

    $ df -h
    Filesystem                 Size  Used Avail Use% Mounted on
    tmpfs                      790M  4.0M  786M   1% /run
    /dev/mapper/vgubuntu-root  145G   33G  106G  24% /
    tmpfs                      3.9G     0  3.9G   0% /dev/shm
    tmpfs                      5.0M     0  5.0M   0% /run/lock
    efivarfs                   256K   47K  205K  19% /sys/firmware/efi/efivars
    /dev/sda1                  511M  6.1M  505M   2% /boot/efi
    tmpfs                      790M   76K  790M   1% /run/user/128
    shm                         64M     0   64M   0% /var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/8410b0440b905b5f28b44505e681d2646941fce669eaa34d843f1d7433d497b4/shm
    (... about 30 more identical 64M shm mounts for other containerd sandboxes ...)
    tmpfs                      790M   64K  790M   1% /run/user/1000
    $ mount | grep cifs
    $

andyzhangx commented 1 month ago

There is no cifs mount on the node. Does a manual mount work on the node?

    mkdir /tmp/test
    sudo mount -v -t cifs //smb-server/fileshare /tmp/test -o vers=3.0,username=accountname,password=accountkey,dir_mode=0777,file_mode=0777,cache=strict,actimeo=30

pbs-jyu commented 1 month ago

    $ dig smbserver-fqdn        # returns the IP properly
    $ nslookup smbserver-fqdn   # returns the IP properly

Running the following manually on the cluster node:

This works:

    sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-ip/share /mnt/share

This does not work:

    sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-fqdn/share /mnt/share

pbs-jyu commented 1 month ago

I have installed cifs-utils on each of the cluster nodes, then ran `sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-fqdn/share /mnt/share`, and now I can see the /mnt/share contents on the host nodes directly. But I still can't see the contents inside the pod after creating the secret, SC, PVC/PV, and pod.
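(For reference, on Ubuntu nodes like the ones here this is just the cifs-utils package; a sketch assuming apt-based nodes:)

    # on each cluster node
    sudo apt-get update
    sudo apt-get install -y cifs-utils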

pbs-jyu commented 1 month ago

In the end, I got this working. But something still does not make sense to me.

Initially, when I couldn't see any contents in the pod under /mnt/share, I did the following to narrow down where the problem lies.

I tried to manually mount the share on the k8s cluster node. At first it only worked with the IP:

    sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-ip/share /mnt/share

after which `ls /mnt/share` returned contents. I had to install cifs-utils on the nodes before the FQDN form worked:

    sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-fqdn/share /mnt/share

At this stage, at least, I can tell the problem is not related to networking/firewall, nor to DNS.

Then I created the secret, the storage class, and a PVC (which automatically created a PV), and created a pod with the following. I still can't see anything with `ls /mnt/share`:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-share-jyu-w-utils
      namespace: dev
    spec:
      containers:
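(The manifest above is truncated in the original comment. For context, a typical smb.csi.k8s.io setup along these lines looks roughly like the sketch below; the share URL, secret name/namespace, sizes, and resource names are placeholders rather than the exact values used in this issue, and the parameters follow the csi-driver-smb examples rather than this cluster's actual config.)

    # rough sketch only: a StorageClass backed by smb.csi.k8s.io, a PVC bound to it,
    # and a pod mounting that PVC at /mnt/share (all names/values are placeholders)
    kubectl apply -f - <<'EOF'
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: smb
    provisioner: smb.csi.k8s.io
    parameters:
      source: //smbserver-fqdn/share
      csi.storage.k8s.io/provisioner-secret-name: smbcreds
      csi.storage.k8s.io/provisioner-secret-namespace: kube-system
      csi.storage.k8s.io/node-stage-secret-name: smbcreds
      csi.storage.k8s.io/node-stage-secret-namespace: kube-system
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    mountOptions:
      - dir_mode=0777
      - file_mode=0777
      - vers=2.0
      - domain=mydomain
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-smb
      namespace: dev
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: smb
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: smb-test-pod
      namespace: dev
    spec:
      containers:
        - name: test
          image: busybox
          command: ["sleep", "3600"]
          volumeMounts:
            - name: smb
              mountPath: /mnt/share
      volumes:
        - name: smb
          persistentVolumeClaim:
            claimName: pvc-smb
    EOF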

In the end, I had to manually run `umount /mnt/share` and then `sudo mount -t cifs -o username=myaccount,password='mypassword',domain=mydomain,vers=2.0 //smbserver-fqdn/share /mnt/share`; only then can I see the content when running `ls /mnt/share`.

Even though it is working now, I do not understand why I must run `umount /mnt/share` and the `sudo mount -t cifs ...` command again inside the pod. Note that I have already created the storage class with the proper credentials and the proper mountOptions, created the PVC, and mounted the proper PVC in the pod. After the pod is created, I should be able to see the content under /mnt/share without mounting it manually.

pbs-jyu commented 1 month ago

Can anyone help with this bug? The manual umount and remount only helped us narrow down where the problem lies; it is not a fix.

pbs-jyu commented 1 month ago

@andyzhangx I saw this old issue: https://github.com/kubernetes-csi/csi-driver-smb/issues/705, but that is a very old one. I have installed csi-driver-smb v1.15.0.

pbs-jyu commented 1 month ago

FYI, I have used:

    curl -skSL https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/v1.15.0/deploy/install-driver.sh | bash -s v1.15.0 -- --set controller.runOnControlPlane=true --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet"

Note: we use a MicroK8s cluster.

pbs-jyu commented 1 month ago

I FINALLY got this working. For a MicroK8s cluster, we need to add the last of the settings below as well.

Could anyone who has write access add these to the Tips section of csi-driver-smb/charts/README.md?

    --set controller.runOnControlPlane=true
    --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet"
    --set "controller.nodeSelector.node.kubernetes.io/microk8s-controlplane"=microk8s-controlplane

pbs-jyu commented 1 month ago

Even though it looks much better now, something is still not right.

When I used the following, the installation failed:

    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --version v1.15.0 --set controller.runOnControlPlane=true --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet" --set "controller.nodeSelector.node.kubernetes.io/microk8s-controlplane"=microk8s-controlplane
    Error: INSTALLATION FAILED: 1 error occurred:

Even with this error message, the SMB mount inside the pod behaves properly; I can see the content of the mount as expected. There are no error messages for any csi-smb-node pod (`kubectl logs csi-smb-node-xxx -n kube-system`).
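(Side note for anyone checking the same thing: since each node pod runs several containers, you can narrow the logs to the driver container with something like the line below; the `app=csi-smb-node` label and the `smb` container name are assumptions based on the chart defaults, not verified here.)

    kubectl logs -n kube-system -l app=csi-smb-node -c smb --tail=100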

One important thing here is that I did not see a csi-smb-controller-xxx pod running at all!

    kubectl get all -n kube-system | grep csi
    pod/csi-smb-node-6kz58                                3/3     Running   0              43s
    pod/csi-smb-node-fdqzv                                3/3     Running   0              43s
    pod/csi-smb-node-xpd49                                3/3     Running   0              43s
    daemonset.apps/csi-smb-node                       3         3         3       3            3           kubernetes.io/os=linux   43s

When I used the following, the installation did not throw an error, but the controller pod is stuck in Pending status:

    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --version v1.15.0 --set controller.runOnControlPlane=true --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet" --set "controller.nodeSelector.node.kubernetes.io/microk8s-controlplane"=microk8s-controlplane

    kubectl get all -n kube-system | grep csi
    csi-smb-controller-667ff888fb-bwsg6               0/3     Pending   0               6m23s
    csi-smb-node-c8xfq                                3/3     Running   0               6m23s
    csi-smb-node-pd7kh                                3/3     Running   0               6m23s
    csi-smb-node-qbcbq                                3/3     Running   0               6m23s

" Warning FailedScheduling 92s default-scheduler 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling."

    kubectl get nodes --show-labels
    NAME             STATUS   ROLES    AGE   VERSION   LABELS
    microk8s-node1   Ready    <none>   51d   v1.29.7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=s-mick8s1-dt1-x,kubernetes.io/os=linux,microk8s.io/cluster=true,node.kubernetes.io/microk8s-controlplane=microk8s-controlplane
    microk8s-node2   Ready    <none>   51d   v1.29.7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=s-mick8s2-dt1-x,kubernetes.io/os=linux,microk8s.io/cluster=true,node.kubernetes.io/microk8s-controlplane=microk8s-controlplane
    microk8s-node3   Ready    <none>   51d   v1.29.7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=s-mick8s3-dt1-x,kubernetes.io/os=linux,microk8s.io/cluster=true,node.kubernetes.io/microk8s-controlplane=microk8s-controlplane

andyzhangx commented 1 month ago

@pbs-jyu can you remove --set controller.runOnControlPlane=true in helm install and try again?
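That is, presumably something like the sketch below. With the chart defaults, `controller.runOnControlPlane=true` appears to pin the controller to nodes carrying the upstream `node-role.kubernetes.io/control-plane` label, which the MicroK8s nodes above do not have (they carry `node.kubernetes.io/microk8s-controlplane` instead):

    helm install csi-driver-smb csi-driver-smb/csi-driver-smb \
      --namespace kube-system --version v1.15.0 \
      --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet"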

pbs-jyu commented 1 month ago

hi, Andy

Thanks for your response. I have done the following tests and the results are as follows:

This does not work:

    kubectl label node your-cluster-nodes node-role.kubernetes.io/control-plane-
    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --version v1.15.0

This works:

    kubectl label node your-cluster-nodes node-role.kubernetes.io/control-plane-
    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --version v1.15.0 --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet"

This works:

    kubectl label node your-cluster-nodes node-role.kubernetes.io/control-plane=""
    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --version v1.15.0 --set controller.runOnControlPlane=true --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet"

pbs-jyu commented 1 month ago

Thank you for your help, Andy. I will close the case.