kubernetes-csi / csi-driver-nfs

This driver allows Kubernetes to access NFS server on Linux node.
Apache License 2.0
871 stars 252 forks source link

csi-driver-nfs stops working for new PVC on randomly affected nodes. #649

Open dvyas1 opened 7 months ago

dvyas1 commented 7 months ago

What happened: We use csi-driver-nfs to dynamically provision NFS volumes using PVC and attach them to pod. Sometime, the node/driver gets stuck in a weird state and stops working all together on affected node. The driver is not able to connect any new NFS pvc to any pod on this affected node. Creating a new PVC creates directory on NFS server, but mounting that directory on affected node fails. We have to replace the node for the driver to get going again.

What you expected to happen: NFS driver should be able to mount nfs directory on affected node, just like it works on other nodes.

How to reproduce it: This happens randomly and there are no specific steps to replicate it. We observe it once or twice every few months. The only way we can recover is to replace the node.

Anything else we need to know?: Not sure if this matters but we have in-cluster NFS server that is exposed using kubernetes service object. This is EKS cluster using default amazon linux AMIs. we observed issues on kubernetes version 1.25 and 1.27. We have not tested on any other version.

Logs: Running kubectl logs -n kube-system csi-nfs-node-pzs7j -c nfs, we see repeated log entries like the one below. I0416 15:22:39.454827 1 utils.go:109] GRPC call: /csi.v1.Node/NodePublishVolume I0416 15:22:39.454846 1 utils.go:110] GRPC request: {"target_path":"/var/lib/kubelet/pods/096e1ea2-4c1e-46eb-8006-ebcc57e34563/volumes/kubernetes.io~csi/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["nfsvers=4.1"]}},"access_mode":{"mode":5}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8","csi.storage.k8s.io/pvc/name":"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-home","csi.storage.k8s.io/pvc/namespace":"opaque-user","server":"nfs.nfs-server-domain-name","share":"/","storage.kubernetes.io/csiProvisionerIdentity":"1713274642129-9461-nfs.csi.k8s.io","subdir":"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8"},"volume_id":"nfs.nfs-server-domain-name##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##"} E0416 15:22:39.454957 1 utils.go:114] GRPC error: rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-server-domain-name##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## already exists

nfs.nfs-server-domain-name is a custom dns entry mapped to the service ip of in-cluster NFS server using coredns custom config.

Environment:

NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/" SUPPORT_END="2025-06-30"

hoyho commented 7 months ago

A previous volume is mounting (and stuck). This happens for various reasons. Such as networking unreachable, nfs client not responding... I think you can get some debug info for further investigation. You can go inside the driver pod that scheduled to the problematic node and run the mount command manually to see what's going on. for example: mount -t nfs -v nfs.nfs-server-domain-name/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8 -o nfsvers=4.1 /mnt

dvyas1 commented 7 months ago

I ran the command and got the following output. The command stays stuck after that output.

# mount -t nfs -v nfs.nfs-dns-entry:/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8 -o nfsvers=4.1 /mnt
mount.nfs: timeout set for Wed Apr 17 12:48:37 2024
mount.nfs: trying text-based options 'nfsvers=4.1,addr=172.20.0.14,clientaddr=10.0.2.227'

172.20.0.14 - Service ip of the NFS server 10.0.2.227 - Node ip address

hoyho commented 7 months ago

It seem DNS is fine. Assuming 172.20.0.14 is accessible. Is there any useful log from kernel? Such as checking the log in host's /var/log/messages

andyzhangx commented 7 months ago

dmesg may also help, since mount -t nfs ... command does not work either, I don't think this issue is related to this csi driver.

dvyas1 commented 7 months ago

Last few messages from dmesg:

[    3.725480] xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
[    3.786293] systemd-journald[1624]: Received request to flush runtime journal from PID 1
[    3.911448] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    3.920847] Bridge firewalling registered
[    3.950107] pps_core: LinuxPPS API ver. 1 registered
[    3.953863] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    3.972306] PTP clock support registered
[    3.991454] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.12.0g
[    4.006003] ena 0000:00:05.0: ENA device version: 0.10
[    4.009369] ena 0000:00:05.0: ENA controller version: 0.0.1 implementation version 1
[    4.016205] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[    4.027867] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
[    4.037933] ACPI: Power Button [PWRF]
[    4.040982] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input5
[    4.041137] mousedev: PS/2 mouse device common for all mice
[    4.049986] ACPI: Sleep Button [SLPF]
[    4.070505] cryptd: max_cpu_qlen set to 1000
[    4.088076] AVX2 version of gcm_enc/dec engaged.
[    4.091253] AES CTR mode by8 optimization enabled
[    4.093898] ena 0000:00:05.0: Forcing large headers and decreasing maximum TX queue size to 512
[    4.100441] ena 0000:00:05.0: ENA Large LLQ is enabled
[    4.112498] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 0e:3e:57:4f:8f:91
[    4.714978] RPC: Registered named UNIX socket transport module.
[    4.718603] RPC: Registered udp transport module.
[    4.721759] RPC: Registered tcp transport module.
[    4.724943] RPC: Registered tcp NFSv4.1 backchannel transport module.

/var/log/messages is full of log entries like this in repeated manner:

Apr 18 12:59:48 ip-10-0-2-227 containerd: time="2024-04-18T12:59:48.486079974Z" level=info msg="StopPodSandbox for \"a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91\""
Apr 18 12:59:48 ip-10-0-2-227 containerd: time="2024-04-18T12:59:48.486503462Z" level=info msg="Kill container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\""
Apr 18 13:00:30 ip-10-0-2-227 kubelet: E0418 13:00:30.434631    3643 kubelet.go:1884] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[home], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"
Apr 18 13:00:30 ip-10-0-2-227 kubelet: E0418 13:00:30.434663    3643 pod_workers.go:1294] "Error syncing pod, skipping" err="unmounted volumes=[home], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5" podUID=096e1ea2-4c1e-46eb-8006-ebcc57e34563
Apr 18 13:00:42 ip-10-0-2-227 kubelet: I0418 13:00:42.604054    3643 reconciler_common.go:231] "operationExecutor.MountVolume started for volume \"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##\") pod \"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5\" (UID: \"096e1ea2-4c1e-46eb-8006-ebcc57e34563\") " pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"
Apr 18 13:00:42 ip-10-0-2-227 kubelet: E0418 13:00:42.606815    3643 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## podName: nodeName:}" failed. No retries permitted until 2024-04-18 13:02:44.606800725 +0000 UTC m=+230683.392009574 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8" (UniqueName: "kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##") pod "env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5" (UID: "096e1ea2-4c1e-46eb-8006-ebcc57e34563") : rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## already exists
Apr 18 13:00:51 ip-10-0-2-227 dhclient[2781]: XMT: Solicit on eth0, interval 125350ms.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Created slice User Slice of root.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Started Session 73 of user root.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Removed slice User Slice of root.
Apr 18 13:01:22 ip-10-0-2-227 kubelet: I0418 13:01:22.323364    3643 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"home\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143##\") pod \"b330aada-c335-4561-a123-ec543faf2bdd\" (UID: \"b330aada-c335-4561-a123-ec543faf2bdd\") "
Apr 18 13:01:22 ip-10-0-2-227 kubelet: E0418 13:01:22.324408    3643 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143## podName:b330aada-c335-4561-a123-ec543faf2bdd nodeName:}" failed. No retries permitted until 2024-04-18 13:03:24.324390297 +0000 UTC m=+230723.109599146 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "home" (UniqueName: "kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143##") pod "b330aada-c335-4561-a123-ec543faf2bdd" (UID: "b330aada-c335-4561-a123-ec543faf2bdd") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143## already exists
Apr 18 13:01:48 ip-10-0-2-227 containerd: time="2024-04-18T13:01:48.486369925Z" level=error msg="StopPodSandbox for \"a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91\" failed" error="rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded"
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486710    3643 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded" podSandboxID="a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91"
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486754    3643 kuberuntime_manager.go:1325] "Failed to stop sandbox" podSandboxID={Type:containerd ID:a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91}
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486814    3643 kubelet.go:1973] [failed to "KillContainer" for "env-0f06babb-df89-4ad8-be63-b1e4585cf766" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "914f1fa5-fa96-4509-940c-027e0aba7768" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded"]
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486843    3643 pod_workers.go:1294] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"env-0f06babb-df89-4ad8-be63-b1e4585cf766\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillPodSandbox\" for \"914f1fa5-fa96-4509-940c-027e0aba7768\" with KillPodSandboxError: \"rpc error: code = DeadlineExceeded desc = failed to stop container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\": an error occurs during waiting for container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\" to be killed: wait container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\": context deadline exceeded\"]" pod="my-namespace/env-0f06babb-df89-4ad8-be63-b1e4585cf766-575f5988bb-lgtbg" podUID=914f1fa5-fa96-4509-940c-027e0aba7768
Apr 18 13:01:49 ip-10-0-2-227 kubelet: I0418 13:01:49.128598    3643 kuberuntime_container.go:742] "Killing container with a grace period" pod="my-namespace/env-0f06babb-df89-4ad8-be63-b1e4585cf766-575f5988bb-lgtbg" podUID=914f1fa5-fa96-4509-940c-027e0aba7768 containerName="env-0f06babb-df89-4ad8-be63-b1e4585cf766" containerID="containerd://32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295" gracePeriod=30
Apr 18 13:01:49 ip-10-0-2-227 containerd: time="2024-04-18T13:01:49.128841097Z" level=info msg="StopContainer for \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" with timeout 30 (s)"
Apr 18 13:01:49 ip-10-0-2-227 containerd: time="2024-04-18T13:01:49.129231594Z" level=info msg="Skipping the sending of signal terminated to container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" because a prior stop with timeout>0 request already sent the signal"
Apr 18 13:02:19 ip-10-0-2-227 containerd: time="2024-04-18T13:02:19.130119329Z" level=info msg="Kill container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\""
Apr 18 13:02:44 ip-10-0-2-227 kubelet: I0418 13:02:44.686092    3643 reconciler_common.go:231] "operationExecutor.MountVolume started for volume \"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##\") pod \"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5\" (UID: \"096e1ea2-4c1e-46eb-8006-ebcc57e34563\") " pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"

Do you think the NFS server running inside the cluster has anything to do with this? I noticed host network trying to connect to service network here: mount.nfs: trying text-based options 'nfsvers=4.1,addr=172.20.0.14,clientaddr=10.0.2.227'. Does anyone have similar setup and works without any issues?

cccsss01 commented 5 months ago

I've been noticing inconsistency especially when using helm 4.7.0

coldnfire commented 3 months ago

[MountVolume.SetUp Failure on Pod Reschedule with csi-driver-nfs]

Hello guys,

I hope everything is fine for you.

I encounter a similar issue here :

Expected Behavior: The pod should successfully mount the NFS volume upon being rescheduled and not result in persistent mounting errors.

Actual Behavior: The pod fails to mount the NFS volume due to repeated FailedMount errors, preventing the pod from starting.

cloud redis0-5cb7f5457-mclt5 ● 0/1 ContainerCreating 0 0 0 n/a n/a 0 0 n/a neutron 22m

Describe:

Normal Scheduled 10m default-scheduler Successfully assigned cloud/redis0-5cb7f5457-mclt5 to neutron Warning FailedMount 7m45s (x5 over 8m) kubelet MountVolume.SetUp failed for volume "pvc-737f9975-2681-4cc7-b729-12093bfda676" : rpc error: code = Aborted desc = An operation with the given Volume ID nfs0.X.X#vol/brick0/nfs#pvc-737f9975-2681-4cc7-b729-12093bfda676## already exists Warning FailedMount 2m57s (x3 over 8m1s) kubelet MountVolume.SetUp failed for volume "pvc-737f9975-2681-4cc7-b729-12093bfda676" : rpc error: code = DeadlineExceeded desc = context deadline exceeded

Environment:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Deployment method:

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm upgrade csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.8.0 --install --create-namespace --atomic --cleanup-on-fail

Side nfs server:

/vol/brick0/nfs               192.168.1.7(rw,sync,no_root_squash,no_subtree_check)
/vol/brick0/nfs               192.168.1.8(rw,sync,no_root_squash,no_subtree_check)

Side csi-nfs-node :

kube-system csi-nfs-node-vp7g5 ● 3/3 Running 0 3 31 10 n/a 52 6 192.168.1.8 neutron 25m

Side container nfs registry.k8s.io/sig-storage/nfsplugin:v4.8.0:

E0816 09:09:04.153124       1 utils.go:114] GRPC error:
  rpc error: code = Internal desc = mount failed: exit status 32
  Mounting command: mount
  Mounting arguments: -t nfs -o nfsvers=4.1 nfs0.slyness.ninja:/vol/brick0/nfs/pvc-737f9975-2681-4cc7-b729-12093bfda676 /var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount
  Output: mount.nfs: Resource temporarily unavailable

I0816 09:10:41.061830       1 utils.go:109] GRPC call:
  /csi.v1.Node/NodePublishVolume

I0816 09:10:41.061860       1 utils.go:110] GRPC request:
  {"target_path":"/var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount",
   "volume_capability":{"AccessType":{"Mount":{"mount_flags":["nfsvers=4.1"]}},
   "access_mode":{"mode":5}},
   "volume_context":{"csi.storage.k8s.io/pv/name":"pvc-737f9975-2681-4cc7-b729-12093bfda676",
                     "csi.storage.k8s.io/pvc/name":"redis0-data",
                     "csi.storage.k8s.io/pvc/namespace":"cloud",
                     "server":"nfs0.slyness.ninja",
                     "share":"/vol/brick0/nfs",
                     "storage.kubernetes.io/csiProvisionerIdentity":"1710693173462-2109-nfs.csi.k8s.io",
                     "subdir":"pvc-737f9975-2681-4cc7-b729-12093bfda676"},
   "volume_id":"nfs0.slyness.ninja#vol/brick0/nfs#pvc-737f9975-2681-4cc7-b729-12093bfda676##"}

I0816 09:10:41.062187       1 nodeserver.go:132] NodePublishVolume:
  volumeID(nfs0.slyness.ninja#vol/brick0/nfs#pvc-737f9975-2681-4cc7-b729-12093bfda676##)
  source(nfs0.slyness.ninja:/vol/brick0/nfs/pvc-737f9975-2681-4cc7-b729-12093bfda676)
  targetPath(/var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount)
  mountflags([nfsvers=4.1])

I0816 09:10:41.062225       1 mount_linux.go:218] Mounting cmd (mount) with arguments:
  -t nfs -o nfsvers=4.1 nfs0.slyness.ninja:/vol/brick0/nfs/pvc-737f9975-2681-4cc7-b729-12093bfda676 /var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount
neutron ~ > ll /var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount
total 48K
  62769 drwxr-xr-x 1 root root  16 Aug 16 10:46 .
 259199 drwxr-x--- 1 root root  36 Aug 16 10:46 ..
4065791 -rw-r--r-- 1 root root 14K Aug 16 10:46 dump.rdb
neutron ~ > df -h
nfs0.x.x:/vol/brick0/nfs/pvc-737f9975-2681-4cc7-b729-12093bfda676  100G   13G   87G  13% /var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount

Let me know if further information is needed.

Take care.

andyzhangx commented 3 months ago

you could check whether manual nfs mount works on the node, that's not csi driver issue:

E0816 09:09:04.153124       1 utils.go:114] GRPC error:
  rpc error: code = Internal desc = mount failed: exit status 32
  Mounting command: mount
  Mounting arguments: -t nfs -o nfsvers=4.1 nfs0.slyness.ninja:/vol/brick0/nfs/pvc-737f9975-2681-4cc7-b729-12093bfda676 /var/lib/kubelet/pods/e2972965-be5b-4595-a525-dab656dc0512/volumes/kubernetes.io~csi/pvc-737f9975-2681-4cc7-b729-12093bfda676/mount
  Output: mount.nfs: Resource temporarily unavailable
mount -t nfs -o nfsvers=4.1 nfs0.slyness.ninja ...
coldnfire commented 3 months ago

Shame on me, you are right...

I initialized my other master without these two arguments:

--cluster-cidr 10.0.0.0/16 \
--service-cidr 10.1.0.0/16 \

I reinitialized the second master with the correct arguments, then restarted the csi-nfs-node pod, and everything is working fine now.

Sorry for the noise.

Take care,

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale