kubernetes-csi / csi-driver-nfs

This driver allows Kubernetes to access NFS servers on Linux nodes.

Mounting fails with error "/usr/sbin/start-statd: 10: cannot create /run/rpc.statd.lock: Read-only file system" #678

Closed: sivarama-p-raju closed this issue 3 weeks ago

sivarama-p-raju commented 1 month ago

What happened:

On creating a PVC referring to a storageclass for automatic volume provisioning, the PVC goes into a "Pending" state. On describing the PVC, the below errors can be seen:

Events:
  Type     Reason                Age                From                                                                 Message
  ----     ------                ----               ----                                                                 -------
  Normal   ExternalProvisioning  12s (x4 over 44s)  persistentvolume-controller                                          waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
  Warning  ProvisioningFailed    10s (x3 over 34s)  nfs.csi.k8s.io_<node name>_57695e32-d5e0-4cda-ab84-16bcc6b0d1cf  failed to provision volume with StorageClass "new-test": rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs <NFS Server IP>:<path>/new-test /tmp/pvc-cedac322-feaf-40ba-9432-028c2727dbce
Output: /usr/sbin/start-statd: 10: cannot create /run/rpc.statd.lock: Read-only file system
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: mounting <NFS Server IP>:<path>/new-test failed, reason given by server: No such file or directory
  Normal  Provisioning  6s (x4 over 44s)  nfs.csi.k8s.io_<node name>_57695e32-d5e0-4cda-ab84-16bcc6b0d1cf  External provisioner is provisioning volume for claim "default/new-test"

As per the error, the mount process tries to create a lock file (/run/rpc.statd.lock) under /run, which fails because the container's root filesystem is read-only.
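As an aside, the "-o nolock" workaround that the mount error itself suggests can be passed through the StorageClass via mountOptions, if local-only locking is acceptable. A minimal sketch (class name, server address, and share path are placeholders, not values from this cluster):

```yaml
# Illustrative StorageClass that passes "nolock" so rpc.statd is not required.
# Names, server, and share are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: new-test-nolock
provisioner: nfs.csi.k8s.io
parameters:
  server: <NFS Server IP>
  share: <path>
mountOptions:
  - nfsvers=4.1
  - nolock
reclaimPolicy: Delete
volumeBindingMode: Immediate
```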

The below errors can be seen in the nfs container logs as well:

  1. Container name: nfs, part of "csi-nfs-controller" deployment:
Mounting command: mount
Mounting arguments: -t nfs <NFS Server IP>:<path>/new-test /tmp/pvc-cedac322-feaf-40ba-9432-028c2727dbce
Output: /usr/sbin/start-statd: 10: cannot create /run/rpc.statd.lock: Read-only file system
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: mounting <NFS Server IP>:<path>/new-test failed, reason given by server: No such file or directory
  1. Container name: nfs, part of "csi-nfs-node" daemonset:
I0521 17:45:58.526481       1 mount_linux.go:274] Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount959869345: read-only file system

What you expected to happen:

The PV should be provisioned without issues and the PVC should be bound. There should not be any mounting issues.

How to reproduce it:

Already described above.

Anything else we need to know?:

To fix this, I had to make the following changes to the templates for the csi-nfs-controller deployment and the csi-nfs-node daemonset (a sketch of the resulting securityContext follows the list):

  1. templates/csi-nfs-controller.yaml

For the "nfs" container, updated "readOnlyRootFilesystem: true" to "readOnlyRootFilesystem: false" under "securityContext".

  2. templates/csi-nfs-node.yaml

For the "nfs" container, updated "readOnlyRootFilesystem: true" to "readOnlyRootFilesystem: false" under "securityContext".

Environment:

Tuco106 commented 1 month ago

I have exactly the same issue since I switched to the Helm chart. With the kubectl install method, "readOnlyRootFilesystem" is not set, according to the following file: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/v4.7.0/csi-nfs-controller.yaml

sivarama-p-raju commented 1 month ago

@Tuco106 Thank you for your update. It is surprising that the Helm chart does not work out of the box due to the problems described above.

Could I request the maintainers to take a look at this and update the chart?

cccsss01 commented 3 weeks ago

While attempting to migrate from nfs-subdir-external-provisioner (which works great) to this driver, I mirrored almost all of the same settings and am getting this from the nfs container:

Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount149479443: read-only file system

When running through the troubleshooting guide, I get nothing returned from mount | grep nfs, which I believe is related to this.

andyzhangx commented 3 weeks ago

That was introduced by https://github.com/kubernetes-csi/csi-driver-nfs/pull/422; I will set the default value to false. @farodin91

farodin91 commented 3 weeks ago

Wouldn't it be better to make it configurable? We have been running with this setting for a year and have had no issues.
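For example, the chart template could gate the setting behind a values key. This is a hypothetical sketch, not an existing chart option; the key name controller.readOnlyRootFilesystem is made up here:

```yaml
# Hypothetical fragment for templates/csi-nfs-controller.yaml; the values key
# .Values.controller.readOnlyRootFilesystem is illustrative only.
securityContext:
  readOnlyRootFilesystem: {{ .Values.controller.readOnlyRootFilesystem | default false }}
```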

farodin91 commented 3 weeks ago

Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount149479443: read-only file system

This could be fixed by using an emptyDir.
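A minimal sketch of that idea, assuming the pod spec for the nfs container can be patched in the templates; the volume name tmp-dir is illustrative:

```yaml
# Illustrative pod-spec fragment: back /tmp with an emptyDir so the temp-dir
# check succeeds even with readOnlyRootFilesystem: true.
containers:
  - name: nfs
    volumeMounts:
      - name: tmp-dir
        mountPath: /tmp
volumes:
  - name: tmp-dir
    emptyDir: {}
```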

plnordquist commented 1 week ago

I'm not sure if I need to open a new issue about this, but I have a few concerns with the fix for this issue. On my cluster, I've enabled the host's rpc.statd service to handle NFS locks. This fix reverts the read-only state of the root filesystem on the nfs controller deployment, which will then start its own rpc.statd service if it is missing. The node daemonset has a similar read-only root filesystem setting, and I would expect it to also need the rpc.statd service. Since both the controller and node pods are host-network pods, the node pod might technically use the controller pod's rpc.statd service, hiding the fact that nodes running only the node pod also need read-write root filesystems.

Personally, I don't know the best solution. I worry that if the pods need to be restarted due to an upgrade or when a node is being drained, the rpc.statd service would be terminated and outstanding mounts might lose their locking solution and have trouble.