openebs-archive / dynamic-nfs-provisioner

Operator for dynamically provisioning an NFS server on any Kubernetes Persistent Volume. Also creates an NFS volume on the dynamically provisioned server for enabling Kubernetes RWX volumes.
Apache License 2.0

Backend NFS Deployment/Service/PVC/PV are removed before the kubelet cleanly unmounts the volume #135

Open jiuchen1986 opened 2 years ago

jiuchen1986 commented 2 years ago

Describe the bug: Sometimes, when a Pod that mounts an NFS PV is deleted together with the corresponding NFS PVC/PV, the Pod/PVC/PV and the backend NFS Deployment/Service/PVC/PV are all cleaned up so quickly that the kubelet on the worker node where the Pod was running cannot unmount the NFS volume in time. The leftover NFS mount on that node then goes stale and is never unmounted unless done manually, and any I/O against it blocks forever until the node is rebooted.

It's odd, though, that the Pod object is successfully removed from the cluster even though the kubelet never finished cleaning up the mount on the node.
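
A quick way to confirm the symptom on the affected node (a sketch; the exact kernel messages and the names of the stuck processes vary by environment):

dmesg | grep -i "nfs: server"                   # typically shows e.g. "nfs: server 10.105.148.166 not responding"
ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /^D/'  # processes stuck in uninterruptible sleep on the dead mount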

Expected behaviour: The NFS volume mounted on the worker node is cleaned up when the Pod and PVC are deleted.

Steps to reproduce the bug: Apply the manifests below, then delete them with a single command so that the Pod and the PVC are removed simultaneously.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-nfs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: network-file # This is the SC name related to the openebs-nfs-provisioner
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: sleep
  name: sleep
spec:
  containers:
  - image: nginx
    name: sleep
    resources: {}
    volumeMounts:
    - name: openebs-nfs
      mountPath: /mnt
  dnsPolicy: ClusterFirst
  terminationGracePeriodSeconds: 0 # intentionally set this to 0
  restartPolicy: Always
  volumes:
  - name: openebs-nfs
    persistentVolumeClaim:
      claimName: openebs-nfs
status: {}

terminationGracePeriodSeconds is intentionally set to 0 so the Pod is removed immediately when deleted.
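
Assuming both manifests above are saved together in a single pod.yml (the file name used in the delete step below), setting things up is just:

kubectl apply -f pod.yml      # creates the RWX PVC and the Pod mounting it
kubectl get pvc openebs-nfs   # wait until STATUS is Bound before continuing

Once the PVC is bound, the backend NFS server resources show up in kube-system: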

kubectl -n kube-system get all | grep nfs-pvc
pod/nfs-pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3-7cfc9fdc76-x6746   1/1     Running   0              97s
service/nfs-pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3   ClusterIP   10.105.148.166   <none>        2049/TCP,111/TCP         96s
deployment.apps/nfs-pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3   1/1     1            1           97s
replicaset.apps/nfs-pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3-7cfc9fdc76   1         1         1       97s
kubectl get po -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP                NODE                NOMINATED NODE   READINESS GATES
sleep   1/1     Running   0          2m11s   192.168.171.133   node-10-158-36-65   <none>           <none>
kubectl delete -f pod.yml
persistentvolumeclaim "openebs-nfs" deleted
pod "sleep" deleted
kubectl get po
No resources found in default namespace.

kubectl -n kube-system get all | grep nfs-pvc
# no output: the backend NFS Deployment/Service/PVC/PV are already gone
# ssh to the node
mount | grep nfs
10.105.148.166:/ on /var/lib/kubelet/pods/947b2765-78f0-4908-8856-5fe09269999e/volumes/kubernetes.io~nfs/pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.158.36.65,local_lock=none,addr=10.105.148.166)
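
As noted above, the leftover mount has to be cleaned up by hand. A possible manual cleanup on the node looks like this (a sketch; the path is the one from the mount output above, adjust the pod UID / PV name for your case):

# -f aborts RPCs to the unreachable server, -l detaches the mount point lazily
sudo umount -f -l /var/lib/kubelet/pods/947b2765-78f0-4908-8856-5fe09269999e/volumes/kubernetes.io~nfs/pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3
mount | grep nfs    # the stale mount should no longer be listed

Processes that were already blocked in uninterruptible sleep on the mount may or may not recover after this; in the worst case a reboot is still needed.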


Environment details:

dynamic-nfs-provisioner version: v0.9.0

Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"49499222b0eb0349359881bea01d8d5bd78bf444", GitTreeState:"clean", BuildDate:"2021-12-14T12:41:40Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
NAME="SLES"
VERSION="15-SP3"
VERSION_ID="15.3"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP3"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp3"
DOCUMENTATION_URL="https://documentation.suse.com/"
Kernel (uname -a):
Linux node-10-158-36-65 5.3.18-57-default #1 SMP Wed Apr 28 10:54:41 UTC 2021 (ba3c2e9) x86_64 x86_64 x86_64 GNU/Linux

The backend storage is Ceph CSI RBD.

StorageClass:

k get sc network-file -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    cas.openebs.io/config: |
      - name: NFSServerType
        value: kernel
      - name: BackendStorageClass
        value: network-block
      - name: LeaseTime
        value: 30
      - name: GraceTime
        value: 30
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"cas.openebs.io/config":"- name: NFSServerType\n  value: kernel\n- name: BackendStorageClass\n  value: network-block\n- name: LeaseTime\n  value: 30\n- name: GraceTime\n  value: 30\n","openebs.io/cas-type":"nfsrwx"},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile"},"name":"network-file"},"provisioner":"openebs.io/nfsrwx","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
    openebs.io/cas-type: nfsrwx
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-05-16T21:12:31Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: network-file
  resourceVersion: "3104"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/network-file
  uid: 1a02778d-391f-4e70-a9f1-cd3c7ad230da
provisioner: openebs.io/nfsrwx
reclaimPolicy: Delete
volumeBindingMode: Immediate
pbabilas commented 1 year ago

Got the same issue, with iSCSI backend storage. k8s attempts the unmount only once and, when it times out, it simply forgets about it. Our k8s version is 1.21. @jiuchen1986 did you solve this problem?
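
The single failed unmount attempt should be visible in the kubelet logs on the node (a sketch; the exact message wording differs between Kubernetes versions):

journalctl -u kubelet --no-pager | grep -iE 'unmount|timed out' | tail -n 20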

dsharma-dc commented 11 months ago

This sounds more like a problem or an inconvenience in the k8s behaviour. I'm not sure whether k8s does a lazy unmount when the pod goes away, or whether there is a way to specify that. Taking a look...
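
One workaround that should avoid the race (a sketch, untested here; the resource names are the ones from the reproduction above) is to delete in two steps, so the backend NFS server is still reachable while the kubelet unmounts the volume:

kubectl delete pod sleep --wait=true   # wait for the Pod to be fully gone
# on the worker node, confirm the mount has been removed:
#   mount | grep pvc-9226622c-10b0-4b1d-8d4d-5661c6fec8e3
kubectl delete pvc openebs-nfs         # only now tear down the PV and the backend NFS server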