joe-elliott / cert-exporter

A Prometheus exporter that publishes cert expirations on disk and in Kubernetes secrets
Apache License 2.0

Ceph rbd is not unmapped automatically. Need to add an option to change volume mount propagation (mountPropagation). #88

Open FessAectan opened 3 years ago

FessAectan commented 3 years ago

Hi there.

I've started using cert-exporter and ran into a problem: when I delete a pod that uses Ceph RBD, the RBD device is not unmapped/unmounted automatically from the k8s node, and the pod cannot be scheduled on another node. Kubelet logs:

Aug 11 09:17:42 kube-01 d8-kubelet-forker[14693]: E0811 09:17:42.907060   14694 nestedpendingoperations.go:301] 
Operation for "{volumeName:kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-c011772acec9 
podName: nodeName:}" failed. No retries permitted until 2021-08-11 09:17:43.407003409 +0300 MSK 
m=+1698521.295086326 (durationBeforeRetry 500ms). Error: "UnmountDevice failed for volume \"pvc-323d06cc-947b-
4c64-aaa0-e2f8963d27e5\" (UniqueName: \"kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\") on node \"kube-01\" : rbd: failed to unmap device /dev/rbd10, error exit status 16, rbd output: [114 98 100 58 
32 115 121 115 102 115 32 119 114 105 116 101 32 102 97 105 108 101 100 10 114 98 100 58 32 117 110 109 97 112 32 102 97 105 
108 101 100 58 32 40 49 54 41 32 68 101 118 105 99 101 32 111 114 32 114 101 115 111 117 114 99 101 32 98 117 115 121 10]"
...
Aug 11 09:17:43 kube-01 d8-kubelet-forker[14693]: E0811 09:17:43.429490   14694 nestedpendingoperations.go:301] 
Operation for "{volumeName:kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-c011772acec9 
podName: nodeName:}" failed. No retries permitted until 2021-08-11 09:17:44.429433563 +0300 MSK 
m=+1698522.317516538 (durationBeforeRetry 1s). Error: "UnmountDevice failed for volume \"pvc-323d06cc-947b-4c64-
aaa0-e2f8963d27e5\" (UniqueName: \"kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\") on node \"kube-01\" : Unmount failed: exit status 32\nUnmounting arguments: 
/var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\nOutput: umount: /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-
018409f2-e715-4452-b483-c011772acec9: not mounted\n\n"
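
The `rbd output` field in the first log entry is printed as a list of decimal byte values rather than as text. A minimal Python sketch for decoding it, with the byte values copied from the log above (the variable names are only illustrative):

```python
# Byte values copied verbatim from the "rbd output: [...]" field of the kubelet log.
rbd_output = [
    114, 98, 100, 58, 32, 115, 121, 115, 102, 115, 32, 119, 114, 105, 116, 101,
    32, 102, 97, 105, 108, 101, 100, 10, 114, 98, 100, 58, 32, 117, 110, 109,
    97, 112, 32, 102, 97, 105, 108, 101, 100, 58, 32, 40, 49, 54, 41, 32, 68,
    101, 118, 105, 99, 101, 32, 111, 114, 32, 114, 101, 115, 111, 117, 114,
    99, 101, 32, 98, 117, 115, 121, 10,
]

# The values are plain ASCII codes, so bytes(...).decode() recovers the message.
decoded = bytes(rbd_output).decode("ascii")
print(decoded)
# rbd: sysfs write failed
# rbd: unmap failed: (16) Device or resource busy
```

In other words, the unmap fails with error 16 (EBUSY): something on the node still holds the device.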

After some research I found the root cause of this issue (thanks to this article: https://cloud.tencent.com/developer/article/1469532). It is the cert-exporter pod running on the same k8s node. The cert-exporter pod mounts /var/lib/kubelet, and the volumes of pods that use Ceph RBD are mounted beneath it at /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/blablabla, so the cert-exporter container keeps holding those mounts after the pod is deleted.

Related issue https://github.com/kubernetes/kubernetes/issues/54214. Related PR in prometheus-node-exporter https://github.com/helm/charts/pull/11194/files

The solution is to add an option to configure mountPropagation in the DaemonSets, like this:

volumeMounts:
  - mountPath: /var/lib/kubelet
    mountPropagation: HostToContainer
    name: kubelet
    readOnly: true
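
Until such an option exists, the same change could be applied to a rendered manifest before deploying it. The following is only a rough Python sketch under stated assumptions: the helper name `set_mount_propagation` and the trimmed-down manifest are hypothetical and not part of cert-exporter; only the volumeMount fields are taken from the snippet above.

```python
import copy


def set_mount_propagation(daemonset, mount_path, mode="HostToContainer"):
    """Return a copy of a DaemonSet manifest (a plain dict) with mountPropagation
    set on every container volumeMount whose mountPath equals mount_path."""
    patched = copy.deepcopy(daemonset)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount.get("mountPath") == mount_path:
                mount["mountPropagation"] = mode
    return patched


# Hypothetical, heavily trimmed manifest used only for illustration.
example = {
    "kind": "DaemonSet",
    "spec": {"template": {"spec": {"containers": [{
        "name": "cert-exporter",
        "volumeMounts": [
            {"mountPath": "/var/lib/kubelet", "name": "kubelet", "readOnly": True},
        ],
    }]}}},
}

patched = set_mount_propagation(example, "/var/lib/kubelet")
print(patched["spec"]["template"]["spec"]["containers"][0]["volumeMounts"][0])
```

The helper works on a deep copy so the input manifest is not modified; a real chart option would instead expose the value through the chart's own configuration.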