Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/csi-nfsplugin/registration doesn't exist

jinwendaiya commented 1 year ago

What happened: After deploying csi-driver-nfs to a Kubernetes cluster, some pods restart frequently. Describing the failing pods shows the event "Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/csi-nfsplugin/registration doesn't exist". The pod that keeps restarting is csi-nfs-node; specifically, the driver registrar container in it is failing, and PVCs that use the csi-nfs driver cannot be mounted into the application. [screenshot]

What you expected to happen:

How to reproduce it:

Anything else we need to know?: Screenshots of the fault: [screenshot] [screenshot]

I also found a similar problem in an issue against the Azure provider for the secrets-store CSI driver, but it did not help me solve this one. The link is https://github.com/Azure/secrets-store-csi-driver-provider-azure/issues/829

Environment:

CSI Driver version: v4.1.0
Kubernetes version (use kubectl version): 1.21.14
OS (e.g. from /etc/os-release): CentOS 7
Kernel (e.g. uname -a):
Install tools:
Others:

andyzhangx commented 1 year ago

@jinwendaiya What is your kubelet path? By default it is linux.kubelet=/var/lib/kubelet; adjust that value if your kubelet path is different. Details: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/charts/README.md#tips
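One way to answer the kubelet path question is to read the kubelet's command line on the affected node. The following is a minimal Python sketch, assuming the directory is set through the kubelet's `--root-dir` flag and falls back to /var/lib/kubelet when the flag is absent. The function name `kubelet_root_dir` and the sample command lines are hypothetical and do not come from this cluster.

```python
import shlex

# Default kubelet root directory when --root-dir is not given.
DEFAULT_KUBELET_DIR = "/var/lib/kubelet"


def kubelet_root_dir(cmdline: str) -> str:
    """Return the --root-dir value from a kubelet command line, or the default."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        # Form: --root-dir=/some/path
        if arg.startswith("--root-dir="):
            return arg.split("=", 1)[1]
        # Form: --root-dir /some/path
        if arg == "--root-dir" and i + 1 < len(args):
            return args[i + 1]
    return DEFAULT_KUBELET_DIR


# Hypothetical command lines, e.g. as printed by `ps -o args= -C kubelet` on the node.
print(kubelet_root_dir("/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml"))
print(kubelet_root_dir("/usr/bin/kubelet --root-dir=/data/kubelet --v=2"))
```

If the value printed for a node is not /var/lib/kubelet, the chart's kubelet path setting mentioned above is the place to adjust.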

jinwendaiya commented 1 year ago

> @jinwendaiya what is your kubelet path? by default it's linux.kubelet=/var/lib/kubelet, you may adjust that value if kubelet path is different, details: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/charts/README.md#tips

There should be no problem with the default path. See the output of systemctl status kubelet in the following figure. The kubelet configuration file is in the /var/lib/kubelet directory.

jinwendaiya commented 1 year ago

The following is the describe output of the pod that keeps restarting:

[root@ww01 ~]# kubectl describe pod/csi-nfs-node-8fbft -ncsi-nfs
Name:         csi-nfs-node-8fbft
Namespace:    csi-nfs
Priority:     0
Node:         ww06/104.10.15.6
Start Time:   Tue, 03 Jan 2023 15:36:23 +0800
Labels:       app=csi-nfs-node
              app.kubernetes.io/instance=csi-driver-nfs
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=csi-driver-nfs
              app.kubernetes.io/version=v4.1.0
              controller-revision-hash=657755d9cd
              helm.sh/chart=csi-driver-nfs-v4.1.0
              pod-template-generation=1
Annotations:
Status:       Running
IP:           104.10.15.6
IPs:
  IP:           104.10.15.6
Controlled By:  DaemonSet/csi-nfs-node
Containers:
  liveness-probe:
    Container ID:  docker://578e2f2e2b8d216da3cf19aa286e1d572c4265e56dff7679ab62de952a11c57b
    Image:         dyrnq/livenessprobe:v2.7.0
    Image ID:      docker://sha256:4947e46903b36278bb6205e5abf98056373f6491bb6fa7e38ee7f05c6516b12d
    Port:
    Host Port:
    Args:
      --csi-address=/csi/csi.sock
      --probe-timeout=3s
      --health-port=29653
      --v=2
    State:          Running
      Started:      Tue, 03 Jan 2023 15:36:24 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pxws9 (ro)
  node-driver-registrar:
    Container ID:  docker://ea8ba29556e526f1b7dd6726ea6efd50a12a3b03d145beea709df3e970589f3f
    Image:         dyrnq/csi-node-driver-registrar:v2.5.1
    Image ID:      docker://sha256:720dcdb196378dafb6d8bccc9596b2b1bb9c8f81207fca721f5e7d4f1e1a14ed
    Port:
    Host Port:
    Args:
      --v=2
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 04 Jan 2023 10:54:04 +0800
      Finished:     Wed, 04 Jan 2023 10:55:03 +0800
    Ready:          False
    Restart Count:  338
    Limits:
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Liveness:  exec [/csi-node-driver-registrar --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH) --mode=kubelet-registration-probe] delay=30s timeout=15s period=10s #success=1 #failure=3
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi-nfsplugin/csi.sock
      KUBE_NODE_NAME:         (v1:spec.nodeName)
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pxws9 (ro)
  nfs:
    Container ID:  docker://d767ce4f6bebb5b87be5c06f7d5b44a99a3d9a9a6ff75dc0f391b8b8541cb3b9
    Image:         dyrnq/nfsplugin:v4.1.0
    Image ID:      docker://sha256:7e92ad97b3b3d4196392cff3f71c557f77c95f471975cd1308c817246c965585
    Port:          29653/TCP
    Host Port:     29653/TCP
    Args:
      --v=5
      --nodeid=$(NODE_ID)
      --endpoint=$(CSI_ENDPOINT)
      --drivername=nfs.csi.k8s.io
      --mount-permissions=511
    State:          Running
      Started:      Tue, 03 Jan 2023 15:36:24 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  300Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Liveness:  http-get http://:healthz/healthz delay=30s timeout=10s period=30s #success=1 #failure=5
    Environment:
      NODE_ID:        (v1:spec.nodeName)
      CSI_ENDPOINT:  unix:///csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pxws9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi-nfsplugin
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  Directory
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  kube-api-access-pxws9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From     Message
  Warning  Unhealthy  11m (x997 over 19h)    kubelet  (combined from similar events): Liveness probe failed: F0104 02:46:43.857763 16 main.go:159] Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/csi-nfsplugin/registration doesn't exist.
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
    /workspace/vendor/k8s.io/klog/v2/klog.go:1038 +0x8a
k8s.io/klog/v2.(*loggingT).output(0xf86600, 0x3, 0x0, 0xc000362d20, 0x0, {0xc47a41, 0x1}, 0xc0003eeda0, 0x0)
    /workspace/vendor/k8s.io/klog/v2/klog.go:987 +0x5fd
k8s.io/klog/v2.(*loggingT).printf(0xa63799, 0x4, 0x0, {0x0, 0x0}, {0xa8ac8d, 0x48}, {0xc0003eeda0, 0x1, 0x1})
    /workspace/vendor/k8s.io/klog/v2/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
    /workspace/vendor/k8s.io/klog/v2/klog.go:1532
main.main()
    /workspace/cmd/csi-node-driver-registrar/main.go:159 +0x48e

goroutine 3 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0xc0001a71a0)
    /workspace/vendor/k8s.io/klog/v2/klog.go:1181 +0x6a
created by k8s.io/klog/v2.init.0
    /workspace/vendor/k8s.io/klog/v2/klog.go:420 +0xfb
  Warning  BackOff    2m2s (x3999 over 19h)  kubelet  Back-off restarting failed container
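The file named in the probe failure sits in the same directory as DRIVER_REG_SOCK_PATH from the describe output, with the file name "registration". The following is a minimal Python sketch of that relationship, assuming (this thread does not confirm it) that the registrar derives the probe path by replacing the last component of the registration socket path. The function name `registration_probe_path` is hypothetical.

```python
import posixpath


def registration_probe_path(kubelet_registration_path: str) -> str:
    """Return the marker file path next to the given registration socket path."""
    # Assumption: same directory as the socket, file name "registration".
    socket_dir = posixpath.dirname(kubelet_registration_path)
    return posixpath.join(socket_dir, "registration")


# DRIVER_REG_SOCK_PATH as shown in the describe output above.
sock = "/var/lib/kubelet/plugins/csi-nfsplugin/csi.sock"
print(registration_probe_path(sock))
# /var/lib/kubelet/plugins/csi-nfsplugin/registration
```

Because the liveness probe is an exec probe, this path is checked inside the node-driver-registrar container rather than on the host.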

jinwendaiya commented 1 year ago

Errors from journalctl -u kubelet:

[screenshot]

Log of the node-driver-registrar container in the failing csi-nfs-node pod:

[screenshot]

One more detail about the environment, which may or may not matter: the StorageClass used by the containers is backed by k8s/nfs-subdir-external-provisioner. [screenshot]

The specific program failure appears in the complete log above.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

wenzj-code commented 1 year ago

@jinwendaiya

Excuse me, I have the same problem. Did you manage to solve it?

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-csi/csi-driver-nfs/issues/395#issuecomment-1605375217):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kubernetes-csi / csi-driver-nfs

Kubelet plugin registration hasn't succeeded yet, file=/var/lib/kubelet/plugins/csi-nfsplugin/registration doesn't exist #395
