houzhx759 closed this issue 10 months ago
Please supply more information on the pod configuration, and what kind storage/volumes are being created to identify a root cause.
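For example, output from commands along these lines would help (names here are placeholders; substitute the actual pod and namespace):

# Placeholders, not actual names from this cluster.
kubectl describe pod <pod-name> -n <namespace>   # pod spec, volumes, and recent events
kubectl get pvc,pv -n <namespace>                # claims and the volumes backing them
kubectl get storageclass                         # which provisioner creates the volumes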
Hello, let me describe the current problem: a node runs pods from two StatefulSets, and after the node goes down, one pod comes back up but the other stays stuck in Terminating.
Here are the details of the pod in Terminating state.
nfs-provisioner is used for the underlying storage.
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-minio-0
    ReadOnly:   false
  kube-api-access-v8l6l:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
Name:                      minio-1
Namespace:                 devops
Priority:                  0
Service Account:           default
Node:                      kube-node-08/10.40.43.141
Start Time:                Mon, 14 Aug 2023 16:00:34 +0800
Labels:                    app=minio
                           controller-revision-hash=minio-79d988c599
                           statefulset.kubernetes.io/pod-name=minio-1
Annotations:               cni.projectcalico.org/containerID: 01b4769effe191e752d0b5ec829c0f388e63d2ae3fafb04b79cab64855777f4d
                           cni.projectcalico.org/podIP:
                           cni.projectcalico.org/podIPs:
                           kubectl.kubernetes.io/restartedAt: 2023-08-14T16:00:09+08:00
Status:                    Terminating (lasts 13d)
Termination Grace Period:  30s
IP:
IPs:
Name:             minio-2
Namespace:        devops
Priority:         0
Service Account:  default
Node:             kube-node-04/10.40.21.93
Start Time:       Mon, 14 Aug 2023 16:00:23 +0800
Labels:           app=minio
                  controller-revision-hash=minio-79d988c599
                  statefulset.kubernetes.io/pod-name=minio-2
Annotations:      cni.projectcalico.org/containerID: 5c3d7918c8f708d1ab3d29a2ed33ce1df7365f013bab94071f98e6e574ca3623
                  cni.projectcalico.org/podIP: 10.42.8.6/32
                  cni.projectcalico.org/podIPs: 10.42.8.6/32
                  kubectl.kubernetes.io/restartedAt: 2023-08-14T16:00:09+08:00
Status:           Running
IP:               10.42.8.6
IPs:
  IP:           10.42.8.6
Controlled By:  StatefulSet/minio
Containers:
  minio:
    Container ID:  docker://172a03893edef65044997200bac033715ef2e905482feca6a8a4a435b043c1af
    Image:         /system/minio:RELEASE.2020-11-06T23-17-07Z
    Image ID:      docker-pullable:///minio@sha256:a1dc27cbac312868a03c7ffbf35b886f3c24f552d69f1036ea1b80f1153ad9b1
    Port:          9000/TCP
    Host Port:     0/TCP
    Args:
      server
      http://minio-{0...3}.minio.devops.svc.cluster.local/data
    State:          Running
      Started:      Mon, 14 Aug 2023 16:00:28 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      MINIO_ACCESS_KEY:            admin123
      MINIO_SECRET_KEY:            U*fVXIu8V9RAfP4M
      MINIO_PROMETHEUS_AUTH_TYPE:  public
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wzwf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-minio-2
    ReadOnly:   false
  kube-api-access-8wzwf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
Name:             minio-3
Namespace:        devops
Priority:         0
Service Account:  default
Node:             kube-node-06/10.40.165.124
Start Time:       Mon, 14 Aug 2023 16:00:13 +0800
Labels:           app=minio
                  controller-revision-hash=minio-79d988c599
                  statefulset.kubernetes.io/pod-name=minio-3
Annotations:      cni.projectcalico.org/containerID: 9ff74b1d85e3abc1f06e56b128ee4873b77d094a775a8ad24c42f4691accec00
                  cni.projectcalico.org/podIP: 10.42.6.7/32
                  cni.projectcalico.org/podIPs: 10.42.6.7/32
                  kubectl.kubernetes.io/restartedAt: 2023-08-14T16:00:09+08:00
Status:           Running
IP:               10.42.6.7
IPs:
  IP:           10.42.6.7
Controlled By:  StatefulSet/minio
Containers:
  minio:
    Container ID:  docker://77b691d838e3e3f59b69584a28d21627871f3c78bbbba8bea08aedc06ff39697
    Image:         /system/minio:RELEASE.2020-11-06T23-17-07Z
    Image ID:      docker-pullable:///minio@sha256:a1dc27cbac312868a03c7ffbf35b886f3c24f552d69f1036ea1b80f1153ad9b1
    Port:          9000/TCP
    Host Port:     0/TCP
    Args:
      server
      http://minio-{0...3}.minio.devops.svc.cluster.local/data
    State:          Running
      Started:      Mon, 14 Aug 2023 16:00:18 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      MINIO_ACCESS_KEY:            admin123
      MINIO_SECRET_KEY:            U*fVXIu8V9RAfP4M
      MINIO_PROMETHEUS_AUTH_TYPE:  public
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dmb54 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-minio-3
    ReadOnly:   false
  kube-api-access-dmb54:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
I know this is an old issue, but I wanted to share a workaround for posterity.
Basically what happened is that /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499 was once an NFS mount, but is no longer one. Perhaps Docker crashed, or the node was rebooted. That means the path is now an empty, orphaned directory. Kubernetes still thinks the volume is an NFS mount and tries to unmount it: the storage/CSI provider runs umount.nfs on that path, which fails because it is not an NFS mount, so Kubernetes ends up in an endless loop trying to remove it.
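A quick way to confirm this state on the affected node is sketched below (paths are the ones from this issue; on RKE the kubelet runs as a Docker container, so its log is read with docker logs):

# Prints nothing and exits non-zero if the path is not a live mount.
findmnt /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499
# The kubelet log should show the unmount failing over and over.
docker logs kubelet 2>&1 | grep pvc-82ba9c77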
To fix: check /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499 to ensure that it is an empty directory, then rmdir /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499.
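Spelled out as commands (rmdir refuses to remove a non-empty directory, which makes this safe to try):

# Should print nothing if the directory is really empty.
ls -A /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499
# Remove the orphaned directory; the kubelet's next sync can then finish deleting the pod.
rmdir /var/lib/kubelet/pods/c831bbc8-2103-4259-af5b-ac98992ad58b/volumes/kubernetes.io~nfs/pvc-82ba9c77-1851-48ec-bc4a-90ecc4b22499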
RKE version: 1.4.4, k8s version: 1.25.6
Docker version (docker version, docker info preferred): 19.03.12-3
Operating system and kernel (cat /etc/os-release, uname -r preferred): CentOS 7.6, Linux 5.8.7-1.el7.elrepo.x86_64 (mockbuild@Build64R7)
Type/provider of hosts (VirtualBox/Bare-metal/AWS/GCE/DO): Huawei Cloud
After a node goes down, a stateful service's pod stays in the Terminating state indefinitely. The pod is not restarted on another node; it remains in Terminating and must be forcibly deleted before the StatefulSet recreates it. What is the reason? Has anyone experienced this problem? Thank you.
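For reference, the forced deletion mentioned above is usually done like this (pod name and namespace taken from this issue):

kubectl delete pod minio-1 -n devops --grace-period=0 --force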