kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Completed pod stuck in Terminating state after kubelet restart #116954

Closed PaulFurtado closed 8 months ago

PaulFurtado commented 1 year ago

What happened?

A job pod was terminating and kubelet restarted late in that process. After the restart, the pod shows up with a Terminating status in `kubectl get pods`, but it is in phase: Succeeded with all of its resources cleaned up.

kubectl shows:

NAME                           READY   STATUS        RESTARTS   AGE
cidr-0-backup-27994557-hhz76   0/1     Terminating   0          3d5h

The pod's metadata shows:

apiVersion: v1
kind: Pod
metadata:
  annotations: <removed>
  creationTimestamp: "2023-03-24T15:57:00Z"
  deletionGracePeriodSeconds: 86400
  deletionTimestamp: "2023-03-28T20:18:17Z"
  generateName: cidr-0-backup-27994557-
  labels: <removed>
  name: cidr-0-backup-27994557-hhz76
  namespace: vt-green
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: cidr-0-backup-27994557
    uid: ba1dadaf-f0f5-41e2-b5bc-f88bc7205385
  resourceVersion: "7703154732"
  uid: 4657c2b5-7dd2-42fb-a688-c81d37e82815

and the status shows:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-24T15:58:09Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-03-27T20:18:18Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-03-27T20:18:18Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-03-24T15:57:00Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://6a793a370006700a0fab52c84b46ee602536fb0b2f5a338a1cb7fe5750b605ba
    image: <redacted>
    imageID: 998a9d0822290770746da44f57c1c97e6a6d26105e370cf065ae9b3b4e2410e8
    lastState: {}
    name: backup
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://6a793a370006700a0fab52c84b46ee602536fb0b2f5a338a1cb7fe5750b605ba
        exitCode: 0
        finishedAt: "2023-03-24T16:02:08Z"
        reason: Completed
        startedAt: "2023-03-24T15:58:10Z"
  hostIP: 172.20.207.253
  initContainerStatuses:
  - containerID: cri-o://04244b25067cf0134c0aa60110fbb53153c37f47cd0c688d877b544cb352a334
    image: <redacted>
    imageID: 998a9d0822290770746da44f57c1c97e6a6d26105e370cf065ae9b3b4e2410e8
    lastState: {}
    name: init-vtroot
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://04244b25067cf0134c0aa60110fbb53153c37f47cd0c688d877b544cb352a334
        exitCode: 0
        finishedAt: "2023-03-24T15:58:09Z"
        reason: Completed
        startedAt: "2023-03-24T15:58:06Z"
  phase: Succeeded
  podIP: 172.20.201.120
  podIPs:
  - ip: 172.20.201.120
  qosClass: Burstable
  startTime: "2023-03-24T15:57:00Z"

It appears that the only difference between this pod and one that shows up as Completed in kubectl is that this one still has its deletionTimestamp set?
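For reference, a hedged way to confirm that difference on the live object, plus the force-delete workaround we fall back on (this only clears the API object; it is not a fix):

```console
# Check whether the pod still carries a deletionTimestamp, and its phase
$ kubectl -n vt-green get pod cidr-0-backup-27994557-hhz76 \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.status.phase}{"\n"}'

# Workaround: have the API server remove the object without waiting for kubelet
$ kubectl -n vt-green delete pod cidr-0-backup-27994557-hhz76 --grace-period=0 --force
```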

Logs for kubelet shutting down the pod:

I0327 20:18:18.822129   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"zookeeper-secrets\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-zookeeper-secrets\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822233   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"kube-api-access-c8z8d\" (UniqueName: \"kubernetes.io/projected/4657c2b5-7dd2-42fb-a688-c81d37e82815-kube-api-access-c8z8d\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822336   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"cidr-0-vtdataroot-0\" (UniqueName: \"hubspot/temp-disk/4657c2b5-7dd2-42fb-a688-c81d37e82815-cidr-0-vtdataroot-0\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822386   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"live-config\" (UniqueName: \"kubernetes.io/configmap/4657c2b5-7dd2-42fb-a688-c81d37e82815-live-config\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822430   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"vitess-secrets-vault\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-vault\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822482   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"downward-api\" (UniqueName: \"kubernetes.io/downward-api/4657c2b5-7dd2-42fb-a688-c81d37e82815-downward-api\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822524   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"vitess-secrets-certs\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-certs\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:18:18.822550   23955 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"vitess-secrets-aws\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-aws\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
W0327 20:18:18.822809   23955 empty_dir.go:517] Warning: Failed to clear quota on /mnt/kubelet/pods/4657c2b5-7dd2-42fb-a688-c81d37e82815/volumes/kubernetes.io~configmap/live-config: clearQuota called, but quotas disabled
I0327 20:18:18.823088   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/4657c2b5-7dd2-42fb-a688-c81d37e82815-live-config" (OuterVolumeSpecName: "live-config") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "live-config". PluginName "kubernetes.io/configmap", VolumeGidValue ""
I0327 20:18:18.924174   23955 reconciler.go:384] "Volume detached for volume \"live-config\" (UniqueName: \"kubernetes.io/configmap/4657c2b5-7dd2-42fb-a688-c81d37e82815-live-config\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:18.939224   23955 scope.go:110] "RemoveContainer" containerID="6a793a370006700a0fab52c84b46ee602536fb0b2f5a338a1cb7fe5750b605ba"
I0327 20:18:19.060577   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/4657c2b5-7dd2-42fb-a688-c81d37e82815-kube-api-access-c8z8d" (OuterVolumeSpecName: "kube-api-access-c8z8d") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "kube-api-access-c8z8d". PluginName "kubernetes.io/projected", VolumeGidValue ""
I0327 20:18:19.060703   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-zookeeper-secrets" (OuterVolumeSpecName: "zookeeper-secrets") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "zookeeper-secrets". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0327 20:18:19.061282   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-aws" (OuterVolumeSpecName: "vitess-secrets-aws") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "vitess-secrets-aws". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0327 20:18:19.061294   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/downward-api/4657c2b5-7dd2-42fb-a688-c81d37e82815-downward-api" (OuterVolumeSpecName: "downward-api") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "downward-api". PluginName "kubernetes.io/downward-api", VolumeGidValue ""
I0327 20:18:19.067065   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-certs" (OuterVolumeSpecName: "vitess-secrets-certs") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "vitess-secrets-certs". PluginName "kubernetes.io/secret", VolumeGidValue ""
W0327 20:18:19.073609   23955 mount_helper_common.go:133] Warning: "/mnt/kubelet/pods/4657c2b5-7dd2-42fb-a688-c81d37e82815/volume-subpaths/cidr-0-vtdataroot-0/backup/7" is not a mountpoint, deleting
I0327 20:18:19.122084   23955 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-vault" (OuterVolumeSpecName: "vitess-secrets-vault") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "vitess-secrets-vault". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0327 20:18:19.126096   23955 reconciler.go:384] "Volume detached for volume \"vitess-secrets-vault\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-vault\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:19.126126   23955 reconciler.go:384] "Volume detached for volume \"downward-api\" (UniqueName: \"kubernetes.io/downward-api/4657c2b5-7dd2-42fb-a688-c81d37e82815-downward-api\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:19.126137   23955 reconciler.go:384] "Volume detached for volume \"vitess-secrets-certs\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-certs\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:19.126145   23955 reconciler.go:384] "Volume detached for volume \"vitess-secrets-aws\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-vitess-secrets-aws\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:19.126154   23955 reconciler.go:384] "Volume detached for volume \"zookeeper-secrets\" (UniqueName: \"kubernetes.io/secret/4657c2b5-7dd2-42fb-a688-c81d37e82815-zookeeper-secrets\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
I0327 20:18:19.126170   23955 reconciler.go:384] "Volume detached for volume \"kube-api-access-c8z8d\" (UniqueName: \"kubernetes.io/projected/4657c2b5-7dd2-42fb-a688-c81d37e82815-kube-api-access-c8z8d\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""
W0327 20:18:19.241455   23955 mount_helper_common.go:133] Warning: "/mnt/kubelet/pods/4657c2b5-7dd2-42fb-a688-c81d37e82815/volume-subpaths/cidr-0-vtdataroot-0/backup/8" is not a mountpoint, deleting
W0327 20:18:19.408592   23955 mount_helper_common.go:133] Warning: "/mnt/kubelet/pods/4657c2b5-7dd2-42fb-a688-c81d37e82815/volume-subpaths/cidr-0-vtdataroot-0/init-vtroot/1" is not a mountpoint, deleting
W0327 20:18:19.572157   23955 mount_helper_common.go:133] Warning: "/mnt/kubelet/pods/4657c2b5-7dd2-42fb-a688-c81d37e82815/volume-subpaths/cidr-0-vtdataroot-0/init-vtroot/2" is not a mountpoint, deleting

Kubelet begins restarting at 20:18:27.944583 and comes back up at 20:18:28.128820. After the restart, it finishes unmounting the last volume:

I0327 20:19:31.311373 1006789 reconciler.go:209] "operationExecutor.UnmountVolume started for volume \"cidr-0-vtdataroot-0\" (UniqueName: \"hubspot/temp-disk/4657c2b5-7dd2-42fb-a688-c81d37e82815-cidr-0-vtdataroot-0\") pod \"4657c2b5-7dd2-42fb-a688-c81d37e82815\" (UID: \"4657c2b5-7dd2-42fb-a688-c81d37e82815\") "
I0327 20:19:32.404493 1006789 operation_generator.go:917] UnmountVolume.TearDown succeeded for volume "hubspot/temp-disk/4657c2b5-7dd2-42fb-a688-c81d37e82815-cidr-0-vtdataroot-0" (OuterVolumeSpecName: "cidr-0-vtdataroot-0") pod "4657c2b5-7dd2-42fb-a688-c81d37e82815" (UID: "4657c2b5-7dd2-42fb-a688-c81d37e82815"). InnerVolumeSpecName "cidr-0-vtdataroot-0". PluginName "hubspot/temp-disk", VolumeGidValue ""
I0327 20:19:32.428891 1006789 reconciler.go:384] "Volume detached for volume \"cidr-0-vtdataroot-0\" (UniqueName: \"hubspot/temp-disk/4657c2b5-7dd2-42fb-a688-c81d37e82815-cidr-0-vtdataroot-0\") on node \"ip-172-20-207-253.eu-central-1.compute.internal\" DevicePath \"\""

After this point, kubelet never logs anything about the pod or its containers again.
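One way to double-check that, assuming kubelet logs to the systemd journal under a unit named kubelet, is to grep for the pod UID after the restart window; no hits after 20:19:32 would confirm kubelet has forgotten about the pod:

```console
$ journalctl -u kubelet --since "2023-03-27 20:19:32" \
    | grep 4657c2b5-7dd2-42fb-a688-c81d37e82815
```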

What did you expect to happen?

The pod should not have gotten stuck in this state.

How can we reproduce it (as minimally and precisely as possible)?

It is difficult to reproduce since you likely need to restart kubelet at exactly the right time.
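A rough sketch of what a reproduction attempt might look like, assuming a disposable test cluster and root access to the node; the image and sleep duration are arbitrary, and since the kubelet restart has to land inside the short volume-teardown window, this will not hit the race reliably:

```console
# Run a short job and wait for its pod to complete (phase Succeeded)
$ kubectl create job term-test --image=busybox -- sh -c 'sleep 30'
$ kubectl wait --for=condition=complete job/term-test --timeout=120s

# Delete the completed pod without waiting, so it enters Terminating
$ kubectl delete pod -l job-name=term-test --wait=false

# On the node that ran the pod, restart kubelet immediately,
# while volumes are still being torn down
$ systemctl restart kubelet
```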

Anything else we need to know?

Feel free to mark this as a duplicate of another issue, but I sifted through many of the related stuck terminating issues and couldn't find one that seemed like an exact match.

We see this issue crop up 0-4 times per week in large clusters (300-700 nodes, 4,000-16,000 pods). We had refrained from filing an upstream issue before because we blamed docker/cri-dockerd for most termination issues, but this issue persists on kubernetes 1.23.16 with cri-o.
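For anyone wanting to spot these pods, a hedged one-liner that lists completed pods still carrying a deletionTimestamp (assumes jq is installed):

```console
$ kubectl get pods -A -o json \
    | jq -r '.items[]
             | select(.metadata.deletionTimestamp != null and .status.phase == "Succeeded")
             | "\(.metadata.namespace)/\(.metadata.name)"'
```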

Kubernetes version

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.16", GitCommit:"60e5135f758b6e43d0523b3277e8d34b4ab3801f", GitTreeState:"archive", BuildDate:"2023-03-10T22:26:02Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.16", GitCommit:"60e5135f758b6e43d0523b3277e8d34b4ab3801f", GitTreeState:"archive", BuildDate:"2023-03-10T22:26:02Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/arm64"}
```

Cloud provider

AWS

OS version

```console
# On Linux:
$ cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
```

Install tools

custom/puppet

Container runtime (CRI) and version (if applicable)

```console
# crictl version
Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.23.4
RuntimeApiVersion:  v1alpha2
```

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
PaulFurtado commented 1 year ago

/sig node

kundan2707 commented 1 year ago

/kind support

SergeyKanzhelev commented 1 year ago

/remove-kind support

Let's find out whether it is a bug or not before applying the support label.

/triage needs-information

Any chance you can try a supported k8s version? 1.23 is out of support in OSS.

With the load you have it may not be feasible, but if you can localize the issue and collect higher-verbosity kubelet logs (level 4), it will help.

/cc @harche

PaulFurtado commented 1 year ago

> Any chance you can try a supported k8s version? 1.23 is out of support in OSS.

We're working on it, but unfortunately 1.24 and 1.25 remove so many APIs that it will be months before we have resolved all those deprecations and can actually upgrade our big clusters (multi-tenant, lots of different teams need to fix code).

> With the load you have it may not be feasible, but if you can localize the issue and collect higher-verbosity kubelet logs (level 4), it will help.

Happy to turn these on; we don't need these debug logs shipped off the machine, so there isn't a big cost impact.
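A minimal sketch of how we'd do that on a node, assuming kubelet runs under systemd and its unit passes $KUBELET_EXTRA_ARGS through to the binary (the drop-in path and variable name are assumptions, not our actual puppet config):

```console
$ mkdir -p /etc/systemd/system/kubelet.service.d
$ cat <<'EOF' > /etc/systemd/system/kubelet.service.d/20-verbosity.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--v=4"
EOF
$ systemctl daemon-reload
$ systemctl restart kubelet
```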


I'd also love to help debug/fix this, but the pod lifecycle code in kubelet is pretty complicated and I could use a tip:

Given the pod status that I posted above, is it correct to say that the next step that should have occurred is that kubelet makes some API call to the apiserver that removes the deletionTimestamp field? What function in kubelet is responsible for that?

I was having a hard time tracking that piece down when I was reading the code. Since restarting kubelet does not remedy this state, the next time we have a pod stuck like this we can try to fix it or at least add additional log statements, since we can endlessly recompile kubelet and retry.
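In the meantime, a hedged starting point for narrowing down that code path in a source checkout (the grep targets below are guesses about where to look, not a pointer to the actual function):

```console
$ git clone --depth 1 --branch v1.23.16 https://github.com/kubernetes/kubernetes.git
$ cd kubernetes
# Where the kubelet status manager and pod workers touch deletion state
$ grep -rn "DeletionTimestamp" pkg/kubelet/status/
$ grep -rn "DeletionTimestamp" pkg/kubelet/kubelet_pods.go
```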

Thanks!

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kubernetes/issues/116954#issuecomment-1951703986):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.