kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Stopping the runtime service causes unfinished Jobs to become Succeeded #111744

Closed lilongfeng0902 closed 2 years ago

lilongfeng0902 commented 2 years ago

What happened?

First, create a Job as shown below:

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2022-08-01T11:35:51Z"
  labels:
    app: xxxx-job-cqog
  name: xxxx-job-cqog
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: c6ce8df6-51ba-464d-bb75-f37129e9b9d0
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: xxxx-job-cqog
        controller-uid: c6ce8df6-51ba-464d-bb75-f37129e9b9d0
        job-name: xxxx-job-cqog
    spec:
      containers:
      - image: xx.xx.xx.xx/library/tomcat:8.5.40
        imagePullPolicy: IfNotPresent
        name: container-0
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 200m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Then, while the job is running, execute `systemctl stop docker`; the pod becomes Completed. The pod's YAML is shown below:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: ""
    cni.projectcalico.org/podIPs: ""
  creationTimestamp: "2022-08-01T11:35:51Z"
  generateName: xxxx-job-cqog-
  labels:
    app: xxxx-job-cqog
    controller-uid: c6ce8df6-51ba-464d-bb75-f37129e9b9d0
    job-name: xxxx-job-cqog
  name: xxxx-job-cqog-z7ml2
spec:
  containers:
  - image: xx.xx.xx.xx/library/tomcat:8.5.40
    imagePullPolicy: IfNotPresent
    name: container-0
    resources:
      limits:
        cpu: 200m
        memory: 256Mi
      requests:
        cpu: 200m
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-sm4mm
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: node4
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-sm4mm
    secret:
      defaultMode: 420
      secretName: default-token-sm4mm
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T11:35:51Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-08-05T05:50:37Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-08-05T05:50:37Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T11:35:51Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6
    image: xx.xx.xx.xx/library/tomcat:8.5.39
    imageID: docker-pullable://xx.xx.xx.xx/library/tomcat@sha256:96f9540f50bf96b48fdeb5aa490b71505b7e8ab11e12e1af126f551270c49998
    lastState: {}
    name: container-0
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6
        exitCode: 0
        finishedAt: "2022-08-05T05:50:37Z"
        reason: Completed
        startedAt: "2022-08-01T11:35:52Z"
  hostIP: 100.2.220.98
  phase: Succeeded
  qosClass: Guaranteed
  startTime: "2022-08-01T11:35:51Z"

Then run `systemctl restart docker`; the Job changes to Completed status. Its status YAML is shown below:

status:
  completionTime: "2022-08-05T05:50:38Z"
  conditions:
  - lastProbeTime: "2022-08-05T05:50:38Z"
    lastTransitionTime: "2022-08-05T05:50:38Z"
    status: "True"
    type: Complete
  startTime: "2022-08-01T11:35:51Z"
  succeeded: 1

What did you expect to happen?

In fact, the job had not completed; it was merely interrupted. I expect that once the runtime service is healthy again, the job will resume running.
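The observed behavior can be sketched with a simplified model (this is NOT the actual kubelet code; the function and type names here are invented for illustration): with `restartPolicy: Never`, once every container reports a terminated state with exit code 0, the pod phase is computed as Succeeded, regardless of *why* the containers exited. Here, stopping the Docker daemon gracefully terminated the Tomcat container, which handled the signal and exited 0.

```python
# Simplified sketch of pod-phase computation (NOT real kubelet code).
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStatus:
    name: str
    exit_code: Optional[int]  # None while the container is still running


def compute_pod_phase(statuses: List[ContainerStatus], restart_policy: str) -> str:
    """All containers terminated -> terminal phase; exit code 0 -> Succeeded."""
    if any(s.exit_code is None for s in statuses):
        return "Running"
    if all(s.exit_code == 0 for s in statuses):
        # The phase is Succeeded even if the containers exited only because
        # the runtime was shut down and sent them a termination signal.
        return "Succeeded"
    # With restartPolicy=Never, a non-zero exit makes the pod Failed.
    return "Failed" if restart_policy == "Never" else "Running"


# The job's container exited 0 after "systemctl stop docker":
print(compute_pod_phase([ContainerStatus("container-0", 0)], "Never"))  # Succeeded
```

Under this model, the kubelet cannot distinguish "the workload finished" from "the runtime was stopped and the container exited cleanly", which is why the Job is then marked Complete.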

How can we reproduce it (as minimally and precisely as possible)?

First, create a job and wait until it is running. Second, stop the runtime service on the node with `systemctl stop docker.service`. Third, wait until the pod changes to Completed status, then start the Docker service again. The issue should now be reproduced.
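The steps above as a console session (cluster and node access assumed; `xxxx-job-cqog` is the example job name from this report, and `job.yaml` stands for the manifest above):

```console
# On a machine with cluster access: create the job and wait for it to run
$ kubectl apply -f job.yaml
$ kubectl get pods -l job-name=xxxx-job-cqog -w    # wait for STATUS Running

# On the node running the pod: stop the container runtime
$ systemctl stop docker.service

# Wait until the pod is reported as Completed, then restart the runtime
$ kubectl get pods -l job-name=xxxx-job-cqog -w    # wait for STATUS Completed
$ systemctl start docker.service

# The Job is now marked Complete even though the workload never finished
$ kubectl get job xxxx-job-cqog
```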

Anything else we need to know?

Relevant kubelet logs:

I0805 13:50:33.586903 2023330 manager.go:1044] Destroyed container: "/kubepods.slice/kubepods-podc3c2a1b5_5e3d_4e7c_92d7_39767813457e.slice/docker-6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb.scope" (aliases: [k8s_POD_xxxx-job-cqog-z7ml2_lilf-bug_c3c2a1b5-5e3d-4e7c-92d7-39767813457e_0 6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb], namespace: "docker")
I0805 13:50:35.666781 2023330 generic.go:155] GenericPLEG: c3c2a1b5-5e3d-4e7c-92d7-39767813457e/6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb: running -> exited
E0805 13:50:35.751455 2023330 remote_runtime.go:206] ListPodSandbox with filter &PodSandboxFilter{Id:,State:nil,LabelSelector:map[string]string{io.kubernetes.pod.uid: c3c2a1b5-5e3d-4e7c-92d7-39767813457e,},} from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0805 13:50:35.751891 2023330 kuberuntime_sandbox.go:279] ListPodSandbox with pod UID "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I0805 13:50:36.797947 2023330 generic.go:155] GenericPLEG: c3c2a1b5-5e3d-4e7c-92d7-39767813457e/6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb: running -> exited
I0805 13:50:36.811441 2023330 kuberuntime_manager.go:958] getSandboxIDByPodUID got sandbox IDs ["6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb"] for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.816418 2023330 generic.go:386] PLEG: Write status for xxxx-job-cqog-z7ml2/lilf-bug: &container.PodStatus{ID:"c3c2a1b5-5e3d-4e7c-92d7-39767813457e", Name:"xxxx-job-cqog-z7ml2", Namespace:"lilf-bug", IPs:[]string{}, ContainerStatuses:[]*container.Status{(*container.Status)(0xc0020e01c0)}, SandboxStatuses:[]*v1alpha2.PodSandboxStatus{(*v1alpha2.PodSandboxStatus)(0xc000f96900)}} (err: <nil>)
I0805 13:50:36.816505 2023330 kubelet.go:1952] SyncLoop (PLEG): "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", event: &pleg.PodLifecycleEvent{ID:"c3c2a1b5-5e3d-4e7c-92d7-39767813457e", Type:"ContainerDied", Data:"6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb"}
I0805 13:50:36.816584 2023330 kubelet_pods.go:1482] Generating status for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.816647 2023330 kubelet_pods.go:1482] Generating status for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.816944 2023330 volume_manager.go:373] Waiting for volumes to attach and mount for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.816991 2023330 volume_manager.go:404] All volumes are attached and mounted for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.817004 2023330 kuberuntime_manager.go:457] No ready sandbox for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" can be found. Need to start a new one
I0805 13:50:36.817016 2023330 kuberuntime_manager.go:678] computePodActions got {KillPod:true CreateSandbox:false SandboxID:6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb Attempt:1 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[] EphemeralContainersToStart:[]} for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:36.817048 2023330 kuberuntime_manager.go:696] Stopping PodSandbox for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" because all other containers are dead.
I0805 13:50:36.822331 2023330 desired_state_of_world_populator.go:361] Added volume "default-token-sm4mm" (volSpec="default-token-sm4mm") for pod "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" to desired state.
I0805 13:50:36.833845 2023330 reconciler.go:254] Starting operationExecutor.MountVolume for volume "default-token-sm4mm" (UniqueName: "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm") pod "xxxx-job-cqog-z7ml2" (UID: "c3c2a1b5-5e3d-4e7c-92d7-39767813457e") Volume is already mounted to pod, but remount was requested.
I0805 13:50:36.834211 2023330 secret.go:183] Setting up volume default-token-sm4mm for pod c3c2a1b5-5e3d-4e7c-92d7-39767813457e at /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm
I0805 13:50:36.834350 2023330 atomic_writer.go:158] pod lilf-bug/xxxx-job-cqog-z7ml2 volume default-token-sm4mm: no update required for target directory /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm
I0805 13:50:36.834364 2023330 operation_generator.go:672] MountVolume.SetUp succeeded for volume "default-token-sm4mm" (UniqueName: "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm") pod "xxxx-job-cqog-z7ml2" (UID: "c3c2a1b5-5e3d-4e7c-92d7-39767813457e")
I0805 13:50:36.994941 2023330 status_manager.go:564] Patch status for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" with "{\"metadata\":{\"uid\":\"c3c2a1b5-5e3d-4e7c-92d7-39767813457e\"},\"status\":{\"podIP\":null,\"podIPs\":null}}"
I0805 13:50:36.994968 2023330 status_manager.go:572] Status for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" updated successfully: (2, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:53 +0800 CST Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:53 +0800 CST Reason: Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:100.2.220.98 PodIP: PodIPs:[] StartTime:2022-08-01 19:35:51 +0800 CST InitContainerStatuses:[] ContainerStatuses:[{Name:container-0 State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2022-08-01 19:35:52 +0800 CST,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:100.2.216.234:30012/library/tomcat:8.5.39 ImageID:docker-pullable://100.2.216.234:30012/library/tomcat@sha256:96f9540f50bf96b48fdeb5aa490b71505b7e8ab11e12e1af126f551270c49998 ContainerID:docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6 Started:0xc000f4cd70}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
I0805 13:50:36.995176 2023330 kubelet.go:1927] SyncLoop (RECONCILE, "api"): "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.193889 2023330 manager.go:1044] Destroyed container: "/kubepods.slice/kubepods-podc3c2a1b5_5e3d_4e7c_92d7_39767813457e.slice/docker-7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6.scope" (aliases: [k8s_container-0_xxxx-job-cqog-z7ml2_lilf-bug_c3c2a1b5-5e3d-4e7c-92d7-39767813457e_0 7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6], namespace: "docker")
I0805 13:50:37.288582 2023330 kubelet.go:1921] SyncLoop (UPDATE, "api"): "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.875568 2023330 generic.go:155] GenericPLEG: c3c2a1b5-5e3d-4e7c-92d7-39767813457e/7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6: running -> exited
I0805 13:50:37.906792 2023330 kuberuntime_manager.go:958] getSandboxIDByPodUID got sandbox IDs ["6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb"] for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.910932 2023330 generic.go:386] PLEG: Write status for xxxx-job-cqog-z7ml2/lilf-bug: &container.PodStatus{ID:"c3c2a1b5-5e3d-4e7c-92d7-39767813457e", Name:"xxxx-job-cqog-z7ml2", Namespace:"lilf-bug", IPs:[]string{}, ContainerStatuses:[]*container.Status{(*container.Status)(0xc0011461c0)}, SandboxStatuses:[]*v1alpha2.PodSandboxStatus{(*v1alpha2.PodSandboxStatus)(0xc002601ce0)}} (err: <nil>)
I0805 13:50:37.911001 2023330 kubelet_pods.go:1482] Generating status for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.911005 2023330 kubelet.go:1952] SyncLoop (PLEG): "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", event: &pleg.PodLifecycleEvent{ID:"c3c2a1b5-5e3d-4e7c-92d7-39767813457e", Type:"ContainerDied", Data:"7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6"}
I0805 13:50:37.911024 2023330 kubelet_pods.go:1482] Generating status for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.911033 2023330 helpers.go:85] Already ran container "container-0" of pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", do nothing
I0805 13:50:37.911045 2023330 helpers.go:85] Already ran container "container-0" of pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", do nothing
I0805 13:50:37.911164 2023330 kuberuntime_manager.go:457] No ready sandbox for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" can be found. Need to start a new one
I0805 13:50:37.911177 2023330 kuberuntime_manager.go:678] computePodActions got {KillPod:true CreateSandbox:false SandboxID:6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb Attempt:1 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[] EphemeralContainersToStart:[]} for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:37.911208 2023330 kuberuntime_manager.go:696] Stopping PodSandbox for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" because all other containers are dead.
E0805 13:50:37.911741 2023330 kuberuntime_manager.go:702] killPodWithSyncResult failed: failed to "KillPodSandbox" for "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" with KillPodSandboxError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
E0805 13:50:37.911761 2023330 pod_workers.go:191] Error syncing pod c3c2a1b5-5e3d-4e7c-92d7-39767813457e ("xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"), skipping: failed to "KillPodSandbox" for "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" with KillPodSandboxError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
I0805 13:50:38.608641 2023330 status_manager.go:564] Patch status for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" with "{\"metadata\":{\"uid\":\"c3c2a1b5-5e3d-4e7c-92d7-39767813457e\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2022-08-05T05:50:37Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2022-08-05T05:50:37Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6\",\"image\":\"100.2.216.234:30012/library/tomcat:8.5.39\",\"imageID\":\"docker-pullable://100.2.216.234:30012/library/tomcat@sha256:96f9540f50bf96b48fdeb5aa490b71505b7e8ab11e12e1af126f551270c49998\",\"lastState\":{},\"name\":\"container-0\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6\",\"exitCode\":0,\"finishedAt\":\"2022-08-05T05:50:37Z\",\"reason\":\"Completed\",\"startedAt\":\"2022-08-01T11:35:52Z\"}}}],\"phase\":\"Succeeded\"}}"
I0805 13:50:38.608672 2023330 status_manager.go:572] Status for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" updated successfully: (3, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-05 13:50:37 +0800 CST Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-05 13:50:37 +0800 CST Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:100.2.220.98 PodIP: PodIPs:[] StartTime:2022-08-01 19:35:51 +0800 CST InitContainerStatuses:[] ContainerStatuses:[{Name:container-0 State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2022-08-01 19:35:52 +0800 CST,FinishedAt:2022-08-05 13:50:37 +0800 CST,ContainerID:docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:100.2.216.234:30012/library/tomcat:8.5.39 ImageID:docker-pullable://100.2.216.234:30012/library/tomcat@sha256:96f9540f50bf96b48fdeb5aa490b71505b7e8ab11e12e1af126f551270c49998 ContainerID:docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6 Started:0xc001d89740}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
I0805 13:50:38.609496 2023330 kubelet.go:1927] SyncLoop (RECONCILE, "api"): "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:39.171569 2023330 desired_state_of_world_populator.go:293] Removing volume from desired state for volume "default-token-sm4mm" (UniqueName: "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm") pod "xxxx-job-cqog-z7ml2" (UID: "c3c2a1b5-5e3d-4e7c-92d7-39767813457e")
I0805 13:50:39.243862 2023330 reconciler.go:196] operationExecutor.UnmountVolume started for volume "default-token-sm4mm" (UniqueName: "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm") pod "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" (UID: "c3c2a1b5-5e3d-4e7c-92d7-39767813457e")
I0805 13:50:39.243950 2023330 subpath_linux.go:226] Cleaning up subpath mounts for /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volume-subpaths/default-token-sm4mm
I0805 13:50:39.244055 2023330 util.go:254] Tearing down volume default-token-sm4mm for pod c3c2a1b5-5e3d-4e7c-92d7-39767813457e at /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm
I0805 13:50:39.244273 2023330 empty_dir_linux.go:98] Statfs_t of /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm: {Type:16914836 Bsize:4096 Blocks:4089782 Bfree:4089779 Bavail:4089779 Files:4089782 Ffree:4089773 Fsid:{Val:[0 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
I0805 13:50:39.244311 2023330 mount_linux.go:262] Unmounting /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm
I0805 13:50:39.256738 2023330 operation_generator.go:797] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm" (OuterVolumeSpecName: "default-token-sm4mm") pod "c3c2a1b5-5e3d-4e7c-92d7-39767813457e" (UID: "c3c2a1b5-5e3d-4e7c-92d7-39767813457e"). InnerVolumeSpecName "default-token-sm4mm". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0805 13:50:39.344288 2023330 reconciler.go:319] Volume detached for volume "default-token-sm4mm" (UniqueName: "kubernetes.io/secret/c3c2a1b5-5e3d-4e7c-92d7-39767813457e-default-token-sm4mm") on node "node4" DevicePath ""
I0805 13:50:39.734011 2023330 fs.go:410] unable to determine file system type, partition mountpoint does not exist: /var/lib/kubelet/pods/c3c2a1b5-5e3d-4e7c-92d7-39767813457e/volumes/kubernetes.io~secret/default-token-sm4mm
I0805 13:50:40.027062 2023330 kubelet_pods.go:1482] Generating status for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:40.027096 2023330 helpers.go:85] Already ran container "container-0" of pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", do nothing
I0805 13:50:40.027191 2023330 status_manager.go:429] Ignoring same status for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)", status: {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-05 13:50:37 +0800 CST Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-05 13:50:37 +0800 CST Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2022-08-01 19:35:51 +0800 CST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:100.2.220.98 PodIP: PodIPs:[] StartTime:2022-08-01 19:35:51 +0800 CST InitContainerStatuses:[] ContainerStatuses:[{Name:container-0 State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2022-08-01 19:35:52 +0800 CST,FinishedAt:2022-08-05 13:50:37 +0800 CST,ContainerID:docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:100.2.216.234:30012/library/tomcat:8.5.39 ImageID:docker-pullable://100.2.216.234:30012/library/tomcat@sha256:96f9540f50bf96b48fdeb5aa490b71505b7e8ab11e12e1af126f551270c49998 ContainerID:docker://7838305eeaa2038191a4ea21693bb930085fb56819c77c8af7d66588eaedc3b6 Started:0xc0021bff30}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}
I0805 13:50:40.027268 2023330 kuberuntime_manager.go:457] No ready sandbox for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" can be found. Need to start a new one
I0805 13:50:40.027277 2023330 kuberuntime_manager.go:678] computePodActions got {KillPod:true CreateSandbox:false SandboxID:6deb27a50c72d291057650fa4612335b8e29c62fb3ee4f8cad0851beb82df6fb Attempt:1 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[] EphemeralContainersToStart:[]} for pod "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)"
I0805 13:50:40.027297 2023330 kuberuntime_manager.go:696] Stopping PodSandbox for "xxxx-job-cqog-z7ml2_lilf-bug(c3c2a1b5-5e3d-4e7c-92d7-39767813457e)" because all other containers are dead.

This problem might be a special case of https://github.com/kubernetes/kubernetes/issues/28486. Is it necessary to deal with it?

Kubernetes version

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.10", GitCommit:"e390c178274bbb72be6f941b09a31a0ed9fb88db", GitTreeState:"clean", BuildDate:"2021-11-18T04:35:30Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.10", GitCommit:"e390c178274bbb72be6f941b09a31a0ed9fb88db", GitTreeState:"clean", BuildDate:"2021-11-18T04:34:03Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
```

Cloud provider

none

OS version

```console
# On Linux:
$ cat /etc/os-release
centos 8.2
$ uname -a
Linux node1 4.18.0-240.10.1.kux.x86_64 #1 SMP Thu May 20 22:53:46 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
```

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

```
$ docker version
Client:
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.14.7
 Git commit:        411224736a
 Built:             Sun Jul 4 02:59:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.14.7
  Git commit:       411224736a
  Built:            Sun Jul 4 02:51:50 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

Related plugins (CNI, CSI, ...) and versions (if applicable)

none
k8s-ci-robot commented 2 years ago

@lilongfeng0902: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
lilongfeng0902 commented 2 years ago

/kind bug /sig apps /area workload-api/job

lilongfeng0902 commented 2 years ago

Maybe my question is unreasonable. Thanks.