Closed tnozicka closed 6 years ago
P0 after the associated issue https://github.com/openshift/origin/issues/16870
fyi @derekwaynecarr
might be related to our old friend https://bugzilla.redhat.com/show_bug.cgi?id=1486356
I was unable to reproduce this on kubernetes release-1.7 branch or v1.7.6.
Here are the logs where things go sideways:
I1023 22:47:36.615158 20104 generic.go:183] GenericPLEG: Relisting
I1023 22:47:36.618081 20104 generic.go:147] GenericPLEG: 0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649: running -> exited
I1023 22:47:36.618102 20104 generic.go:147] GenericPLEG: 0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/48679b5a5ac8ba5f584b20f11cdb12742f0058ee890134454b669437cb64c95a: running -> exited
I1023 22:47:36.619351 20104 kuberuntime_manager.go:828] getSandboxIDByPodUID got sandbox IDs ["48679b5a5ac8ba5f584b20f11cdb12742f0058ee890134454b669437cb64c95a"] for pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:36.625538 20104 generic.go:381] PLEG: Write status for test/default: &container.PodStatus{ID:"0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", Name:"test", Namespace:"default", IP:"", ContainerStatuses:[]*container.ContainerStatus{(*container.ContainerStatus)(0xc428c1b0a0)}, SandboxStatuses:[]*runtime.PodSandboxStatus{(*runtime.PodSandboxStatus)(0xc42af894f0)}} (err: <nil>)
I1023 22:47:36.625592 20104 kubelet.go:1888] SyncLoop (PLEG): "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)", event: &pleg.PodLifecycleEvent{ID:"0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", Type:"ContainerDied", Data:"54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649"}
I1023 22:47:36.625624 20104 kubelet_pods.go:1224] Generating status for "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:36.625646 20104 helpers.go:84] Already ran container "test-container" of pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)", do nothing
I1023 22:47:36.625681 20104 kubelet.go:1888] SyncLoop (PLEG): "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)", event: &pleg.PodLifecycleEvent{ID:"0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", Type:"ContainerDied", Data:"48679b5a5ac8ba5f584b20f11cdb12742f0058ee890134454b669437cb64c95a"}
I1023 22:47:36.625703 20104 kubelet_pods.go:1224] Generating status for "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:36.625708 20104 kubelet_pods.go:1224] Generating status for "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:36.625716 20104 helpers.go:84] Already ran container "test-container" of pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)", do nothing
I1023 22:47:36.625734 20104 helpers.go:84] Already ran container "test-container" of pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)", do nothing
I1023 22:47:36.625744 20104 kuberuntime_container.go:798] Removing container "54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649"
W1023 22:47:36.625736 20104 pod_container_deletor.go:77] Container "48679b5a5ac8ba5f584b20f11cdb12742f0058ee890134454b669437cb64c95a" not found in pod's containers
I1023 22:47:36.625797 20104 status_manager.go:340] Status Manager: adding pod: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", with status: ('\x04', {Succeeded [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT PodCompleted } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:36 -0500 CDT PodCompleted } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT }] 10.42.10.23 172.17.0.2 2017-10-23 22:47:29 -0500 CDT [] [{test-container {nil nil &ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2017-10-23 22:47:30 -0500 CDT,FinishedAt:2017-10-23 22:47:35 -0500 CDT,ContainerID:docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649,}} {nil nil nil} false 0 docker.io/centos:centos7 docker-pullable://docker.io/centos@sha256:eba772bac22c86d7d6e72421b4700c3f894ab6e35475a34014ff8de74c10872e docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649}] BestEffort}) to podStatusChannel
I1023 22:47:36.625901 20104 status_manager.go:147] Status Manager: syncing pod: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", with status: (4, {Succeeded [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT PodCompleted } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:36 -0500 CDT PodCompleted } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT }] 10.42.10.23 172.17.0.2 2017-10-23 22:47:29 -0500 CDT [] [{test-container {nil nil &ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2017-10-23 22:47:30 -0500 CDT,FinishedAt:2017-10-23 22:47:35 -0500 CDT,ContainerID:docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649,}} {nil nil nil} false 0 docker.io/centos:centos7 docker-pullable://docker.io/centos@sha256:eba772bac22c86d7d6e72421b4700c3f894ab6e35475a34014ff8de74c10872e docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649}] BestEffort}) from podStatusChannel
I1023 22:47:36.630263 20104 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
I1023 22:47:36.630618 20104 wrap.go:42] PUT /api/v1/namespaces/default/pods/test/status: (2.126702ms) 200 [[openshift/v1.7.6+a08f5eeb62 (linux/amd64) kubernetes/c84beff] 10.42.10.23:38370]
I1023 22:47:36.630765 20104 config.go:293] Setting pods for source api
I1023 22:47:36.630909 20104 graph_populator.go:63] updatePod default/test, node unchanged
I1023 22:47:36.630960 20104 graph_builder.go:475] GraphBuilder process object: v1/Pod, namespace default, name test, uid 0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa, event type update
I1023 22:47:36.630975 20104 kubelet.go:1867] SyncLoop (RECONCILE, "api"): "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:36.630911 20104 status_manager.go:453] Status for pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)" updated successfully: (4, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:29 -0500 CDT Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:36 -0500 CDT Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:29 -0500 CDT Reason: Message:}] Message: Reason: HostIP:10.42.10.23 PodIP:172.17.0.2 StartTime:2017-10-23 22:47:29 -0500 CDT InitContainerStatuses:[] ContainerStatuses:[{Name:test-container State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2017-10-23 22:47:30 -0500 CDT,FinishedAt:2017-10-23 22:47:35 -0500 CDT,ContainerID:docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/centos:centos7 ImageID:docker-pullable://docker.io/centos@sha256:eba772bac22c86d7d6e72421b4700c3f894ab6e35475a34014ff8de74c10872e ContainerID:docker://54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649}] QOSClass:BestEffort})
I1023 22:47:36.631032 20104 kubelet_pods.go:788] Pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)" is terminated, but some containers have not been cleaned up: [0xc428c1b0a0]
I1023 22:47:36.631008 20104 disruption.go:349] updatePod called on pod "test"
I1023 22:47:36.631060 20104 disruption.go:412] No PodDisruptionBudgets found for pod test, PodDisruptionBudget controller will avoid syncing.
I1023 22:47:36.631066 20104 disruption.go:352] No matching pdb for pod "test"
I1023 22:47:36.717634 20104 desired_state_of_world_populator.go:243] Removing volume from desired state for volume "default-token-p6877" (UniqueName: "kubernetes.io/secret/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa-default-token-p6877") pod "test" (UID: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa")
I1023 22:47:36.815551 20104 reconciler.go:186] operationExecutor.UnmountVolume started for volume "default-token-p6877" (UniqueName: "kubernetes.io/secret/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa-default-token-p6877") pod "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa" (UID: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa")
I1023 22:47:36.815623 20104 util.go:367] Tearing down volume default-token-p6877 for pod 0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa at /home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.volumes/pods/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/volumes/kubernetes.io~secret/default-token-p6877
I1023 22:47:36.815805 20104 empty_dir_linux.go:38] Determining mount medium of /home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.volumes/pods/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/volumes/kubernetes.io~secret/default-token-p6877
I1023 22:47:36.815845 20104 empty_dir_linux.go:48] Statfs_t of /home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.volumes/pods/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/volumes/kubernetes.io~secret/default-token-p6877: {Type:16914836 Bsize:4096 Blocks:2055680 Bfree:2055676 Bavail:2055676 Files:2055680 Ffree:2055669 Fsid:{X__val:[0 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
I1023 22:47:36.815894 20104 mount_linux.go:224] Unmounting /home/sjennings/projects/go/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift.local.volumes/pods/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/volumes/kubernetes.io~secret/default-token-p6877
I1023 22:47:36.822537 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:36.850243 20104 operation_generator.go:532] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa-default-token-p6877" (OuterVolumeSpecName: "default-token-p6877") pod "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa" (UID: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa"). InnerVolumeSpecName "default-token-p6877". PluginName "kubernetes.io/secret", VolumeGidValue ""
I1023 22:47:36.915745 20104 reconciler.go:290] Volume detached for volume "default-token-p6877" (UniqueName: "kubernetes.io/secret/0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa-default-token-p6877") on node "cerebellum.local.variantweb.net" DevicePath ""
I1023 22:47:36.922863 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.023021 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.123194 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.223460 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.323739 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.424005 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.496783 20104 kubelet.go:1930] SyncLoop (housekeeping)
I1023 22:47:37.500762 20104 kubelet_pods.go:973] Killing unwanted pod "test"
E1023 22:47:37.502268 20104 remote_runtime.go:277] ContainerStatus "54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649" from runtime service failed: rpc error: code = 2 desc = Error: No such container: 54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649
I1023 22:47:37.503210 20104 kubelet_pods.go:1638] Orphaned pod "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa" found, removing pod cgroups
E1023 22:47:37.505515 20104 kubelet_pods.go:976] Failed killing the pod "test": failed to "KillContainer" for "test-container" with KillContainerError: "rpc error: code = 2 desc = Error: No such container: 54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649"
I1023 22:47:37.507552 20104 manager.go:988] Destroyed container: "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0f3cdbbe_b86e_11e7_891d_1c1b0deeddfa.slice" (aliases: [], namespace: "")
I1023 22:47:37.507584 20104 handler.go:325] Added event &{/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0f3cdbbe_b86e_11e7_891d_1c1b0deeddfa.slice 2017-10-23 22:47:37.507574826 -0500 CDT containerDeletion {<nil>}}
I1023 22:47:37.524257 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.624424 20104 desired_state_of_world_populator.go:160] Skipping findAndRemoveDeletedPods(). Not permitted until 2017-10-23 22:47:38.717614742 -0500 CDT (getPodStatusRetryDuration 2s).
I1023 22:47:37.625703 20104 generic.go:183] GenericPLEG: Relisting
I1023 22:47:37.628355 20104 generic.go:147] GenericPLEG: 0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa/54b38d55b7505e6130f75aae23b6e65e608566f44f6b707866a0dbcb003db649: exited -> non-existent
I1023 22:47:37.629628 20104 kuberuntime_manager.go:828] getSandboxIDByPodUID got sandbox IDs ["48679b5a5ac8ba5f584b20f11cdb12742f0058ee890134454b669437cb64c95a"] for pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:37.632287 20104 generic.go:381] PLEG: Write status for test/default: &container.PodStatus{ID:"0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", Name:"test", Namespace:"default", IP:"", ContainerStatuses:[]*container.ContainerStatus{}, SandboxStatuses:[]*runtime.PodSandboxStatus{(*runtime.PodSandboxStatus)(0xc425959b80)}} (err: <nil>)
I1023 22:47:37.632348 20104 kubelet_pods.go:1224] Generating status for "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:37.632386 20104 kubelet_pods.go:1189] pod waiting > 0, pending
I1023 22:47:37.632460 20104 status_manager.go:340] Status Manager: adding pod: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", with status: ('\x05', {Pending [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:36 -0500 CDT ContainersNotReady containers with unready status: [test-container]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT }] 10.42.10.23 172.17.0.2 2017-10-23 22:47:29 -0500 CDT [] [{test-container {&ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 centos:centos7 }] BestEffort}) to podStatusChannel
I1023 22:47:37.632524 20104 status_manager.go:147] Status Manager: syncing pod: "0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa", with status: (5, {Pending [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:36 -0500 CDT ContainersNotReady containers with unready status: [test-container]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-10-23 22:47:29 -0500 CDT }] 10.42.10.23 172.17.0.2 2017-10-23 22:47:29 -0500 CDT [] [{test-container {&ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 centos:centos7 }] BestEffort}) from podStatusChannel
I1023 22:47:37.637291 20104 wrap.go:42] PUT /api/v1/namespaces/default/pods/test/status: (2.072871ms) 200 [[openshift/v1.7.6+a08f5eeb62 (linux/amd64) kubernetes/c84beff] 10.42.10.23:38370]
I1023 22:47:37.637404 20104 config.go:293] Setting pods for source api
I1023 22:47:37.637532 20104 graph_populator.go:63] updatePod default/test, node unchanged
I1023 22:47:37.637633 20104 kubelet.go:1867] SyncLoop (RECONCILE, "api"): "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)"
I1023 22:47:37.637568 20104 status_manager.go:453] Status for pod "test_default(0f3cdbbe-b86e-11e7-891d-1c1b0deeddfa)" updated successfully: (5, {Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:29 -0500 CDT Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:36 -0500 CDT Reason:ContainersNotReady Message:containers with unready status: [test-container]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-23 22:47:29 -0500 CDT Reason: Message:}] Message: Reason: HostIP:10.42.10.23 PodIP:172.17.0.2 StartTime:2017-10-23 22:47:29 -0500 CDT InitContainerStatuses:[] ContainerStatuses:[{Name:test-container State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:centos:centos7 ImageID: ContainerID:}] QOSClass:BestEffort})
PLEG picks up on the container moving from running -> exited and the pod is marked as Succeeded.
However, we then fail to get the container status because the container no longer exists, and the pod is seen as orphaned. PLEG goes from exited -> non-existent and a new status is generated for the pod, starting it over as Pending.
cc @joelsmith @frobware
This is caused by https://github.com/openshift/origin/pull/16913 backported from kube 1.8. This is reproducible in kube release-1.8 upstream.
Opened upstream issue https://github.com/kubernetes/kubernetes/issues/54499. We'll see if they want to revert or fix. Hopefully we'll get a quick response.
For reference, this still seems broken (https://github.com/openshift/origin/issues/17595); only now the transition is Failed -> Succeeded.
Pod status.phase shall not transition from Succeeded to Pending. (Note that this is dependent on timing of the deletion.) I think this is the reason we are seeing multiple deployer pods running for DeploymentConfigs.
Version
Steps To Reproduce
Current Result
Pod transitions from Succeeded to Pending after being deleted.
Expected Result
Once a Pod reaches phase Succeeded, it stays there.
Additional Information
$ cat ~/tmp/pod/pod.yaml
$ oc observe pods --output=gotemplate -a '{{.}}'
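To confirm the regression from a phase stream like the `oc observe` output above, one can check each pod's phase sequence for a transition out of a terminal phase. A minimal sketch (the observed sequence below is hypothetical sample data matching this issue, not captured output):

```go
package main

import "fmt"

// terminal reports whether a pod phase may no longer change.
func terminal(phase string) bool {
	return phase == "Succeeded" || phase == "Failed"
}

// firstIllegal returns the index of the first transition that leaves a
// terminal phase, or -1 if the sequence is legal.
func firstIllegal(phases []string) int {
	for i := 1; i < len(phases); i++ {
		if terminal(phases[i-1]) && phases[i] != phases[i-1] {
			return i
		}
	}
	return -1
}

func main() {
	// Sample sequence matching this issue: the pod regresses after Succeeded.
	observed := []string{"Pending", "Running", "Succeeded", "Pending"}
	if i := firstIllegal(observed); i >= 0 {
		fmt.Printf("illegal transition %s -> %s\n", observed[i-1], observed[i])
	}
}
```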