AbeOwlu closed this issue 2 weeks ago
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig Node
@AbeOwlu, to reproduce this issue, how do you remove the IP assigned to a pod externally and force node IPAM to re-sync? Is this an issue with the AWS VPC CNI?
/triage needs-information
Hi @AnishShah, thanks for looking into this...
You're right, this was initially seen on the AWS CNI, and the issue was raised with the CNI project as well.
I was testing this on the Calico CNI with calicoctl ipam release --force, and it may produce a similar state; I should confirm this and update with more information soon.
From checking the containerd logs, it does seem the CRI attempts to tear down the container sandbox and recreate it, but the CNI does not respond in the AWS CNI scenario. So the container orchestrator may actually be handling this case as expected.
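For reference on the container-ID change after the sandbox rebuild: below is a minimal, self-contained sketch of how the kubelet prober worker reacts when the container ID under a stable pod changes, paraphrasing the logic in pkg/kubelet/prober/worker.go. The type, field, and method names here (worker.results, onContainerStatus, etc.) are illustrative stand-ins, not the upstream API:

```go
package main

import "fmt"

// Result is a minimal stand-in for the prober's probe result type.
type Result int

const (
	Unknown Result = iota
	Success
	Failure
)

// worker is a simplified stand-in for the kubelet prober worker; the real
// worker tracks the container ID it is probing and a results manager.
type worker struct {
	containerID  string
	initialValue Result
	results      map[string]Result // stands in for the results manager
}

// onContainerStatus paraphrases the container-ID check at the top of the
// prober worker's doProbe loop: when the runtime rebuilds the sandbox, the
// container ID changes while the pod UID stays the same, so the worker drops
// the old probe result and reseeds with the initial value for the new one.
func (w *worker) onContainerStatus(currentID string) {
	if w.containerID == currentID {
		return // same container; nothing to reset
	}
	if w.containerID != "" {
		delete(w.results, w.containerID) // drop result for the dead container
	}
	w.containerID = currentID
	w.results[currentID] = w.initialValue
}

func main() {
	w := &worker{initialValue: Unknown, results: map[string]Result{}}
	w.onContainerStatus("containerd://aaa") // first start of the container
	w.results["containerd://aaa"] = Success // readiness probe succeeded
	w.onContainerStatus("containerd://bbb") // sandbox rebuilt after CNI resync
	fmt.Println(w.results)                  // map[containerd://bbb:0] — reset
}
```

If the worker behaves this way, the old probe result is discarded on the ID change and probing starts over at the initial value for the rebuilt container, which is consistent with the orchestrator handling the teardown/recreate as expected.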
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
sig-node triage meeting:
@AbeOwlu, what state is the pod in? Can you share the output of kubectl describe pods? Also, can you share the kubelet and containerd logs to debug further?
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What happened?
Pod (container) readiness and liveness probes are non-blocking routines, and even if the readiness probe is failing, a liveness probe can trigger a restart and possibly self-heal.
However, I encountered a case where:
- the coredns pod starts, but an external automation removes the IP assigned on the node; the CNI IPAM is forced to re-sync its resource state, and the coredns pod's network namespace is torn down and rebuilt (the container ID changes, but the pod UID remains unchanged)
- there is no startup probe in the coredns spec, so the container is considered started, and a doProbe readiness probe is sent
- this HTTP probe fails with status code 503, and a liveness probe is never issued, so no self-heal/restart is triggered
It is just unclear why the liveness probe in the coredns spec is never sent. Is the getWorker path here, used by UpdatePodStatus after checking the startup probe, introducing an inadvertent wait on readiness?
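To frame that question, here is a minimal sketch of the two gates involved, paraphrasing the prober worker's doProbe gate and the manager's UpdatePodStatus in pkg/kubelet/prober; the function names (shouldProbe, startedFor) are illustrative stand-ins, not the upstream API. The key point, assuming this matches the real ordering, is that the liveness worker waits on the container's Started flag, not on readiness results:

```go
package main

import "fmt"

// probeType mirrors the three probe kinds the kubelet prober runs.
type probeType int

const (
	liveness probeType = iota
	readiness
	startup
)

// shouldProbe paraphrases the gate near the top of the prober worker's
// doProbe: liveness and readiness workers are held back until the
// container's Started flag is true, and the startup worker stops once it is.
func shouldProbe(pt probeType, started bool) bool {
	if started {
		return pt != startup // startup probing stops once started
	}
	return pt == startup // liveness/readiness wait for Started
}

// startedFor paraphrases how UpdatePodStatus derives the Started flag:
// with no startup probe configured, a running container counts as started
// immediately; otherwise Started tracks the startup probe result.
func startedFor(running, hasStartupProbe, startupSucceeded bool) bool {
	if !running {
		return false
	}
	if !hasStartupProbe {
		return true
	}
	return startupSucceeded
}

func main() {
	// The coredns case from this issue: running, no startup probe configured.
	started := startedFor(true, false, false)
	fmt.Println("liveness allowed:", shouldProbe(liveness, started))   // true
	fmt.Println("readiness allowed:", shouldProbe(readiness, started)) // true
}
```

If this sketch holds, a failing readiness probe alone should not suppress liveness; the liveness worker would only stall if the Started flag never becomes true again for the recreated container ID.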
What did you expect to happen?
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)