jhunt / k8s-boshrelease

A BOSH Release for deploying Kubernetes clusters
MIT License

Pod status Unknown after recreate #33

Closed · tpoland closed this 4 years ago

tpoland commented 4 years ago

After running `bosh recreate` against a single-VM k8s cluster, the coredns pods and any pods deployed outside of the kube-system namespace have a status of "Unknown". Describing an affected pod shows a Status of "Running" with a Container state of "Terminated" and an Event message of "Pod sandbox changed, it will be killed and re-created.".
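For reference, the symptom can be observed like this (the coredns pod name suffix is a placeholder):

```
# List pods in every namespace; affected pods report STATUS "Unknown"
kubectl get pods --all-namespaces

# Describe one affected pod to see the
# "Pod sandbox changed, it will be killed and re-created." event
kubectl describe pod coredns-<hash> -n kube-system
```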

Terminating the affected pods results in those pods recovering.
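For example (names are placeholders; deleting the pod lets its owning controller reschedule it):

```
kubectl delete pod <pod-name> -n <namespace>
```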

tpoland commented 4 years ago

Upgrading the bosh deployment from a 1-node to a 2-node cluster also resulted in the same set of pods failing.

With a two-node k8s cluster, fewer pods end up in an "Unknown" state, but the issue is still present.

tpoland commented 4 years ago

Upgrading from a 2-node to a 3-node k8s cluster does not result in any pod failures, but issuing a recreate still leaves a small subset of pods failing.

jhunt commented 4 years ago

Yeah, I think we need to start cordoning and draining nodes in a pre-stop. Curious how that will affect tinynetes, and whether or not we'll have to make that configurable...
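A minimal sketch of what such a drain hook could look like, assuming kubectl is on the PATH and the node registered under its hostname (both assumptions, not necessarily how this release would do it):

```
#!/bin/bash
# Hypothetical BOSH drain hook: cordon and drain this node before the VM stops.
set -e

# Assumption: the kubelet registered this node under its hostname.
NODE=$(hostname)

# Mark the node unschedulable, then evict its pods.
kubectl cordon "$NODE"
kubectl drain "$NODE" --ignore-daemonsets --force --timeout=120s

# BOSH drain scripts signal completion by printing an integer to stdout;
# 0 means draining is finished.
echo 0
```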

jhunt commented 4 years ago

I think this is handled in #34, with a drain script. Take a look, and see what you think.

jhunt commented 4 years ago

The fixes in PR #34 will be available in the next (1.18.x) release.