Upgrading the BOSH deployment from a 1-node to a 2-node cluster also resulted in the same set of pods failing.
Using a two-node k8s cluster, fewer pods end up in an "Unknown" state, but the issue is still present.
Upgrading from a 2-node to a 3-node k8s cluster does not result in any pod failures, although issuing a recreate still results in a small subset of pod failures.
Yeah, I think we need to start cordoning and draining nodes in a pre-stop script. Curious how that will affect tinynetes, and whether or not we'll have to make that configurable...
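For illustration, a minimal sketch of what a drain script along those lines might look like (the kubeconfig path, node-name lookup, and timeout are assumptions for the sketch, not the actual implementation):

```bash
#!/bin/bash
# Hypothetical BOSH drain script: cordon and drain this worker node before
# the VM is stopped, so pods are evicted cleanly instead of being left in
# an "Unknown" state when their sandbox disappears.
set -e

# Assumed paths/names; a real release would wire these up via its job spec.
export KUBECONFIG=/var/vcap/jobs/kubelet/config/kubeconfig
NODE_NAME="$(hostname)"

# Mark the node unschedulable, then evict its pods.
kubectl cordon "${NODE_NAME}"
kubectl drain "${NODE_NAME}" \
  --ignore-daemonsets \
  --delete-local-data \
  --force \
  --timeout=120s || true

# BOSH drain scripts signal completion by printing a number to stdout;
# 0 means "drained, safe to proceed".
echo 0
```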
I think this is handled in #34, with a drain script. Take a look, and see what you think.
The fixes in PR #34 will be available in the next (1.18.x) release.
After running `bosh` with the `recreate` directive for a single-VM k8s cluster, the coredns pods and any pods deployed outside of the `kube-system` namespace have a status of "Unknown". Describing a pod shows a Status of "Running" with a Container state of "Terminated" and an Event message of "Pod sandbox changed, it will be killed and re-created.". Terminating the affected pods results in a recovery of those pods.
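For anyone trying to reproduce this, the steps boil down to something like the following (the deployment and pod names are placeholders):

```bash
# Recreate the VM(s); <deployment> is whatever the k8s deployment is named.
bosh -d <deployment> recreate

# Afterwards, coredns and pods outside kube-system report a status of "Unknown".
kubectl get pods --all-namespaces
kubectl describe pod <pod-name> -n <namespace>

# Workaround: deleting an affected pod causes it to be rescheduled and recover.
kubectl delete pod <pod-name> -n <namespace>
```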