mikebryant opened 7 years ago
I agree we should do better here.
If the real problem is that the Weave Net pod is crashing, then we need cooperation from Kubernetes to keep the node's state at not-ready until we set it to ready. This was discussed extensively at https://github.com/kubernetes/kubernetes/pull/36209 and https://github.com/kubernetes/kubernetes/pull/34398, but no agreement was reached.
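For illustration, absent that kubelet cooperation, one manual mitigation is for an operator (or an external controller) to taint the affected node so workloads are evicted from it and kept off it until the network recovers. This is a hedged sketch, not a Weave feature: the taint key `node.weave.works/network-broken` and the node name are placeholders invented here.

```shell
# Hypothetical manual mitigation: taint the node whose Weave Net pod is
# crashing. With NoExecute, existing pods without a matching toleration
# are evicted and new ones are not scheduled. Names are placeholders.
kubectl taint nodes <broken-node> node.weave.works/network-broken=true:NoExecute

# Remove the taint once the Weave Net pod is healthy again:
kubectl taint nodes <broken-node> node.weave.works/network-broken:NoExecute-
```

The weave-net DaemonSet itself would need a toleration for this taint so the CNI pod can still run on the node and recover.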
May be related: https://github.com/weaveworks/weave/issues/3118
We've just had an issue where one of the weave-net pods, on only one node, was in `CrashLoopBackOff`. This should have an impact on the node's `Ready` state, or apply a taint. (Or something else?) As it is, all the pods on that node stopped working. Worse, because the kubelet still reported them as ready (all the internal healthchecks passed), they were still included in kube-proxy load balancing, but weren't reachable from other nodes. This meant things like DNS lookups were intermittent, since one of the kube-dns pods was on the broken node.
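At minimum, a readinessProbe on the weave-net container would make the crashing pod itself report `NotReady`. A minimal sketch of such a probe, assuming Weave Net's local HTTP status endpoint is reachable at `127.0.0.1:6784/status` (the path and port are assumptions here, not taken from the shipped DaemonSet):

```yaml
# Sketch of a readinessProbe for the weave-net container.
# The /status path and port 6784 are assumed values for Weave Net's
# local status API; verify against your deployed DaemonSet.
readinessProbe:
  httpGet:
    host: 127.0.0.1
    path: /status
    port: 6784
  initialDelaySeconds: 30
  periodSeconds: 10
```

Note this only fixes the CNI pod's own readiness reporting; it does not make the node `NotReady` or taint it, so the other pods on the node would still be reported ready by their kubelet, which is the core problem described above.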