mikebryant opened 7 years ago
I agree we should do better here.
If the real problem is that the Weave Net pod is crashing, then we need cooperation from Kubernetes to keep the node's state at not-ready until we set it to ready. This was discussed extensively at https://github.com/kubernetes/kubernetes/pull/36209 and https://github.com/kubernetes/kubernetes/pull/34398, but no agreement was reached.
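For illustration, absent that kubelet cooperation, one manual mitigation is for an operator (or an external controller) to taint the affected node so workloads are evicted from it and kept off it until the network recovers. This is a hedged sketch, not a Weave feature: the taint key `node.weave.works/network-broken` and the node name are placeholders invented here.

```shell
# Hypothetical manual mitigation: taint the node whose Weave Net pod is
# crashing. With NoExecute, existing pods without a matching toleration
# are evicted and new ones are not scheduled. Names are placeholders.
kubectl taint nodes <broken-node> node.weave.works/network-broken=true:NoExecute

# Remove the taint once the Weave Net pod is healthy again:
kubectl taint nodes <broken-node> node.weave.works/network-broken:NoExecute-
```

The weave-net DaemonSet itself would need a toleration for this taint so the CNI pod can still run on the node and recover.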
May be related: https://github.com/weaveworks/weave/issues/3118
We've just had an issue where one of the weave-net pods, on only one node, was in `CrashLoopBackOff`. This should have an impact on the node's `Ready` state, or apply a taint. (Or something else?) As it is, all the pods on that node stopped working. Worse, because the kubelet still reported them as ready (all the internal healthchecks passed), they were still included in kube-proxy load balancing, but weren't reachable from other nodes. This meant things like DNS lookups were intermittent, since one of the kube-dns pods was on the broken node.
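At minimum, a readinessProbe on the weave-net container would make the crashing pod itself report `NotReady`. A minimal sketch of such a probe, assuming Weave Net's local HTTP status endpoint is reachable at `127.0.0.1:6784/status` (the path and port are assumptions here, not taken from the shipped DaemonSet):

```yaml
# Sketch of a readinessProbe for the weave-net container.
# The /status path and port 6784 are assumed values for Weave Net's
# local status API; verify against your deployed DaemonSet.
readinessProbe:
  httpGet:
    host: 127.0.0.1
    path: /status
    port: 6784
  initialDelaySeconds: 30
  periodSeconds: 10
```

Note this only fixes the CNI pod's own readiness reporting; it does not make the node `NotReady` or taint it, so the other pods on the node would still be reported ready by their kubelet, which is the core problem described above.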