weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works

Give weave-npc the ability to see if a removed node still has a peer in the ring; if so, remove the peer before reclaiming IPs #3775

Open · mmerrill3 opened this issue 4 years ago

mmerrill3 commented 4 years ago

Anything else we need to know?

Kops 1.16.beta1, kubernetes 1.16.6

Versions:

$ weave version
2.6.0
$ docker version
18.9.9
$ uname -a
Ubuntu Stretch 9.11
$ kubectl version
1.16.6

Hi, we came across a moment of split-brain weirdness when doing a kops rolling update on our cluster. Kops ignores DaemonSets when nodes are drained, so the weave-npc and weave containers are still running after the node is cordoned and drained. Then the node is deleted from the cluster. This triggers another weave member to reclaim the IPs from the weave peer running on the deleted node, but that weave pod is still running: it keeps running until kops terminates the instance in EC2.
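Roughly, the sequence during the rolling update looks like this (the node name is a placeholder; kops drives these steps itself, the commands below are just the manual equivalent):

$ kubectl cordon <node>
$ kubectl drain <node> --ignore-daemonsets    # weave and weave-npc pods keep running
$ kubectl delete node <node>                  # Node object is gone, but the EC2 instance (and its weave peer) is still up
# ...some time later kops terminates the EC2 instance; only then does the old weave peer actually go away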

This is a request for the "winning" weave member (the one about to reclaim IPs) to check whether the losing weave member is still in the ring and, if so, to forcefully remove it from the ring before reclaiming.
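In effect, this would automate what we can do by hand today with the weave script inside the pod (pod and peer names below are placeholders):

$ kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status ipam
$ kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local rmpeer <peer-name-of-deleted-node>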

This is the message I saw from the "losing" weave member (our ELK stack was still running on the deleted node, shipping its logs to Elasticsearch):

INFO: 2020/02/18 21:32:31.921363 ->[10.201.23.68:10475|82:99:76:ba:e6:15(ip-10-201-23-68.ec2.internal)]: connection shutting down due to error: Received update for IP range I own at 100.111.192.0 v318: incoming message says owner 12:54:9a:b8:be:e5 v418

The owner 12:54:9a:b8:be:e5 was the "winner" of the lottery for who gets to own the IPs from the deleted node.

In the meantime, our workaround is to stop ignoring DaemonSets when nodes are cordoned and drained.

bboreham commented 4 years ago

I think it's a bug in Kops to delete the node from the cluster while pods are still running on it.

However, I did make a note that we could check that the peer is unreachable before removing it. I think this would avoid the situation you saw.
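Roughly the manual equivalent of that check, run from any surviving weave pod (placeholder pod name), would be to see whether the departed peer still shows up as connected:

$ kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status connections
$ kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status peers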

A downside is that the remaining nodes will go round this check again and again until the node actually does disappear. Do you have a feel for how long this situation is likely to persist?

mmerrill3 commented 4 years ago

I'm looking to remedy this in kops through this issue: https://github.com/kubernetes/kops/issues/8391

I'll give myself the option to not ignore daemonsets on a rolling update.

The situation will last for as long as it takes kops to terminate the instance in EC2. If that hangs, or the node doesn't terminate in a timely manner, I guess we could be in this state for a while, at least until the instance is actually gone.