Add new members to an existing cluster one by one.

CyberDem0n commented 8 years ago

Before sending the add-member command to an existing cluster, we should check that no other node is already in the process of adding itself to the cluster.

Basically, this is just a workaround for the following problem: https://github.com/zalando/stups-etcd-cluster/issues/1

New members were added successfully, but etcd failed to start due to a version incompatibility, and we lost quorum.
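
The check described above could be sketched roughly as follows. This is an illustration only, not the code from this pull request: it assumes the member list has the shape returned by etcd's v2 members API, and it assumes that a member which is registered but has not yet started shows up with an empty name and no client URLs. The helper names `pending_members` and `can_add_member` are hypothetical.

```python
def pending_members(members):
    """Return members that look registered but not yet joined.

    Assumption: such a member has an empty "name" or an empty "clientURLs"
    list in the member listing.
    """
    return [m for m in members if not m.get("name") or not m.get("clientURLs")]


def can_add_member(members):
    """Allow a new add-member request only if nobody else is still joining."""
    return not pending_members(members)


# Example member listings (made-up IDs and addresses, for illustration only).
all_joined = [
    {"id": "a1", "name": "node-a", "peerURLs": ["http://10.0.0.1:2380"],
     "clientURLs": ["http://10.0.0.1:2379"]},
    {"id": "b2", "name": "node-b", "peerURLs": ["http://10.0.0.2:2380"],
     "clientURLs": ["http://10.0.0.2:2379"]},
]
one_joining = all_joined + [
    {"id": "c3", "name": "", "peerURLs": ["http://10.0.0.3:2380"],
     "clientURLs": []},
]
```

With these sample listings, `can_add_member(all_joined)` allows the request, while `can_add_member(one_joining)` tells the caller to wait and retry later.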

feikesteenbergen commented 8 years ago

Manual regression test for the network partition failure scenario:

3 node cluster, status:

cluster is healthy
member af90e28ea0bc9d03 is healthy
member e32468c722a28bd4 is healthy
member ff4e0f5a2f7eb641 is unhealthy

Add a new node by bumping the Auto Scaling Group from 3 to 4:

The cluster adds a member and becomes unhealthy for a short while; after that, the cluster is healthy again:

cluster is healthy
member 231a86654ad1f632 is healthy
member af90e28ea0bc9d03 is healthy
member e32468c722a28bd4 is healthy
member ff4e0f5a2f7eb641 is unhealthy

My current hypothesis for why this happens:

3 node cluster, quorum=2, 2 healthy members
1 new member is introduced
4 node cluster, quorum=3, 2 healthy members
unhealthy cluster
new member goes online
4 node cluster, quorum=3, 3 healthy members
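
The arithmetic behind this hypothesis can be checked with a small sketch. It uses the usual majority rule (quorum is more than half of the registered members); the function names are hypothetical and chosen only for this illustration.

```python
def quorum(cluster_size):
    """Majority of the registered members: floor(n / 2) + 1."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size, healthy_members):
    """True if enough members are healthy to form a majority."""
    return healthy_members >= quorum(cluster_size)


# The three stages from the hypothesis: (registered members, healthy members).
stages = [
    (3, 2),  # before the change: one member is partitioned away
    (4, 2),  # new member registered but not yet online
    (4, 3),  # new member has come online
]
timeline = [(size, healthy, quorum(size), has_quorum(size, healthy))
            for size, healthy in stages]
```

Only the middle stage fails the check, which matches the short unhealthy window observed in the test.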

feikesteenbergen commented 8 years ago

Tested again, by doing the following:

3 member ASG running version 2.0
increased members from 3 to 6
cluster remained healthy, with at most one unhealthy member
other nodes reporting: Member (id=abc peerURLs=['http://:2380']) is registered but not yet joined
Therefore, the 4-member cluster (quorum=3) never ran out of quorum
This should therefore fix issue #1
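
To illustrate why adding members one by one keeps quorum during a scale-up from 3 to 6, here is a small simulation. It assumes that all 3 original members are healthy and that each new member joins before the next one is registered. The contrast case, where all new members are registered before any of them has started, is a hypothetical scenario added for comparison; the function names are made up for this sketch.

```python
def quorum(cluster_size):
    """Majority of the registered members: floor(n / 2) + 1."""
    return cluster_size // 2 + 1


def add_one_by_one(healthy, target):
    """Register one member, wait for it to join, then repeat.

    Returns a list of (registered, healthy, has_quorum) snapshots.
    """
    history = []
    registered = healthy
    while registered < target:
        registered += 1  # member is registered but not yet joined
        history.append((registered, healthy, healthy >= quorum(registered)))
        healthy += 1     # member has started and joined
        history.append((registered, healthy, healthy >= quorum(registered)))
    return history


def add_all_at_once(healthy, target):
    """Register every new member before any of them has started."""
    return (target, healthy, healthy >= quorum(target))


one_by_one = add_one_by_one(healthy=3, target=6)
all_at_once = add_all_at_once(healthy=3, target=6)
```

In the one-by-one run every snapshot keeps quorum, whereas the hypothetical all-at-once case ends up with 6 registered members, a quorum of 4, and only 3 healthy members.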

feikesteenbergen commented 8 years ago

+1

Add new members to an existing cluster one by one. #2