aep opened this issue 7 years ago
Hi @aep, if you lose 3 out of 5 servers, quorum is lost and Consul can no longer make changes to the cluster on its own. At that point you will need to perform the steps in https://www.consul.io/docs/guides/outage.html#failure-of-multiple-servers-in-a-multi-server-cluster to recover the cluster manually and introduce the 3 new servers into the configuration.
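For anyone hitting this, a minimal sketch of what that manual recovery looks like, assuming three replacement servers at hypothetical addresses, a hypothetical data directory of /var/lib/consul, and the older JSON-array peers.json format; follow the outage guide above for the exact format and paths for your Consul version:

```sh
# 1. Stop the Consul agent on every remaining/replacement server
#    (however the agent is supervised in your environment).
systemctl stop consul

# 2. On each of those servers, write raft/peers.json listing every server
#    that should be part of the recovered cluster (addresses are placeholders).
cat > /var/lib/consul/raft/peers.json <<'EOF'
["10.0.1.10:8300", "10.0.1.11:8300", "10.0.1.12:8300"]
EOF

# 3. Restart the agents; on startup they load the file, form a new Raft
#    configuration, and elect a leader.
systemctl start consul

# 4. Verify the configuration (newer CLIs use the subcommand form
#    `consul operator raft list-peers`).
consul operator raft -list-peers
```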
I should also point to https://www.consul.io/docs/guides/autopilot.html#dead-server-cleanup, which removes dead servers as new ones are added to replace them. For a normal ASG-like setup with automatic replacement, this should handle even unclean server failures automatically.
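For reference, a minimal sketch of inspecting and enabling that behaviour through the operator CLI (available since Consul 0.8); note that these commands themselves need a functioning quorum, which is why they can't help once quorum is already lost:

```sh
# Show the current Autopilot configuration, including CleanupDeadServers.
consul operator autopilot get-config

# Enable dead server cleanup (requires a leader, i.e. a working quorum).
consul operator autopilot set-config -cleanup-dead-servers=true
```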
@slackpad that is exactly what I was reporting. That option is what I expected to cover my use case (ASG), but it has no effect. I don't understand why loss of quorum is a special case, or why the option stops working under those conditions.
With Raft, all configuration changes to the quorum (adding and removing servers) go through the Raft protocol itself, so if Raft cannot commit any changes because quorum has been lost, there is no good way for the servers to recover on their own. We have considered special operator APIs and CLI commands that would let you do a peers.json-type recovery without actually shutting down Consul and placing those files, but it was cumbersome to build and use given that the cluster is in an outage state. We may revisit that in the future, but it's not currently planned.
With the latest version of Consul with Autopilot and an ASG, Consul will keep things clean in the face of failures as long as you don't lose quorum, so you'd want enough servers to cover the number of simultaneous failures you need to tolerate (3 servers can handle 1 failure, 5 can handle 2, etc.).
Thanks for the explanation. It would be helpful if the docs stated that right under the dead server cleanup option.
Something like: "This only works as long as fewer servers have failed than would cause a loss of quorum; otherwise the cluster is in an outage state [link] and dead nodes will not be removed."
I'm not sure, but for the outage state, wouldn't it be possible to just rediscover the member list from -retry-join-ec2? That would yield the same result on all nodes and should be fairly simple to implement.
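For context, a rough sketch of the EC2-based discovery being referred to here, with a hypothetical tag key/value and region (flag names are from the 0.8.x era; newer versions express the same thing as -retry-join "provider=aws tag_key=... tag_value=..."):

```sh
# Hypothetical server startup using EC2 tag discovery to find the other
# servers; tag key/value, region, and data dir are placeholders.
consul agent -server -bootstrap-expect=3 \
  -data-dir=/var/lib/consul \
  -retry-join-ec2-tag-key=consul-role \
  -retry-join-ec2-tag-value=server \
  -retry-join-ec2-region=us-east-1
```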
I'll kick this open to track updating the docs!
"wouldn't it be possible to just rediscover the member list from -retry-join-ec2?"
We could add some tooling for operators to do something like this, and we've looked at it, but it gets tricky: since you don't have a quorum, you need the tool to contact each server and perform the change at the same time, so it's hard to craft that into a robust experience.
"since you don't have a quorum you need the tool to contact each server and"
Not sure why. The EC2 API guarantees consistency, so each server could simply assume this is the truth and proceed without asking anyone else.
"each server could simply assume this is the truth and proceed without asking anyone else"
You'd still need to initiate this operation across all the servers (or have them somehow coordinate to kick it off after an outage). You wouldn't want a subset of servers that happened to be partitioned to run this on their own and split off into a separate cluster while the remaining servers were still waiting as part of the old cluster, for example.
consul version for both Client and Server
Both Consul v0.8.4
consul info for both Client and Server
Operating system and Environment details
official docker image 37ffadd9b8a6
Description of the Issue (and unexpected/desired result)
Loss of quorum cannot be recovered from unless the replacement instances have the same IP addresses.
Reproduction steps
Running consul force-leave has no effect, and setting CleanupDeadServers = true does not change anything either.
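A rough sketch of what was attempted, with hypothetical node names (syntax as of 0.8.4; newer CLIs use the subcommand form consul operator raft list-peers):

```sh
# Try to remove a dead server after its ASG replacement came up with a new IP.
consul force-leave consul-server-2
# force-leave only affects the gossip pool; the Raft configuration is untouched.

# Enabling dead server cleanup does not help once quorum is lost, because the
# change itself has to go through Raft and there is no leader to commit it.
consul operator autopilot set-config -cleanup-dead-servers=true

# The dead peers are still in the Raft configuration (query a non-leader
# with -stale since there is no leader to serve the request).
consul operator raft -list-peers -stale=true
```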