hashicorp / raft-autopilot

Raft Autopilot
Mozilla Public License 2.0

Demote failed servers first during reconciliation #55

Open kubawi opened 1 month ago

kubawi commented 1 month ago

In Vault, we encountered an issue where losing a voter and then, shortly afterwards, a non-voter in the same redundancy zone led to the cluster becoming unhealthy (losing an active node).

In the scenario above, the following would happen before the change in this PR (write-up stolen from @banks):