ncabatoff closed this issue.
It really doesn't make much sense to me that a new node has to read and decrypt the whole snapshot just to get the raft configuration.
Is there any way to query the quorum directly and find the raft leader without downloading the snapshot?
I think https://github.com/hashicorp/raft-autopilot/pull/23 should've addressed this issue, so I'm going to close this. Feel free to reopen if I'm mistaken.
Autopilot stabilization ensures that new nodes, even ones that are destined to become voters, always start as non-voters until they've been seen to stay current and in contact for the stabilization period.
Autopilot dead server pruning only respects min_quorum for voter nodes. This means that a new voting node that is still starting up, and hasn't yet been deemed stable and promoted to voter, can be pruned by autopilot before it gets a chance to stabilize. This is especially a concern in Vault, because we consider a node dead if it isn't sending us heartbeats (not raft heartbeats, but a separate Vault-specific kind). A newly joined node that is still applying the initial snapshot can't send those heartbeats, since the address to send them to is recorded in storage, i.e. inside that very snapshot.
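To make the problem concrete, here is a rough sketch of a pruning rule that, as described above, enforces min_quorum only for voters. It is not the actual raft-autopilot code; `server`, `Failed`, and `pruneVoterOnlyQuorum` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// server is a hypothetical, simplified server record.
type server struct {
	ID     string
	Voter  bool
	Failed bool // e.g. Vault stopped receiving its own (non-raft) heartbeats
}

// pruneVoterOnlyQuorum illustrates the behavior described above: minQuorum is
// only enforced for voters, so any failed non-voter is pruned unconditionally.
func pruneVoterOnlyQuorum(servers []server, minQuorum int) []string {
	voters := 0
	for _, s := range servers {
		if s.Voter {
			voters++
		}
	}
	var remove []string
	for _, s := range servers {
		if !s.Failed {
			continue
		}
		if !s.Voter {
			remove = append(remove, s.ID) // no min-quorum check for non-voters
			continue
		}
		if voters-1 >= minQuorum {
			remove = append(remove, s.ID)
			voters--
		}
	}
	return remove
}

func main() {
	servers := []server{
		{ID: "a", Voter: true},
		{ID: "b", Voter: true},
		{ID: "c", Voter: false, Failed: true}, // new node still applying the snapshot
	}
	fmt.Println(pruneVoterOnlyQuorum(servers, 3)) // prints [c]
}
```

Even though only two voters exist and min_quorum is 3, node `c` is pruned because it has not been promoted yet.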
When I discussed this with @mkeeler, he proposed that min_quorum should also prevent the removal of too many non-voters when autopilot knows that we want them to become voters. Persistent non-voters (read replicas) can therefore always be pruned. Other non-voters can be pruned as long as the remaining set of potential voters would still satisfy the min_quorum constraint.
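The proposed rule might look something like the sketch below. This is only one reading of the proposal, not code from raft-autopilot: `candidate`, `WantVoter`, and `pruneProposed` are hypothetical. A "potential voter" is either a current voter or a non-voter that autopilot intends to promote; read replicas are everything else.

```go
package main

import "fmt"

// candidate is a hypothetical record; WantVoter marks a non-voter that autopilot
// intends to promote once it stabilizes (false for read replicas).
type candidate struct {
	ID        string
	Voter     bool
	WantVoter bool
	Failed    bool
}

func (c candidate) potentialVoter() bool { return c.Voter || c.WantVoter }

// pruneProposed sketches the proposed rule: read replicas are always
// prunable; voters and would-be voters are pruned only while the remaining
// potential voters still satisfy minQuorum.
func pruneProposed(servers []candidate, minQuorum int) []string {
	potential := 0
	for _, c := range servers {
		if c.potentialVoter() {
			potential++
		}
	}
	var remove []string
	for _, c := range servers {
		if !c.Failed {
			continue
		}
		if !c.potentialVoter() {
			remove = append(remove, c.ID) // persistent non-voter (read replica)
			continue
		}
		if potential-1 >= minQuorum {
			remove = append(remove, c.ID)
			potential--
		}
	}
	return remove
}

func main() {
	servers := []candidate{
		{ID: "a", Voter: true},
		{ID: "b", Voter: true},
		{ID: "c", WantVoter: true, Failed: true}, // new node still applying the snapshot
		{ID: "r", Failed: true},                  // read replica
	}
	fmt.Println(pruneProposed(servers, 3)) // prints [r]
}
```

With min_quorum at 3, the joining node `c` now survives because removing it would leave only two potential voters, while the failed read replica `r` is still cleaned up.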