Open Huqicheng opened 3 months ago
We can prevent this from happening by enabling FLAGS_durable_wal_write (this might degrade performance and needs testing). Without this gflag enabled, the peer should be able to detect an op that it already received in a previous term, reject it, and surface the rejection to the leader.
Also, we should disallow remote bootstrap (RBS) from this peer.
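
A minimal sketch of the detection idea above, in Python for brevity. It assumes the peer can look up the term of every op it has already applied to its local store by index; the names `OpId`, `Peer`, `record_applied`, and `check_incoming` are hypothetical and are not YugabyteDB's actual classes or APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OpId:
    term: int
    index: int


class Peer:
    """Hypothetical follower state, used only for illustration."""

    def __init__(self) -> None:
        # index -> term of ops already applied to the local store.
        # Assumption: this survives a crash even if the WAL tail does not.
        self.applied: Dict[int, int] = {}

    def record_applied(self, op: OpId) -> None:
        self.applied[op.index] = op.term

    def check_incoming(self, op: OpId) -> Optional[str]:
        """Return an error to surface to the leader, or None to accept."""
        applied_term = self.applied.get(op.index)
        if applied_term is not None and applied_term != op.term:
            return (
                f"index {op.index} already applied in term {applied_term}, "
                f"leader sent term {op.term}"
            )
        return None


peer = Peer()
peer.record_applied(OpId(term=1, index=5))
conflict = peer.check_incoming(OpId(term=2, index=5))   # same index, new term
same_op = peer.check_incoming(OpId(term=1, index=5))    # exact duplicate
next_op = peer.check_incoming(OpId(term=2, index=6))    # not applied yet
```

Here only the first call returns an error; how the leader should react to it (for example, evicting the peer) is outside this sketch.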
Jira Link: DB-12375
Description
If a majority of the tablet peers go down at the same time, there are cases where Raft can run into a split-brain situation, leading to under-replicated tablets (which blocks load balancer activity). Even if that happens, Raft should ensure there is no ambiguity and should resolve the leadership changes.
Consider an RF-3 configuration with N1 (leader), N2 (follower 1), and N3 (follower 2). If a majority of the nodes crash and come back up, and N2 (follower 1) becomes the new leader, then N1 (old leader) can accept updates from N2 (new leader) even though their WALs have diverged. This can cause RocksDB on N1 (old leader) to become out of sync with the WAL on N1, so it is not safe to let N1 remain part of the quorum. It needs to be removed from the quorum. More details below:
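
To make "diverged WALs" concrete, here is a small Python illustration. It models each WAL as a list of terms, where position i holds the term of the entry at index i + 1. The helper `first_divergence` and the sample logs are assumptions for illustration, not data from an actual cluster.

```python
from typing import List, Optional


def first_divergence(wal_a: List[int], wal_b: List[int]) -> Optional[int]:
    """Return the first 1-based index where the two logs hold entries
    from different terms, or None if the shared prefix matches."""
    for index, (term_a, term_b) in enumerate(zip(wal_a, wal_b), start=1):
        if term_a != term_b:
            return index
    return None


# N1 (old leader) wrote index 4 in term 1 but never replicated it.
n1_wal = [1, 1, 1, 1]
# N2 (new leader) wrote indexes 4 and 5 in term 2.
n2_wal = [1, 1, 1, 2, 2]

diverged_at = first_divergence(n1_wal, n2_wal)
clean = first_divergence([1, 1, 1], n2_wal)
```

In this made-up example the logs disagree at index 4. If N1 had already applied its term-1 entry at that index to RocksDB, overwriting only the WAL entry would leave RocksDB and the WAL on N1 inconsistent.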
The problems mentioned in step (5) can be avoided.
Issue Type
kind/bug