Closed: dcorbacho closed this issue 8 years ago
As in #944, partial partitions allow several masters to coexist in the same cluster. When the nodes reconnect, each master exchanges messages with the existing slaves, expecting them to be newly started slaves, but those slaves have just been synchronised by, or have received messages from, another master. As a result, the message queues get out of sync and their statuses do not match.
Avoiding the root cause requires a stronger consensus algorithm.
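To illustrate the failure mode, here is a minimal toy model of a partial partition. It is not RabbitMQ's actual election or GM logic: the node names, the `order` list, and the rule "follow the first node in `order` that I can reach" are all assumptions made for illustration only. It only shows how nodes with different local views of connectivity can end up following different masters.

```python
# Toy model only: NOT how RabbitMQ elects queue masters.
# Assumption: each node follows the first node in `order` that it can reach.

def local_master(node, order, links):
    """Return the first node in `order` reachable from `node` (itself included)."""
    for candidate in order:
        if candidate == node or frozenset((node, candidate)) in links:
            return candidate
    return node  # defensive fallback; `node` is always in `order`

def masters(order, links):
    """Map every node to the master it believes in, given the live links."""
    return {n: local_master(n, order, links) for n in order}

order = ["A", "B", "C"]  # hypothetical nodes, oldest first

# Fully connected cluster: every pair of nodes can talk.
full = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C")]}

# Partial partition: only the A-C link is down; B still sees both.
partial = full - {frozenset(("A", "C"))}

print(masters(order, full))
print(masters(order, partial))
```

With all links up, every node follows `A`. With only the A-C link down, `C` cannot reach `A` and follows `B` instead, while `A` and `B` still follow `A`, so two masters are believed to exist at once.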
To be clear, there are plans to at least evaluate Raft in a few places after the 3.7.0 release.
Found while testing #944, using HA queues and autoheal (same testing as for #914).