jgroups-extras / jgroups-raft

Implementation of the RAFT consensus protocol in JGroups
https://jgroups-extras.github.io/jgroups-raft/
Apache License 2.0
266 stars 84 forks source link

Leader left before the coordinator apply the election result #280

Closed yfei-z closed 2 months ago

yfei-z commented 5 months ago

Regarding https://github.com/jgroups-extras/jgroups-raft/issues/259, my test case still fails in 1.0.13.

    @Test
    void bug2(@TempDir(cleanup = CleanupMode.ALWAYS) Path tmp) throws Exception {
        String clusterName = "node-cluster", dir = tmp.toString();

        List<RaftNode> nodes = raftChannels("A,B,C", dir).stream().map(RaftNode::new).toList();
        try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {
            a.getCh().connect(clusterName); a.untilCoord(3);
            b.getCh().connect(clusterName); assertEquals("A", b.untilLeader(3));
            c.getCh().connect(clusterName); assertEquals("A", c.untilLeader(3));

            a.setAsync("cmd1".getBytes()).join();
            c.getCh().disconnect();
            a.setAsync("cmd2".getBytes()).join();

            assertEquals(2, a.lastAppended());
            assertEquals(2, b.lastAppended());
            assertEquals(1, c.lastAppended());
        }

        nodes = raftChannels("A,B,C", dir, t -> configProtocol(t, "raft.ELECTION", ELECTION.class))
                .stream().map(RaftNode::new).toList();
        try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {

            ELECTION election = c.getCh().stack().findProtocol(ELECTION.class);
            election.getSendLeader().blockInterruptibly(); // block the election thread before apply the elected leader

            c.getCh().connect(clusterName); c.untilCoord(3); // C become the coordinator
            b.getCh().connect(clusterName); // C: existing view: [C|0] (1) [C], new view: [C|1] (2) [C, B], result: reached
            a.getCh().connect(clusterName); // C: existing view: [C|1] (2) [C, B], new view: [C|2] (3) [C, B, A], result: no_change
            c.untilMembers(3, 3); assertEquals(List.of("C", "B", "A"), c.members()); // all up

            election.getSendLeader().untilWaiters(1, 3); // election thread has being blocked
            b.getCh().disconnect(); // C: existing view: [C|2] (3) [C, B, A], new view: [C|3] (2) [C, A], result: no_change

            c.untilMembers(2, 3); // view has being updated in C
            assertEquals(List.of("C", "A"), c.members());
            election.getSendLeader().unblock(); // unblock the election thread

            for (var t : List.of(c, a)) {
                assertEquals("B", t.untilLeader(3)); // TODO leader is a disconnected node
                assertThrows(RaftLeaderException.class, () -> t.setAsync("cmd3".getBytes())).printStackTrace();
            }
        }
    }

I found the fixing code where under reached and leader_lost cases which stop the existing voting thread before start a new one, but in my test case it's no_change.

jabolina commented 5 months ago

The test https://github.com/jgroups-extras/jgroups-raft/blob/6f3a9ee37fd30f31527c89a7ecc0f50240ad8061/tests/junit-functional/org/jgroups/tests/election/ViewChangeElectionTest.java#L55 was supposed to cover this scenario. I'll take a look today and get back. Thanks for reporting!

yfei-z commented 5 months ago

We have different blocking points, I block the election thread just before raft.setLeaderAndTerm(leader, new_term);, and you block the message sending where after the leader has been applied locally.

jabolina commented 5 months ago

I've looked into it and was trying to find a solution. The fact the block is the setLeaderAndTerm makes the difference. Likely, the solution will (as you commented on the other issue) serialize the handling of view updates. Otherwise, we calculate the effect of the view change without a final result in place. In this example, if the voting thread is pending, we have no_change, but if the thread has finished, it is leader_lost..

Since we can't delay the view processing, we can chain a CompletableFuture with a dedicated single-threaded executor for handling the events in order. We'll need some updates on the voting process to complete the future and notify the chain. I would avoid having a thread polling a queue since view changes are not supposed to occur frequently. Changes on the test side are necessary when checking if the voting thread is running since it would run asynchronously and not after processing the view.

yfei-z commented 5 months ago

I think just make sure the voting thread has processed the latest view before it's stopped, and the view change event thread don't try to stop the voting thread because there is no guarantee. I just made a pull request, hope it will give you some idea.

yfei-z commented 5 months ago

Since unset the leader only depend on the view change event of lost majority, so if a participant node got the delayed elected leader message after the view change event then it will keep the leader without majority reached. After the PR, the coordinator works fine but didn't cover this scenario.

@Test
void bug4() throws Exception {
    String clusterName = "node-cluster";
    List<RaftNode> nodes = raftChannels("A,B,C,D,E", null, t -> configProtocol(t, "raft.ELECTION", ELECTION.class))
            .stream().map(RaftNode::new).toList();
    try (var a = nodes.get(0); var b = nodes.get(1); var c = nodes.get(2)) {

        ELECTION election = a.getCh().stack().findProtocol(ELECTION.class);
        election.getSendLeader().block(); // block the election thread before apply the elected leader

        a.getCh().connect(clusterName); a.untilCoord(3); // A become the coordinator
        b.getCh().connect(clusterName);
        c.getCh().connect(clusterName); // majority reached, start a voting thread
        election.getSendLeader().untilWaiters(1, 3); // election thread has being blocked

        c.getCh().disconnect(); // C left
        a.untilMembers(2, 3); // view changed in A
        b.untilMembers(2, 3); // view changed in B

        election.getSendLeader().unblock(); // unblock the election thread

        // before https://github.com/jgroups-extras/jgroups-raft/pull/284
//      assertEquals("A", a.untilLeader(3)); // TODO A become the leader without majority reached
//      assertThrows(TimeoutException.class, () -> a.setAsync("cmd1".getBytes()).get(3, SECONDS));

        election.untilVotingThreadStop(3);
        assertNull(a.leader());
        assertThrows(RaftLeaderException.class, () -> a.setAsync("cmd1".getBytes()).get()).printStackTrace();

        b.untilLeader("A", 3); // TODO got electedLeader from A after view change event, miss the chance to remove the leader
        assertThrows(ExecutionException.class, () -> b.setAsync("cmd1".getBytes()).get()).printStackTrace();
    }
}