jgroups-extras / jgroups-raft

Implementation of the RAFT consensus protocol in JGroups
https://jgroups-extras.github.io/jgroups-raft/
Apache License 2.0
266 stars, 84 forks

Trying to fix #280 #284

Closed. yfei-z closed 3 months ago

yfei-z commented 5 months ago

Just a simple thought; I hope it gives you some ideas.

jabolina commented 5 months ago

This PR is similar to an idea I tried initially: if the view changes during the election, we do a hard stop. But it covers cases that led to some errors on my side. Removing the stopVotingThread() call from ELECTION seems to make it work :smiley:

I'll pull in your commits to preserve your changes and apply some small tweaks on top. I'll let the tests run overnight, and we'll see how it goes. Thanks for reporting and coming up with a solution!

yfei-z commented 5 months ago

I found a problem while running another test.

GMS notifies the upper protocols of the VIEW_CHANGE before it multicasts the VIEW message. In the lost-majority case, stopVotingThread() can block the event thread (100 ms max) until the running voting thread finishes. Ideally, this prevents the VIEW message from being sent before the ElectedLeader message, so the VIEW message arrives last and participant nodes can unset the false leader.
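To make the "blocks for at most 100 ms" part concrete, here is a minimal sketch of a bounded stop. It is not the jgroups-raft implementation: `stopWithin`, `cooperativeWorker`, and `stubbornWorker` are hypothetical names, and I'm assuming the stop is an interrupt followed by a timed `Thread.join`. It shows that the caller is blocked only for a bounded time, and that the call can return while the worker is still alive.

```java
import java.util.concurrent.CountDownLatch;

class BoundedStopSketch {
    /**
     * Interrupts the thread and waits at most maxWaitMillis (must be > 0,
     * since join(0) waits forever) for it to end.
     * Returns true only if the thread has really terminated.
     */
    static boolean stopWithin(Thread t, long maxWaitMillis) {
        t.interrupt();
        try {
            t.join(maxWaitMillis); // bounded wait: the caller blocks no longer than this
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return !t.isAlive();
    }

    /** Hypothetical worker that exits as soon as it is interrupted. */
    static Thread cooperativeWorker() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // interrupted: fall through and exit
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Hypothetical worker that ignores interrupts until the latch is released. */
    static Thread stubbornWorker(CountDownLatch release) {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    release.await();
                    return;
                } catch (InterruptedException e) {
                    // ignore the interrupt and keep waiting
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        System.out.println("cooperative stopped: " + stopWithin(cooperativeWorker(), 100));
        CountDownLatch release = new CountDownLatch(1);
        System.out.println("stubborn stopped: " + stopWithin(stubbornWorker(release), 100));
        release.countDown(); // let the stubborn worker finish
    }
}
```

In the second case the caller gets control back after the timeout even though the worker is still running, so whatever the worker sends afterwards is no longer ordered relative to the caller's next step.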

Without stopVotingThread(), the VIEW message might be sent first, and the false leader would then be kept on the other nodes after the subsequent ElectedLeader message. Since stopVotingThread() can't guarantee that the voting thread has actually stopped, I'm thinking of sending an ElectedLeader message with a null leader and term 0 from the voting thread.
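A toy model of the ordering problem, as I understand it. This is only a sketch under assumptions, not the real protocol code: `Participant`, `onViewWithoutMajority`, and `onElectedLeader` are hypothetical names, and I'm assuming a participant simply adopts whatever leader the last ElectedLeader message carries, and treats a null leader as a reset.

```java
class LeaderOrderingSketch {
    /** Hypothetical participant state: the leader and term it currently believes in. */
    static final class Participant {
        String leader; // null means "no leader"
        long term;

        /** Assumption: a view that lost the majority unsets the leader. */
        void onViewWithoutMajority() {
            leader = null;
        }

        /** Assumption: a null leader acts as a reset; otherwise adopt the leader and term. */
        void onElectedLeader(String newLeader, long newTerm) {
            if (newLeader == null) {
                leader = null;
                return;
            }
            leader = newLeader;
            term = newTerm;
        }
    }

    public static void main(String[] args) {
        // Intended order: ElectedLeader first, VIEW last, so the false leader is unset.
        Participant a = new Participant();
        a.onElectedLeader("A", 5);
        a.onViewWithoutMajority();
        System.out.println("ElectedLeader, VIEW: leader=" + a.leader);

        // Problematic order: VIEW first, so the false leader sticks.
        Participant b = new Participant();
        b.onViewWithoutMajority();
        b.onElectedLeader("A", 5);
        System.out.println("VIEW, ElectedLeader: leader=" + b.leader);

        // Proposed idea: the voting thread follows up with a null leader and term 0.
        Participant c = new Participant();
        c.onViewWithoutMajority();
        c.onElectedLeader("A", 5);
        c.onElectedLeader(null, 0);
        System.out.println("VIEW, ElectedLeader, reset: leader=" + c.leader);
    }
}
```

The point of the third case is that the reset comes from the same thread that sent the false ElectedLeader message, so it does not depend on the event thread winning a race.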

I will amend the commit, please take a look.

jabolina commented 5 months ago

@yfei-z To avoid pushing to your branch, I've created #286, in case you're interested in trying it. It should fix the issues mentioned, and it seems stable in my local tests.