Closed yfei-z closed 3 months ago
This PR is a similar idea I tried initially. If the view changes during the election, we have a hard stop. But it covers cases that lead to some errors on my side. Removing the `stopVotingThread()` from the ELECTION seems to make it work :smiley:
I'll pull your commits in to keep your changes and apply small changes. I'll let the tests run overnight, and we'll see how it goes. Thanks for reporting and coming up with a solution!
I found a problem while running another test.
GMS notifies the upper protocols of the VIEW_CHANGE before it multicasts the VIEW message. In the lost-majority case, `stopVotingThread()` can block the event thread (100 ms max) until the running voting thread finishes. Ideally, this prevents the VIEW message from being sent before the ElectedLeader message, so participant nodes can unset the false leader when the VIEW message finally arrives.
Without `stopVotingThread()`, the VIEW message might be sent first, and the false leader would then be kept on the other nodes after the subsequent ElectedLeader message. Since `stopVotingThread()` can't guarantee that the voting thread has stopped, I'm thinking about sending the ElectedLeader message with a null leader and term 0 from the voting thread.
I will amend the commit, please take a look.
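A minimal sketch of the ordering problem described above, in Java. All names here (`ElectionOrderingSketch`, `View`, `ElectedLeader`, `resultToSend`, `leaderAfter`) are hypothetical stand-ins, not the actual jgroups-raft classes, and the sketch assumes a participant simply applies messages in delivery order.

```java
import java.util.List;

// Hypothetical model of the race; these are NOT the real jgroups-raft types.
final class ElectionOrderingSketch {
    /** A message a participant can receive in this simplified model. */
    interface Msg {}

    /** Stand-in for a VIEW message; hasMajority == false models the lost-majority case. */
    record View(boolean hasMajority) implements Msg {}

    /** Stand-in for an ElectedLeader message; leader == null and term == 0 mean "no leader". */
    record ElectedLeader(String leader, long term) implements Msg {}

    /** What the voting thread would send: a reset message if the majority is already gone. */
    static ElectedLeader resultToSend(String winner, long term, boolean stillHasMajority) {
        return stillHasMajority
                ? new ElectedLeader(winner, term)
                : new ElectedLeader(null, 0);
    }

    /** Leader a participant ends up with after handling messages in delivery order. */
    static String leaderAfter(List<Msg> delivered) {
        String leader = null;
        for (Msg m : delivered) {
            if (m instanceof View v) {
                if (!v.hasMajority()) {
                    leader = null;          // a VIEW without majority unsets the leader
                }
            } else if (m instanceof ElectedLeader e) {
                leader = e.leader();        // the last ElectedLeader message wins
            }
        }
        return leader;
    }
}
```

In this model, delivering `View(false)` followed by a stale `ElectedLeader("A", 5)` leaves `"A"` as the leader, which is the false-leader case. If the voting thread sends the result of `resultToSend("A", 5, false)` instead, the participant ends with no leader in either delivery order.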
@yfei-z To avoid pushing to your branch, I've created #286, if you're interested in trying that. It should fix the mentioned issues and seems stable in my local tests.
Just a simple thought; I hope it gives you some ideas.