ktoso / akka-raft

A toy project implementing RAFT on top of Akka Cluster (not prod ready)
http://blog.project13.pl
Apache License 2.0

Candidate should have election restart timeout #31

Open schuster opened 9 years ago

schuster commented 9 years ago

In the unlikely event that no candidate receives a majority of votes for a given term, the candidates should time out and start a new election. Currently there is no such timeout, so eventually all servers in the cluster would transition to the Candidate state for the same term and get stuck.
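To make the expected behavior concrete, here is a minimal sketch in plain Python (not Akka code, and not this project's implementation). The `Candidate` class, its method names, and `random_election_timeout_ms` are all hypothetical; the 150 to 300 ms range is only the example range from the Raft paper. It shows a candidate that fails to reach a majority moving to a new term and starting over, rather than staying stuck in the old one.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical stand-in for a Raft server in the Candidate state."""
    node_id: str
    cluster_size: int
    term: int = 1
    votes: set = field(init=False)

    def __post_init__(self):
        # A candidate always votes for itself.
        self.votes = {self.node_id}

    def has_majority(self) -> bool:
        return len(self.votes) > self.cluster_size // 2

    def on_vote(self, voter: str, term: int) -> None:
        # Votes for older terms are ignored.
        if term == self.term:
            self.votes.add(voter)

    def on_election_timeout(self) -> None:
        """No majority before the timeout: restart the election in a new term."""
        if not self.has_majority():
            self.term += 1
            self.votes = {self.node_id}


def random_election_timeout_ms(rng: random.Random, low: float = 150, high: float = 300) -> float:
    """Randomized timeout so that repeated split votes become unlikely."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    c = Candidate("a", cluster_size=3)
    c.on_election_timeout()   # split vote: nobody else voted for "a"
    print(c.term, c.votes)
```

Randomizing the timeout per server is what keeps the candidates from timing out in lockstep and splitting the vote again.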

schuster commented 9 years ago

Relatedly, beginElection sets a timeout, but Candidate does not listen for StateTimeout messages, so the timeout would never be caught (if I'm reading things correctly). Even if that is fixed, the timeout would still need to be reset on a Candidate->Candidate transition in case multiple elections fail.
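The re-arming concern can be illustrated with a small model, again in plain Python with hypothetical names (`OneShotTimer`, `CandidateFsm`, `rearm_on_retry`). It makes no claim about how Akka's FSM handles state timeouts on a self-transition; it only assumes a single-shot timer, and shows that if the Candidate -> Candidate transition does not set the timer again, only the first failed election is ever retried.

```python
class OneShotTimer:
    """Hypothetical single-shot timer driven by an explicit clock value."""

    def __init__(self):
        self.deadline = None

    def arm(self, now, delay):
        self.deadline = now + delay

    def fired(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None  # single-shot: must be armed again to fire again
            return True
        return False


class CandidateFsm:
    """Hypothetical candidate that never wins, to exercise repeated timeouts."""

    def __init__(self, timeout, rearm_on_retry):
        self.timeout = timeout
        self.rearm_on_retry = rearm_on_retry
        self.term = 1
        self.timer = OneShotTimer()

    def begin_election(self, now):
        self.timer.arm(now, self.timeout)

    def tick(self, now):
        if self.timer.fired(now):
            # Election failed: move to the next term (Candidate -> Candidate).
            self.term += 1
            if self.rearm_on_retry:
                self.begin_election(now)


def run(rearm_on_retry, until=40, timeout=10):
    fsm = CandidateFsm(timeout, rearm_on_retry)
    fsm.begin_election(0)
    for now in range(until + 1):
        fsm.tick(now)
    return fsm.term


if __name__ == "__main__":
    print("re-armed:", run(True))
    print("not re-armed:", run(False))
```

With re-arming, the term keeps advancing every timeout interval; without it, the candidate retries once and then waits forever.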

colin-scott commented 9 years ago

I think Candidate did previously catch timeouts correctly [here], but you're right that it did not reset them properly on the Candidate -> Candidate transition after multiple elections fail.

For what it's worth, I have a (non-pull-request-worthy) fix for this issue here:

https://github.com/NetSys/sts2-applications/commit/55e6077e6c9294b797e597c6701ff2f0e8907f6c