how the leader election compares to raft?

benoitc commented 10 years ago

I am curious how the implementation compares with raft? Did you have a look on it?

Also can we add dynamically a member and remove it from the cluster? How many participants can it handle?

uwiger commented 9 years ago

Sorry, I missed this one.

The Raft leader election protocol is timeout-based, and the paper recommends using conservative timeout values of ca 150-300 ms. In their own tests, this leads to leader election times of 100 - 300 ms. The locks_leader algorithm is event-triggered, and converges much more quickly: when I've tested 5-node scenarios on one machine, convergence time seems to be around 10-20 ms*.

The Raft replication mechanism should be possible to implement on top of locks_leader. A possibility then would be to make use of the {surrender, Other, ModSt} reply if the leader elected by locks decides that another candidate would be a better leader (e.g. a higher "term" after a netsplit). This surrender operation uses the basic lock surrender supported by locks, and basically means that the leader hands its locks to another candidate, making it the leader.

Granted, as locks makes use of Erlang monitors, a worst case is that the communication link to the leader dies, and the net_tick timer must release before the old leader is declared dead. However, the argument could be made that it's better to do it this way in Erlang, since this is consistent with other applications. It can be problematic if different applications use different strategies to determine the state of the network, and potentially come to different conclusions, or have significantly different response times to failures.

Members can be dynamically added and removed. Note, however, that if you use Raft replication on top of locks_leader, Raft has some restrictions on how membership changes are made (called "joint consensus"). In the case of locks_leader electing the leader, there won't ever be two leaders unless there's a netsplit, but specifically for the netsplit case, the leaders need to know what constitutes a majority. Therefore, the Raft callback would still need to commit the membership changes as recommended by the paper.

I haven't run any max configuration tests with locks, but it is designed to be conservative in the number of messages sent. Each candidate tries to lock every involved node, and especially after netsplits, there will be lock surrender actions...

In a unit test with 5 nodes and a netsplit, the two islands (2 and 3 nodes respectively) had converged after 13-14 ms. After joining the networks, all nodes had converged ca 18 ms after the last nodeup arrived (which in itself took 42 ms after the first nodeup). Some 56 surrender actions were performed. ;-)

benoitc commented 9 years ago

Sorry I missed your answer. It's really informative and actually give me some inspiration. Thanks!

uwiger / locks

how the leader election compares to raft? #5