uwiger / locks

A scalable, deadlock-resolving resource locker
Mozilla Public License 2.0
204 stars 26 forks source link

how the leader election compares to raft? #5

Closed benoitc closed 9 years ago

benoitc commented 10 years ago

I am curious how the implementation compares with raft? Did you have a look on it?

Also can we add dynamically a member and remove it from the cluster? How many participants can it handle?

uwiger commented 9 years ago

Sorry, I missed this one.

The Raft leader election protocol is timeout-based, and the paper recommends using conservative timeout values of ca 150-300 ms. In their own tests, this leads to leader election times of 100 - 300 ms. The locks_leader algorithm is event-triggered, and converges much more quickly: when I've tested 5-node scenarios on one machine, convergence time seems to be around 10-20 ms*.

The Raft replication mechanism should be possible to implement on top of locks_leader. A possibility then would be to make use of the {surrender, Other, ModSt} reply if the leader elected by locks decides that another candidate would be a better leader (e.g. a higher "term" after a netsplit). This surrender operation uses the basic lock surrender supported by locks, and basically means that the leader hands its locks to another candidate, making it the leader.

Members can be dynamically added and removed. Note, however, that if you use Raft replication on top of locks_leader, Raft has some restrictions on how membership changes are made (called "joint consensus"). In the case of locks_leader electing the leader, there won't ever be two leaders unless there's a netsplit, but specifically for the netsplit case, the leaders need to know what constitutes a majority. Therefore, the Raft callback would still need to commit the membership changes as recommended by the paper.

I haven't run any max configuration tests with locks, but it is designed to be conservative in the number of messages sent. Each candidate tries to lock every involved node, and especially after netsplits, there will be lock surrender actions...

In a unit test with 5 nodes and a netsplit, the two islands (2 and 3 nodes respectively) had converged after 13-14 ms. After joining the networks, all nodes had converged ca 18 ms after the last nodeup arrived (which in itself took 42 ms after the first nodeup). Some 56 surrender actions were performed. ;-)

benoitc commented 9 years ago

Sorry I missed your answer. It's really informative and actually give me some inspiration. Thanks!