ktoso / akka-raft

A toy project implementing RAFT on top of Akka Cluster (not prod ready)
http://blog.project13.pl
Apache License 2.0
280 stars 42 forks source link

Two Leaders Elected in the Same Term #47

Open colin-scott opened 9 years ago

colin-scott commented 9 years ago

Hello,

I have a test case that, as far as I can tell, causes akka-raft to violate Raft's "Election Safety" property (see Figure 3 from the paper), i.e. it appears that two leaders are elected for the same term.

The test case consists of the following external events:

Upon running the test, I see the following in the console output:

[INFO] [04/24/2015 23:52:24.490] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-9] Initializing election (among 9 nodes) for Term(2)
[INFO] [04/24/2015 23:52:24.492] [new-system-0-akka.actor.default-dispatcher-7] [akka://new-system-0/user/raft-member-7] Initializing election (among 9 nodes) for Term(2)
...
[INFO] [04/24/2015 23:52:24.497] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-7] Received vote by Actor[akka://new-system-0/user/raft-member-4#-2022599620]; Won election with 5 of 9 votes
[INFO] [04/24/2015 23:52:24.496] [new-system-0-akka.actor.default-dispatcher-5] [akka://new-system-0/user/raft-member-9] Received vote by Actor[akka://new-system-0/user/raft-member-2#-1430451575]; Won election with 5 of 9 votes

For what it's worth, rather than inspecting the console output to detect this bug, we took a distributed snapshot of all RaftActor's states and found that in the same snapshot raft-member-9 is in state LeaderMeta(Actor[akka://new-system-0/user/raft-member-9],Term(2)) while raft-member-7 is in state LeaderMeta(Actor[akka://new-system-0/user/raft-member-7],Term(2)).

In our failing execution, we have the akka runtime deliver 27 total messages, including the 8 ChangeConfiguration messages. The delivery order is as follows (format is sender,receiver,message):

null,raft-member-7,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-4,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-2,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-1,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-9,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
null,raft-member-6,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
raft-member-9,raft-member-9,BeginElection
null,raft-member-3,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4))
raft-member-9,raft-member-1,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-9,raft-member-2,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-9,raft-member-6,(RequestVote,Term(1),raft-member-9,Term(0),0)
raft-member-1,raft-member-9,VoteCandidate(Term(0))
raft-member-6,raft-member-9,VoteCandidate(Term(0))
null,raft-member-5,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-1, raft-member-7, raft-member-2, raft-member-6, raft-member-5, raft-member-9, raft-member-8, raft-member-3, raft-member-4)))
raft-member-9,raft-member-9,BeginElection
raft-member-9,raft-member-3,(RequestVote,Term(2),raft-member-9,Term(0),0)
raft-member-7,raft-member-7,BeginElection
raft-member-7,raft-member-1,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-1,raft-member-1,BeginElection
raft-member-7,raft-member-5,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-7,raft-member-4,(RequestVote,Term(2),raft-member-7,Term(0),0)
raft-member-3,raft-member-9,VoteCandidate(Term(2))
raft-member-7,raft-member-7,BeginElection
raft-member-1,raft-member-7,VoteCandidate(Term(2))
raft-member-5,raft-member-7,VoteCandidate(Term(1))
raft-member-2,raft-member-9,VoteCandidate(Term(0))
raft-member-4,raft-member-7,VoteCandidate(Term(1))

Based on that delivery order, it appears that raft-member-9 receives votes from raft-member-{1,6,3,2,9}, and raft-member-7 receives votes from raft-member-{1,5,4,7}. A few things are strange about this: the votes received by raft-member-9 appear to be from different Terms; and raft-member-7 does not actually receive a quorum of votes (regardless of Term). I'm not exactly sure what the root cause is here.

We made akka's message scheduler deterministic so that you can easily reproduce the bug for yourself. Steps to reproduce:

$ git clone -b raft-leader-safety git@github.com:NetSys/sts2-applications.git
$ cd sts2-applications
$ git remote add interposition git@github.com:NetSys/sts2-interposition.git
$ git subtree pull --prefix=interposition interposition master
$ git clone git@github.com:NetSys/sts2-experiments.git experiments
$ sbt assembly
$ java -d64 -Xmx15g -cp target/scala-2.11/randomSearch-assembly-0.1.jar Main 2>&1 | tee console.out

From there you should be able to add logging statements and continue replaying as many times as needed.

We made a few small changes to akka-raft to generate this test case:

Let me know if you have any questions about how we ran this test.

Thanks! -Colin

ktoso commented 9 years ago

Excellent work Colin, looks really good. I need to find the time to go over your discovery!

colin-scott commented 9 years ago

I was able to minimize this test case further, by decreasing the cluster size.

Repro steps for smaller case:

$ git clone -b raft-shrunk-leader-safety git@github.com:NetSys/sts2-applications.git
$ cd sts2-applications
$ git clone git@github.com:NetSys/sts2-experiments.git experiments
$ sbt assembly
$ java -d64 -Xmx15g -cp target/scala-2.11/randomSearch-assembly-0.1.jar Main 2>&1 | tee console.out

Console output.

Messages delivered:

MsgEvent(deadLetters,raft-member-8,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(deadLetters,raft-member-7,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(deadLetters,raft-member-3,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(raft-member-8,raft-member-8,BasicFingerprint(BeginElection))
MsgEvent(raft-member-7,raft-member-7,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-3,BasicFingerprint((RequestVote,Term(1),raft-member-8,Term(0),0)))
MsgEvent(raft-member-7,raft-member-7,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-8,BasicFingerprint(BeginElection))
MsgEvent(deadLetters,raft-member-6,ChangeConfiguration(StableRaftConfiguration(Set(raft-member-3, raft-member-6, raft-member-7, raft-member-8))))
MsgEvent(raft-member-7,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-7,Term(0),0)))
MsgEvent(raft-member-3,raft-member-8,BasicFingerprint(VoteCandidate(Term(1))))
MsgEvent(raft-member-3,raft-member-3,BasicFingerprint(BeginElection))
MsgEvent(raft-member-8,raft-member-6,BasicFingerprint((RequestVote,Term(1),raft-member-8,Term(0),0)))
MsgEvent(raft-member-6,raft-member-8,BasicFingerprint(VoteCandidate(Term(0))))
MsgEvent(raft-member-3,raft-member-7,BasicFingerprint(VoteCandidate(Term(2))))
MsgEvent(raft-member-7,raft-member-8,BasicFingerprint((RequestVote,Term(2),raft-member-7,Term(0),0)))

At the end of this, raft-member-7 and raft-member-8 are both elected as leader in the same term.

dmitraver commented 9 years ago

@colin-scott Hi! I tried to run the test case that you provided but it appears that the code is changed and after running

git subtree pull --prefix=interposition interposition master

I get a lot of merge conflicts and I'm unable to run the test. I want to try fixing this issue and probably others too as I'm planning to use this library in my own project.

colin-scott commented 9 years ago

Hey @dmitraver,

Thanks for pointing out the issue. I corrected the repro steps above; if you start the repro steps from scratch it should work.

(The raft-shrunk-leader-safety branch already has the correct dependencies, so there shouldn't be a need to pull in the subtree and merge)

Thanks! -Colin

colin-scott commented 9 years ago

@dmitraver for what it's worth, I think the root cause of this bug is described in this issue:

https://github.com/ktoso/akka-raft/issues/46

dmitraver commented 9 years ago

@colin-scott Thx Colin! I will look into that.

ktoso commented 9 years ago

Thanks guys! Still wasn't able to resume work on it, a lot of stuff on Akka itself needs shipping soon so focused my efforts there :)

One note:

I'm planning to use this library in my own project.

Please don't if it's any real-life / production project – this implementation has been proven to not yet be correct so it would be a bad idea to use in a real project. Safety first!

dmitraver commented 9 years ago

@ktoso Akka is awesome tool! I just want to contribute to this project :) I've seen a lot of issues was opened previously so will try to fix as much as I can and make a PR. This is not intended to be used in production(maybe later) but more for my own akka studies.

ktoso commented 9 years ago

Awesome! For learning things on distributed systems and Akka + fixing things this project is ideal I think :-) Looking forward to hear from you then – feel free to open/ask on issues if you have any trouble,

dmitraver commented 9 years ago

@ktoso Thanks for support :)

colin-scott commented 9 years ago

Calling this a "test case" is a bit of a misnomer. Think of it as a replayable execution. It replays by trying to deliver the same (equivalent) messages it observed in the original execution. If some of those messages become missing, e.g. after you modified the akka-raft code, it fails to replay properly. Note though that that doesn't necessarily mean that you fixed the bug.

I can show you how I ran the initial fuzz test if that would be easier. Or I can show you how to run the replayer in a mode that doesn't crash whenever it diverges.

On Thu, Jul 23, 2015 at 5:46 AM Dmitry Avershin notifications@github.com wrote:

@colin-scott https://github.com/colin-scott Hi Colin! I did some research recently and found that candidate doesn't recreate ElectionMeta when it receives a BeginElection message and therefore at the beginning of new election it has some votes that were received previously(possibly in the other terms too). I tried to change the following line https://github.com/ktoso/akka-raft/blob/master/src/main/scala/pl/project13/scala/akka/raft/Candidate.scala#L31 to

m.forNewElection.incVote

which should solve the issue but then your test starts failing with

Exception in thread "main" akka.dispatch.verification.ReplayException: Expected event (raft-member-9,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-9,Term(0),0))) at akka.dispatch.verification.ReplayScheduler.replay(ReplayScheduler.scala:124) at akka.dispatch.verification.RunnerUtils$.replayExperiment(RunnerUtils.scala:139)

I tried to find what causes this issue but stuck. Could you give me some insight what is going wrong?

— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124083867.

dmitraver commented 9 years ago

I think I fixed that issue, now your test shows no errors and passes successfully. The issue was with stale term number in VoteCandidate message. I will make a PR soon. Thanks for your support. Your tests were really helpful :)

2015-07-23 20:35 GMT+02:00 Colin Scott notifications@github.com:

Calling this a "test case" is a bit of a misnomer. Think of it as a replayable execution. It replays by trying to deliver the same (equivalent) messages it observed in the original execution. If some of those messages become missing, e.g. after you modified the akka-raft code, it fails to replay properly. Note though that that doesn't necessarily mean that you fixed the bug.

I can show you how I ran the initial fuzz test if that would be easier.

On Thu, Jul 23, 2015 at 5:46 AM Dmitry Avershin notifications@github.com wrote:

@colin-scott https://github.com/colin-scott Hi Colin! I did some research recently and found that candidate doesn't recreate ElectionMeta when it receives a BeginElection message and therefore at the beginning of new election it has some votes that were received previously(possibly in the other terms too). I tried to change the following line < https://github.com/ktoso/akka-raft/blob/master/src/main/scala/pl/project13/scala/akka/raft/Candidate.scala#L31

to

m.forNewElection.incVote

which should solve the issue but then your test starts failing with

Exception in thread "main" akka.dispatch.verification.ReplayException: Expected event (raft-member-9,raft-member-3,BasicFingerprint((RequestVote,Term(2),raft-member-9,Term(0),0))) at akka.dispatch.verification.ReplayScheduler.replay(ReplayScheduler.scala:124) at akka.dispatch.verification.RunnerUtils$.replayExperiment(RunnerUtils.scala:139)

I tried to find what causes this issue but stuck. Could you give me some insight what is going wrong?

— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124083867.

— Reply to this email directly or view it on GitHub https://github.com/ktoso/akka-raft/issues/47#issuecomment-124202917.

Best wishes, Dmitry Avershin.