schuster opened 9 years ago
I have an execution trace that triggers this behavior:
In a four-node cluster, the leader is raft-member-3. Right before receiving an AppendSuccessful from raft-member-2 for the first log entry [and before receiving AppendSuccessful messages from any other members], the leader's state is:
BEFORE RECEIVE, LOG: ReplicatedLog(List(Entry(AppendWord(WORD1),Term(1),1,Some(Actor[akka://new-system-0/deadLetters]))),0,5)
BEFORE RECEIVE, STATE: LeaderMeta(Actor[akka://new-system-0/user/raft-member-3#-2011805761],Term(1),StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4)))
BEFORE RECEIVE, nextIndex: LogIndexMap(Map(Actor[akka://new-system-0/user/raft-member-1#-1436375817] -> 2, Actor[akka://new-system-0/user/raft-member-2#1657544712] -> 2, Actor[akka://new-system-0/user/raft-member-3#-2011805761] -> 2, Actor[akka://new-system-0/user/raft-member-4#1901315846] -> 2),2)
BEFORE RECEIVE, matchIndex: LogIndexMap(Map(Actor[akka://new-system-0/user/raft-member-1#-1436375817] -> 0, Actor[akka://new-system-0/user/raft-member-2#1657544712] -> 0, Actor[akka://new-system-0/user/raft-member-3#-2011805761] -> 1, Actor[akka://new-system-0/user/raft-member-4#1901315846] -> 0),0)
Upon receiving the message:
[INFO] [08/08/2015 01:18:38.792] [new-system-0-akka.actor.default-dispatcher-3] [akka://new-system-0/user/raft-member-3] Follower Actor[akka://new-system-0/user/raft-member-2#1657544712] took write in term: Term(1), next index was: 2
AFTER RECEIVE, LOG: ReplicatedLog(List(Entry(AppendWord(WORD1),Term(1),1,Some(Actor[akka://new-system-0/deadLetters]))),1,5)
**[INFO] [08/08/2015 01:18:38.797] [new-system-0-akka.actor.default-dispatcher-3] [akka://new-system-0/user/raft-member-3] Consensus for persisted index: 1. (Comitted index: 0, will commit now: true)**
AFTER RECEIVE, STATE: LeaderMeta(Actor[akka://new-system-0/user/raft-member-3#-2011805761],Term(1),StableRaftConfiguration(Set(raft-member-1, raft-member-2, raft-member-3, raft-member-4)))
AFTER RECEIVE, nextIndex: LogIndexMap(Map(Actor[akka://new-system-0/user/raft-member-1#-1436375817] -> 2, Actor[akka://new-system-0/user/raft-member-2#1657544712] -> 2, Actor[akka://new-system-0/user/raft-member-3#-2011805761] -> 2, Actor[akka://new-system-0/user/raft-member-4#1901315846] -> 2),2)
AFTER RECEIVE, matchIndex: LogIndexMap(Map(Actor[akka://new-system-0/user/raft-member-1#-1436375817] -> 0, Actor[akka://new-system-0/user/raft-member-2#1657544712] -> 1, Actor[akka://new-system-0/user/raft-member-3#-2011805761] -> 1, Actor[akka://new-system-0/user/raft-member-4#1901315846] -> 0),0)
[It should not have reached consensus at that point: entry 1 is only on 2 of the 4 members (the leader and raft-member-2), and a majority of a four-member cluster requires 3.]
Note that this console output uses 1-indexed logs.
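To make the expected check concrete, here is a minimal Scala sketch (hypothetical names, not akka-raft's actual API) of the majority test the leader should be applying to the post-receive state above:

```scala
// Sketch of the commit check on the "AFTER RECEIVE" state (1-indexed logs).
val matchIndex = Map(
  "raft-member-1" -> 0,
  "raft-member-2" -> 1,
  "raft-member-3" -> 1, // the leader itself
  "raft-member-4" -> 0
)
val clusterSize = matchIndex.size     // 4
val majority    = clusterSize / 2 + 1 // 3 members must hold an entry

// Entry 1 is on only 2 of 4 members, so it must not be committed yet:
val replicatedOn = matchIndex.values.count(_ >= 1)
assert(replicatedOn < majority) // 2 < 3: no consensus for index 1
```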
For what it's worth, I have a fix for this (soon to be pull-requested) here:
https://github.com/NetSys/sts2-applications/commit/60c330f1ed958b8ae4350bc85698878899c20a57
[Assumes 1-indexed logs]
Great :) I don't mind switching to 1-indexed along the way; it's more consistent with the paper, which makes the code easier to follow.
Instead of calculating the newest index that a majority of the servers have in their logs, `indexOnMajority` appears to calculate something like the "mode" of the match indices: it returns the index that the largest number of members have as their match index. For example, say a cluster of five servers has match indices 1, 1, 2, 3, and 4. A majority of the servers have at least entry 2 in their logs, but this function returns 1 because more servers have 1 as their match index.
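For illustration, here is a sketch of the mode-style behavior described above (an approximation of the reported behavior, not the actual `indexOnMajority` source):

```scala
// Mode-style calculation: pick the match index held by the largest
// number of members, rather than the newest index held by a majority.
val matchIndices = List(1, 1, 2, 3, 4)
val modeIndex = matchIndices.groupBy(identity).maxBy(_._2.size)._1
// modeIndex == 1, even though a majority (3 of 5) have at least entry 2.
```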
I think this algorithm (pseudocode, 0-indexed arrays) should work instead:

sortedIndices = sortDescending(matchIndices)
return sortedIndices[floor(length(config.members) / 2)]
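And a runnable Scala version of that pseudocode, as a sketch. The position to read is floor(N/2) because in a descending sort the element at 0-indexed position k is held by at least k + 1 members, and a majority of N members is floor(N/2) + 1:

```scala
// Newest log index replicated on a majority of the cluster.
def indexOnMajority(matchIndices: Seq[Int], clusterSize: Int): Int = {
  val sorted = matchIndices.sortBy(-_) // descending
  sorted(clusterSize / 2)
}

// The five-server example: sorted is [4, 3, 2, 1, 1] and sorted(2) == 2,
// the newest index present on a majority (3 of 5).
assert(indexOnMajority(List(1, 1, 2, 3, 4), 5) == 2)

// The traced four-node case: sorted is [1, 1, 0, 0] and sorted(2) == 0,
// so entry 1 is correctly not yet replicated on a majority.
assert(indexOnMajority(List(0, 1, 1, 0), 4) == 0)
```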