Open jamestran201 opened 1 year ago
Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). The goal for Part 2A is for a single leader to be elected, for the leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to/from the old leader are lost
Based on the rules for candidates in fig 2, a candidate should send vote requests and reset the election time out in parallel
After getting elected, leader should send heartbeat messages immediately
Suppose that there are 5 servers in the cluster, all of them start out as follower
The candidate obtains the majority vote and becomes the leader. The new leader then sends heartbeat messages to the followers.
1 candidate beats the other to become the leader
Happy path: The leader sends heartbeat messages to all servers, and they all receive the message. The other candidate will become a follower after returning the heartbeat message.
Heartbeat from leader fails to reach the other candidate: When the candidate's election timeout period elapses, it will start the election process again. Let's say that the term that each server has looks like:
When the candidate starts the election process again, its term will be come 2 and sends out vote requests. Before all servers reply with the vote response, the candidate receives a heartbeat message from the current leader. The candidate's term is now 2, while the leader's term is 1. What should happen in this case?
Also, how should other servers (not candidate) process the vote requests?
How do we define the idle period where heartbeat messages should be sent?
Read sections 5.3, 5.4 and 5.5 of the paper.
Say we have the following scenario:
What should happen to the request in step 1 now? Should it stop? Should the follower no-op while handling it?
The leader keeps track of the highest index it knows to be committed, and it includes that index in future AppendEntries RPCs (including heartbeats) so that the other servers eventually find out
When sending an AppendEntries RPC, the leader includes the index and term of the entry in its log that immediately precedes the new entries. If the follower does not find an entry in its log with the same index and term, then it refuses the new entries.
prevLogIndex
: index of log entry immediately before the new entries
prevLogTerm
: term of the log at prevLogIndex
on the leader/candidate
log indices start at 1
Return false if the server's log at prevLogIndex
does not have the same term as prevLogTerm
.
What are the possible scenarios?
When the very first leader gets elected
prevLogIndex
: 0 - the very first log would have index 1, so prevLogIndex
should be 0 since there's no logs yet
prevLogTerm
: 0 - the term starts off at 0 on boot, and no log should have been added yet
If the server doesn't have any logs yet, then continue processing.
If the server has any logs, return false.
When a leader is elected halfway through
prevLogIndex
: n
prevLogTerm
: t
If prevLogIndex
< len(server's logs), the current server must have a log at that index. Compare prevLogTerm
with server's logs[prevLogIndex-1]
(prevLogIndex-1 because indices start from 1).
If prevLogIndex
== len(server's logs), same as above.
If prevLogIndex
> len(server's logs), server cannot contain any log at prevLogIndex
, so return false.
However, leader crashes can leave the logs inconsistent (the old leader may not have fully replicated all of the entries in its log
In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own. This means that conflicting entries in follower logs will be overwritten with entries from the leader’s log
To bring a follower’s log into consistency with its own, the leader must:
The RequestVote RPC implements this restriction: the RPC includes information about the candidate’s log, and the voter denies its vote if its own log is more up-to-date than that of the candidate.
Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.
If a leader crashes before committing an entry, future leaders will attempt to finish replicating the entry.
A leader cannot immediately conclude that an entry from a previous term is committed once it is stored on a majority of servers.
Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas; once an entry from the current term has been committed in this way, then all prior entries are committed indirectly because of the Log Matching Property.
Your Raft must provide the following function that the service can call with a serialized snapshot of its state:
Snapshot(index int, snapshot []byte)
In Lab 2D, the tester calls Snapshot() periodically. In Lab 3, you will write a key/value server that calls Snapshot(); the snapshot will contain the complete table of key/value pairs. The service layer calls Snapshot() on every peer (not just on the leader).
The index argument indicates the highest log entry that's reflected in the snapshot. Raft should discard its log entries before that point. You'll need to revise your Raft code to operate while storing only the tail of the log.
You'll need to implement the InstallSnapshot RPC discussed in the paper that allows a Raft leader to tell a lagging Raft peer to replace its state with a snapshot. You will likely need to think through how InstallSnapshot should interact with the state and rules in Figure 2.
When a follower's Raft code receives an InstallSnapshot RPC, it can use the applyCh to send the snapshot to the service in an ApplyMsg. The ApplyMsg struct definition already contains the fields you will need (and which the tester expects). Take care that these snapshots only advance the service's state, and don't cause it to move backwards.
If a server crashes, it must restart from persisted data. Your Raft should persist both Raft state and the corresponding snapshot. Use the second argument to persister.Save() to save the snapshot. If there's no snapshot, pass nil as the second argument.
When a server restarts, the application layer reads the persisted snapshot and restores its saved state.
A "snapshot" will be considered one of the persisted states of the server -> Need to include it when writing and reading states from disk.
snapshot
sliceindex
)index
Follow the design in https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf, pay particular attention to Figure 2. Will not implement "cluster membership changes" (chapter 6).
Resources: