jamestran201 / mit-distributed-systems-labs

Implementation of the labs for MIT distributed systems course
0 stars 0 forks source link

Lab 2: Raft #2

Open jamestran201 opened 1 year ago

jamestran201 commented 1 year ago

In this lab you'll implement Raft as a Go object type with associated methods, meant to be used as a module in a larger service. A set of Raft instances talk to each other with RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries. The entries are numbered with index numbers. The log entry with a given index will eventually be committed. At that point, your Raft should send the log entry to the larger service for it to execute.

Follow the design in https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf, pay particular attention to Figure 2. Will not implement "cluster membership changes" (chapter 6).

Resources:

jamestran201 commented 1 year ago

Part 2A

Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). The goal for Part 2A is for a single leader to be elected, for the leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to/from the old leader are lost

Leader election

Based on the rules for candidates in fig 2, a candidate should send vote requests and reset the election time out in parallel

After getting elected, leader should send heartbeat messages immediately

Scenarios

Suppose that there are 5 servers in the cluster, all of them start out as follower

4 followers, 1 candidate

The candidate obtains the majority vote and becomes the leader. The new leader then sends heartbeat messages to the followers.

3 followers, 2 candidates

1 candidate beats the other to become the leader

Happy path: The leader sends heartbeat messages to all servers, and they all receive the message. The other candidate will become a follower after returning the heartbeat message.

Heartbeat from leader fails to reach the other candidate: When the candidate's election timeout period elapses, it will start the election process again. Let's say that the term that each server has looks like:

When the candidate starts the election process again, its term will be come 2 and sends out vote requests. Before all servers reply with the vote response, the candidate receives a heartbeat message from the current leader. The candidate's term is now 2, while the leader's term is 1. What should happen in this case?

Also, how should other servers (not candidate) process the vote requests?

3 followers, 1 candidate, 1 leader

AppendEntries

How do we define the idle period where heartbeat messages should be sent?

jamestran201 commented 1 year ago

Part 2B

Read sections 5.3, 5.4 and 5.5 of the paper.

When leader receives request from client

  1. appends the command to its log as a new entry
  2. issues AppendEntries RPCs in parallel to each of the other servers to replicate the entry
  3. When the entry has been safely replicated, the leader applies the entry to its state machine and returns the result of that execution to the client
  4. If followers crash or run slowly, or if network packets are lost, the leader retries AppendEntries RPCs indefinitely (even after it has responded to the client) until all followers eventually store all log entries
  5. A log entry is committed once the leader that created the entry has replicated it on a majority of the servers
    • This also commits all preceding entries in the leader’s log, including entries created by previous leaders
    • Once a follower learns that a log entry is committed, it applies the entry to its local state machine (in log order)

Question

Say we have the following scenario:

  1. The leader is retrying an AppendEntry request to a follower
  2. Client sends new command, so leader sends new AppendEntry request to followers
  3. This new AppendEntry request is successful on all followers. This means that the log entry that the leader was trying to send in step 1 has been committed on all followers

What should happen to the request in step 1 now? Should it stop? Should the follower no-op while handling it?

Changes to AppendEntries

The leader keeps track of the highest index it knows to be committed, and it includes that index in future AppendEntries RPCs (including heartbeats) so that the other servers eventually find out

When sending an AppendEntries RPC, the leader includes the index and term of the entry in its log that immediately precedes the new entries. If the follower does not find an entry in its log with the same index and term, then it refuses the new entries.

When handling AppendEntries

prevLogIndex: index of log entry immediately before the new entries prevLogTerm: term of the log at prevLogIndex on the leader/candidate

log indices start at 1

Return false if the server's log at prevLogIndex does not have the same term as prevLogTerm.

What are the possible scenarios?

When the very first leader gets elected

prevLogIndex: 0 - the very first log would have index 1, so prevLogIndex should be 0 since there's no logs yet prevLogTerm: 0 - the term starts off at 0 on boot, and no log should have been added yet

If the server doesn't have any logs yet, then continue processing.

If the server has any logs, return false.

When a leader is elected halfway through

prevLogIndex: n prevLogTerm: t

If prevLogIndex < len(server's logs), the current server must have a log at that index. Compare prevLogTerm with server's logs[prevLogIndex-1] (prevLogIndex-1 because indices start from 1).

If prevLogIndex == len(server's logs), same as above.

If prevLogIndex > len(server's logs), server cannot contain any log at prevLogIndex, so return false.

Handling inconsistencies

However, leader crashes can leave the logs inconsistent (the old leader may not have fully replicated all of the entries in its log

In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own. This means that conflicting entries in follower logs will be overwritten with entries from the leader’s log

To bring a follower’s log into consistency with its own, the leader must:

Leader completeness property

The RequestVote RPC implements this restriction: the RPC includes information about the candidate’s log, and the voter denies its vote if its own log is more up-to-date than that of the candidate.

Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.

Committing entries from previous terms

If a leader crashes before committing an entry, future leaders will attempt to finish replicating the entry.

A leader cannot immediately conclude that an entry from a previous term is committed once it is stored on a majority of servers.

Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas; once an entry from the current term has been committed in this way, then all prior entries are committed indirectly because of the Log Matching Property.

jamestran201 commented 1 year ago

Part 2D

Your Raft must provide the following function that the service can call with a serialized snapshot of its state:

Snapshot(index int, snapshot []byte)

In Lab 2D, the tester calls Snapshot() periodically. In Lab 3, you will write a key/value server that calls Snapshot(); the snapshot will contain the complete table of key/value pairs. The service layer calls Snapshot() on every peer (not just on the leader).

The index argument indicates the highest log entry that's reflected in the snapshot. Raft should discard its log entries before that point. You'll need to revise your Raft code to operate while storing only the tail of the log.

You'll need to implement the InstallSnapshot RPC discussed in the paper that allows a Raft leader to tell a lagging Raft peer to replace its state with a snapshot. You will likely need to think through how InstallSnapshot should interact with the state and rules in Figure 2.

When a follower's Raft code receives an InstallSnapshot RPC, it can use the applyCh to send the snapshot to the service in an ApplyMsg. The ApplyMsg struct definition already contains the fields you will need (and which the tester expects). Take care that these snapshots only advance the service's state, and don't cause it to move backwards.

If a server crashes, it must restart from persisted data. Your Raft should persist both Raft state and the corresponding snapshot. Use the second argument to persister.Save() to save the snapshot. If there's no snapshot, pass nil as the second argument.

When a server restarts, the application layer reads the persisted snapshot and restores its saved state.

A "snapshot" will be considered one of the persisted states of the server -> Need to include it when writing and reading states from disk.

For Snapshot(index int, snapshot []byte)

Think about snapshots changes Figure 2

For InstallSnapshot RPC