Code of Conduct
[X] I agree to follow this project's Code of Conduct
Description about the feature
When to take a snapshot?
A snapshot will record the current state machine state.
A server will take a snapshot every `snapshot_count` new entries. The actual value will be randomized to avoid snapshots being taken at the same time on all servers.
In detail, when applying, we will check `log.last_applied`. Once it gets far enough ahead of the last snapshot index, we will send a snapshot task to the cmd executor.
```rust
/// Apply new logs
fn apply(&self, log: &mut Log<C>) {
    for i in (log.last_applied + 1)..=log.commit_index {
        // Send every command in the committed entry to the cmd executor.
        for cmd in log.entries[i].cmds() {
            self.ctx
                .cmd_tx
                .send_after_sync(Arc::clone(cmd), i.numeric_cast());
        }
        log.last_applied = i;
        // Check here: trigger a snapshot once enough entries have
        // accumulated since the last snapshot (`cfg` is the server config,
        // `snapshot_count` the randomized threshold).
        if log.last_applied - log.snapshot.last_index > cfg.snapshot_count {
            self.ctx.cmd_tx.send_snapshot(log.last_applied);
        }
    }
}
```
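The per-server randomization of the threshold mentioned above could be derived, for example, from the standard library's `RandomState` hasher. This is only a sketch under assumed names, not the actual implementation:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper: pick a per-server threshold in
/// [snapshot_count, 2 * snapshot_count) so that servers configured with
/// the same snapshot_count do not all snapshot on the same entry.
/// `RandomState` is a cheap, std-only source of randomness.
fn randomized_snapshot_count(snapshot_count: u64) -> u64 {
    let jitter = RandomState::new().build_hasher().finish() % snapshot_count;
    snapshot_count + jitter
}

fn main() {
    let threshold = randomized_snapshot_count(1000);
    // The jitter keeps the threshold within [1000, 2000).
    assert!((1000..2000).contains(&threshold));
    println!("threshold = {threshold}");
}
```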
The snapshot request will conflict with every other execution task.
After it is finished, the snapshot will be updated in CurpLog.
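The issue notes that the Cmd Executor API "will be like" a set of channel methods matching the pseudocode above. A minimal sketch, with assumed names rather than the actual curp interface:

```rust
use std::sync::Arc;

/// Hypothetical Cmd Executor channel API (method names are assumptions
/// based on the pseudocode above, not the actual curp interface).
trait CmdExecutorApi<C> {
    /// Execute a command after it has been synced at log `index`.
    fn send_after_sync(&self, cmd: Arc<C>, index: u64);
    /// Take a snapshot of the state machine up to `last_applied`.
    /// This task conflicts with every other execution task.
    fn send_snapshot(&self, last_applied: u64);
    /// Install a snapshot downloaded from the leader, replacing the
    /// current state machine state.
    fn install_snapshot(&self, snapshot: Vec<u8>);
}

// A no-op implementation just to show the shape of the interface.
struct NoopExecutor;

impl CmdExecutorApi<String> for NoopExecutor {
    fn send_after_sync(&self, _cmd: Arc<String>, _index: u64) {}
    fn send_snapshot(&self, _last_applied: u64) {}
    fn install_snapshot(&self, _snapshot: Vec<u8>) {}
}

fn main() {
    let exec = NoopExecutor;
    exec.send_after_sync(Arc::new("put k v".to_string()), 1);
    exec.send_snapshot(1);
    println!("ok");
}
```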
When to compact log entries?
Compaction will remove log entries that are no longer needed.
It will take place after a snapshot is taken. In case of lagging followers, we will reserve some entries before the snapshot index (the last log index before the snapshot is taken).
In addition to removing entries in memory, we will remove these entries from rocksdb.
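A minimal sketch of this compaction step, assuming a map from log index to entry (the names and the `reserve` parameter are illustrative, not the actual implementation):

```rust
use std::collections::BTreeMap;

/// Hypothetical compaction sketch: after a snapshot covering everything
/// up to `snapshot_index`, drop in-memory entries, but reserve the last
/// `reserve` entries before the snapshot index for lagging followers.
/// The real implementation would delete the same index range from
/// rocksdb as well.
fn compact(entries: &mut BTreeMap<u64, String>, snapshot_index: u64, reserve: u64) {
    let cutoff = snapshot_index.saturating_sub(reserve);
    // Entries at or below `cutoff` are covered by the snapshot and
    // not reserved, so they can be dropped.
    entries.retain(|&index, _| index > cutoff);
}

fn main() {
    let mut entries: BTreeMap<u64, String> =
        (1..=10).map(|i| (i, format!("entry-{i}"))).collect();
    // Snapshot taken at index 8; reserve entries 7 and 8 for laggards.
    compact(&mut entries, 8, 2);
    let kept: Vec<u64> = entries.keys().copied().collect();
    println!("{kept:?}"); // prints [7, 8, 9, 10]
}
```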
How to send snapshot to followers?
An exceptionally slow follower or a new server joining the cluster will need to download the snapshot from the leader.
In detail, the leader will send its snapshot to a follower when it finds that the log entries the follower needs are already covered by the snapshot. So, in the calibration task, the leader will use a gRPC stream ([tonic/routeguide-tutorial.md at master · hyperium/tonic (github.com)](https://github.com/hyperium/tonic/blob/master/examples/routeguide-tutorial.md#client-side-streaming-rpc)) to send the snapshot to the follower. After a snapshot is received, it will be installed into the command executor. Then, the snapshot will be registered in the follower's log.
I think there is no need for additional modification to the other handler logic when receiving a snapshot, since the follower will reject these new append_entries.
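Since a snapshot can be large, the leader would likely split it into fixed-size chunks and send them as messages on the client-side stream. A sketch of the chunking step (chunk size and names are assumptions; the tonic stream itself is omitted):

```rust
/// Hypothetical helper: split a snapshot into fixed-size chunks so the
/// leader can send them one by one over a client-side gRPC stream.
fn snapshot_chunks(snapshot: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    snapshot.chunks(chunk_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let snapshot = vec![0u8; 10_000];
    let chunks = snapshot_chunks(&snapshot, 4096);
    // 10_000 bytes split as 4096 + 4096 + 1808.
    assert_eq!(chunks.len(), 3);
    println!("{}", chunks.len()); // prints 3
}
```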
How will this feature be delivered?
A PR about snapshot only: each server will be able to take a snapshot independently. They will also be able to recover from the snapshot.
A PR about compaction: each server will be able to compact its log entries after a snapshot is taken. And the leader will be able to send snapshots to followers in need.