xline-kv / Xline

A geo-distributed KV store for metadata management
https://xline.cloud
Apache License 2.0

[Feature]: Curp snapshot #190

Closed markcty closed 1 year ago

markcty commented 1 year ago

Description about the feature

When to take a snapshot?

A snapshot records the current state of the state machine.

A server will take a snapshot every `snapshot_count` new entries. The actual threshold will be randomized to avoid all servers taking snapshots at the same time.

In detail, when applying, we will check `log.last_applied`. Once it exceeds the threshold, we will send a snapshot task to the command executor.

```rust
/// Apply new logs
fn apply(&self, log: &mut Log<C>) {
    for i in (log.last_applied + 1)..=log.commit_index {
        for cmd in log.entries[i].cmds() {
            self.ctx
                .cmd_tx
                .send_after_sync(Arc::clone(cmd), i.numeric_cast());
        }
        log.last_applied = i;

        // check whether enough entries have accumulated since the last snapshot
        if log.last_applied - log.snapshot.last_index > cfg.snapshot_count {
            self.ctx.cmd_tx.send_snapshot(log.last_applied);
        }
    }
}
```
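The randomized threshold mentioned above could be derived deterministically per server. A minimal sketch (the `server_id` parameter, the hash multiplier, and the roughly ±25% jitter window are assumptions for illustration, not the actual Curp implementation):

```rust
/// Compute a per-server snapshot threshold jittered around `base`
/// (i.e. `snapshot_count`), so that servers do not all take snapshots
/// at the same log index. The multiplier and the [0.75, 1.25) * base
/// window are illustrative choices.
fn snapshot_threshold(base: u64, server_id: u64) -> u64 {
    // Deterministic pseudo-random offset in [0, base / 2).
    let jitter = server_id.wrapping_mul(0x9E37_79B9_7F4A_7C15) % (base / 2).max(1);
    // Shift the offset into [base * 3/4, base * 5/4).
    base * 3 / 4 + jitter
}
```

Deriving the jitter from the server id (rather than a fresh random draw) keeps the threshold stable across restarts while still spreading snapshot times across the cluster.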

The snapshot request will conflict with every other execution task.

The command executor API will look like:

```rust
#[async_trait]
pub trait CommandExecutor<C>: Sync + Send + Clone + std::fmt::Debug
where
    C: Command,
{
    // ....

    /// Take a snapshot
    async fn snapshot(&self) -> Result<Arc<dyn SnapshotApi>, Self::Error>;

    /// Install a snapshot
    async fn install_snapshot(&self, snapshot: Arc<dyn SnapshotApi>) -> Result<(), Self::Error>;
}
```

After the snapshot is taken, it will be recorded in the CurpLog.
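The shape of `SnapshotApi` is not spelled out here. A hypothetical minimal version (the method names `size` and `read_chunk`, and the in-memory implementation, are assumptions for illustration, not the real curp trait):

```rust
use std::io;

/// Hypothetical minimal snapshot access trait: enough to stream a
/// snapshot's bytes out and report its total size.
pub trait SnapshotApi: Send + Sync {
    /// Total size of the snapshot in bytes (useful for streaming progress).
    fn size(&self) -> u64;
    /// Read the next chunk of snapshot data into `buf`;
    /// returns the number of bytes read, 0 at end of snapshot.
    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory implementation, e.g. for tests.
pub struct MemSnapshot {
    pub data: Vec<u8>,
    pub pos: usize,
}

impl SnapshotApi for MemSnapshot {
    fn size(&self) -> u64 {
        self.data.len() as u64
    }

    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```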

When to compact log entries?

Compaction will remove log entries that are no longer needed.

It will take place after a snapshot is taken. To accommodate lagging followers, we will reserve some entries before the snapshot index (the last log index at the time the snapshot is taken).

In addition to removing entries in memory, we will remove these entries from rocksdb.
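The in-memory part of compaction can be sketched as follows (the `(index, command)` layout and the `reserve` parameter name are assumptions, and the rocksdb deletion is omitted):

```rust
/// Drop in-memory entries that are covered by a snapshot taken at
/// `snapshot_index`, but keep `reserve` entries before it so that
/// slightly lagging followers can still be served from the log.
/// (Illustrative only; the real implementation would also delete the
/// corresponding entries from rocksdb.)
fn compact(entries: &mut Vec<(u64, String)>, snapshot_index: u64, reserve: u64) {
    let keep_from = snapshot_index.saturating_sub(reserve);
    entries.retain(|(index, _)| *index > keep_from);
}
```

Any follower whose next needed index is at most `snapshot_index - reserve` can no longer be served from the log and must instead download the snapshot, which is the case handled in the next section.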

How to send snapshot to followers?

An exceptionally slow follower or a new server joining the cluster will need to download the snapshot from the leader.

In detail, the leader will send its snapshot to a follower when it finds that the log entries the follower needs are already covered by the snapshot. So, in the calibration task, the leader will use a gRPC stream ([tonic client-side streaming RPC tutorial](https://github.com/hyperium/tonic/blob/master/examples/routeguide-tutorial.md#client-side-streaming-rpc)) to send the snapshot to the follower. After the snapshot is received, it will be installed into the command executor. Then, the snapshot will be registered in the follower's log.

I think there is no need for additional modification to the other handler logic when receiving a snapshot, since the follower will reject these append_entries anyway.
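On the wire, the leader would split the snapshot into fixed-size messages before feeding them into the client-side stream. The chunking itself is simple (the 64 KiB chunk size below is an assumption, not a curp constant):

```rust
/// Split snapshot bytes into fixed-size chunks, as the leader would
/// when feeding each chunk into a tonic client-side stream message.
/// The follower reassembles the chunks in order before installing
/// the snapshot into its command executor.
fn snapshot_chunks(data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    data.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```

Streaming in bounded chunks keeps memory usage flat on both sides even when the snapshot is much larger than a single gRPC message limit.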

How will this feature be delivered?

  1. A PR about snapshot only: each server will be able to take a snapshot independently. They will also be able to recover from the snapshot.
  2. A PR about compaction: each server will be able to compact its log entries after a snapshot is taken. And the leader will be able to send the snapshot to any follower in need.


markcty commented 1 year ago

Merged to #176