rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

Checkpoints #415

Closed the-mikedavis closed 4 months ago

the-mikedavis commented 5 months ago

As described in #141, we add a new effect that machines may emit:

{checkpoint, CheckpointIndex, MachineState}

This effect is a hint that Ra should take a checkpoint. Checkpoints are essentially the same as snapshots except that they don't trigger log truncation. They reuse the ra_snapshot behavior, so on disk they are identical to snapshots. A checkpoint can later be promoted to an actual snapshot by emitting another new effect:

{release_cursor, ReleaseCursorIndex}

which is a hint that Ra should find the checkpoint with the highest index lower than or equal to ReleaseCursorIndex and rename the checkpoint file so that it moves from checkpoints/ to snapshots/. Any checkpoints below that index are then deleted, and the log can be truncated up to that index. The maximum number of checkpoints is configurable. Adding a checkpoint beyond that maximum triggers a "thinning out" in which checkpoints are deleted randomly from the middle of the checkpoint list.
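To make the promotion rule concrete, here is a minimal sketch of the selection step as a pure function over checkpoint indexes. The module and function names are hypothetical and this is not the code in this PR: it only models which checkpoint gets promoted and which ones survive, and leaves out the file rename, log truncation and thinning.

```erlang
-module(checkpoint_promotion_sketch).
-export([promote/2]).

%% Hypothetical model of the promotion rule described above.
%% Checkpoints is a list of checkpoint indexes, in any order.
-spec promote(non_neg_integer(), [non_neg_integer()]) ->
    {promoted, non_neg_integer(), [non_neg_integer()]} |
    {none, [non_neg_integer()]}.
promote(ReleaseCursorIndex, Checkpoints) ->
    case [Idx || Idx <- Checkpoints, Idx =< ReleaseCursorIndex] of
        [] ->
            %% No checkpoint is at or below the release cursor,
            %% so there is nothing to promote and nothing changes.
            {none, Checkpoints};
        Eligible ->
            %% The highest eligible checkpoint becomes the snapshot.
            SnapshotIdx = lists:max(Eligible),
            %% Older checkpoints are dropped; newer ones are kept.
            Remaining = [Idx || Idx <- Checkpoints, Idx > SnapshotIdx],
            {promoted, SnapshotIdx, Remaining}
    end.
```

For example, `checkpoint_promotion_sketch:promote(250, [100, 200, 300, 400])` returns `{promoted, 200, [300, 400]}`: the checkpoint at 200 is the one that would move to snapshots/, 100 would be deleted, and 300 and 400 stay as checkpoints.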

rabbit_fifo currently has an ad-hoc checkpointing system where it queues snapshot effects ({release_cursor, RaftIndex, DehydratedState}) regularly and emits those effects when it moves up the release cursor. The advantage of building this into Ra is that we can store the checkpoints on disk, so we can use them for machine recovery and reduce memory consumption (albeit at the cost of disk space and I/O).
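With checkpoints handled by Ra, a machine no longer needs to hold on to queued snapshot states itself. The following is a rough sketch of what the machine side could look like, using a toy counter rather than rabbit_fifo. The module name, the checkpoint interval and the {add, N} and {settle, UpToIdx} commands are all made up for illustration; only the two effect tuples come from this PR.

```erlang
-module(checkpointing_counter).
%% When compiling against Ra, this module would also declare
%% -behaviour(ra_machine). It is left out so the sketch compiles alone.
-export([init/1, apply/3]).

%% Hypothetical interval: checkpoint every 1000th log index.
-define(CHECKPOINT_INTERVAL, 1000).

init(_Config) ->
    0.

%% {add, N} is a made-up command that increments the counter.
apply(#{index := Idx}, {add, N}, Count0) ->
    Count = Count0 + N,
    Effects = case Idx rem ?CHECKPOINT_INTERVAL of
                  %% Ask Ra to persist the state at this index
                  %% without truncating the log.
                  0 -> [{checkpoint, Idx, Count}];
                  _ -> []
              end,
    {Count, ok, Effects};
%% {settle, UpToIdx} is a made-up command meaning that log entries up
%% to UpToIdx are no longer needed, so a checkpoint at or below that
%% index can be promoted to a snapshot.
apply(_Meta, {settle, UpToIdx}, Count) ->
    {Count, ok, [{release_cursor, UpToIdx}]}.
```

The machine only emits small effect tuples here. The checkpointed state is written to disk by Ra, rather than being kept in the machine state until the release cursor moves.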

This is useful for machines like rabbit_fifo that need to keep the log around on disk for a potentially long time (for use with the {log, Idxs, Fun, Opts} effect). If we take checkpoints regularly then recovery becomes constant-time rather than linear in the number of messages in the queue. Testing this locally on my machine against the server with a QQ containing 5 million messages[^1] gives a recovery time of about 10 ms, while main takes around 24 s.
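The difference in recovery cost can be illustrated with a small model. This is a simplified sketch, not Ra's actual recovery code: the module name is hypothetical, the log is a plain list of {Index, Command} pairs in ascending order, and the starting point is either the initial state at index 0 or the latest checkpoint. It returns the recovered state together with the number of entries that had to be replayed.

```erlang
-module(checkpoint_recovery_sketch).
-export([recover/3]).

%% Hypothetical model of recovery from a starting point.
%% Start is {Index, State}: {0, InitialState} or the latest checkpoint.
%% Log is a list of {Index, Command} in ascending index order.
%% ApplyFun is fun(Command, State) -> NewState.
recover({StartIdx, StartState}, Log, ApplyFun) ->
    %% Entries at or below the starting index are already reflected
    %% in StartState, so they are skipped rather than replayed.
    Tail = lists:dropwhile(fun({Idx, _Cmd}) -> Idx =< StartIdx end, Log),
    lists:foldl(fun({_Idx, Cmd}, {State, Replayed}) ->
                        {ApplyFun(Cmd, State), Replayed + 1}
                end,
                {StartState, 0},
                Tail).
```

With `Log = [{1, 1}, {2, 2}, {3, 3}, {4, 4}]` and `Add = fun(N, S) -> N + S end`, recovering from scratch with `recover({0, 0}, Log, Add)` returns `{10, 4}`, while recovering from a checkpoint at index 3 with `recover({3, 6}, Log, Add)` returns `{10, 1}`. The state is the same, but the replayed portion depends only on how far the log has advanced past the latest checkpoint.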

Closes https://github.com/rabbitmq/ra/issues/141

[^1]: use the md-ra-checkpoints branch, fill the queue with perf-test -x 1 -y 0 -qq -u qq -c 3000 -C 5000000, restart the broker and grep the log file for "recovery of state machine version"