rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.
Other
805 stars 96 forks source link

Segment compaction #206

Open kjnilsson opened 3 years ago

kjnilsson commented 3 years ago

It is proving difficult to write certain state machines with the current requirement to provide fast and easy snapshots. E.g. ETS based state machines are nearly impossible due to the mutable nature of ETS. For this kind of state machine it would be better to periodically compact segment data so that disk space can be recovered without any snapshots being taken.

Compacting segments will require the state machine to track which raft indexes are still in use (contribute to the current state) and cannot be removed. (the ra server itself will also need to do similar tracking for noop and cluster change commands. Once Ra has a full set of the indexes required it can compact and replace any segments.

Most likely this will require some internal changes to handle non-contiguous indexes.

E.g in the log {1, put, key1, v1}, {2, put, key2, v1}, {3, put, key1, v2} can be compacted as {2, put, key2, v1}, {3, put, key1, v2}

kjnilsson commented 3 years ago

https://web.archive.org/web/20160304221113/http://atomix.io/copycat/docs/internals/

Some prior art - I would prefer to explore the approach where the state machine tracks the indexes it needs rather than incrementally emitting the indexes it doesn't need. This should avoid the need to use tombstones and multiple compaction passes (minor and major)

kjnilsson commented 2 years ago

log compaction would work but only if the operations in the log a primitive put, delete type operations. A CAS operation or any operation that depends on prior data means there is a chain of dependencies in order to build the correct state. For simple cases it may be possible to derive a CAS into a put and treat it as such but say an operation that copied the value from one key into a new key would be inherently dependent. the original operation can never be removed and if the original operation depends on another operation we will end up with a very long log anyway with no guarantees of compaction.