This PR adds an MVP for the compactor process in the form of a shell. It's not feature-complete but can perform compaction according to basic strategies. We'll expand strategies and functionality later on
It is exposed as a new cli bin:
cargo r --bin compactor -- --help
Compiling libsql-wal v0.1.0 (/Users/mpostma/Documents/code/turso/libsql/libsql-wal/libsql-wal)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.01s
Running `target/debug/compactor --help`
Usage: compactor [OPTIONS] <COMMAND>
Commands:
monitor Register namespaces to monitor
analyze Analyze segments for a namespaces
compact Compact segments into bigger segments
sync Sync namespace metadata from remote storage
restore Restore namespace
help Print this message or the help of the given subcommand(s)
Options:
-p, --path <PATH> [default: compactor]
--enable-s3
--cluster-id <CLUSTER_ID> [env: LIBSQL_BOTTOMLESS_DATABASE_ID=]
--s3-url <S3_URL> [env: LIBSQL_BOTTOMLESS_ENDPOINT=]
--s3-access-key <S3_ACCESS_KEY> [env: LIBSQL_BOTTOMLESS_AWS_SECRET_ACCESS_KEY=]
--s3-access-key-id <S3_ACCESS_KEY_ID> [env: LIBSQL_BOTTOMLESS_AWS_ACCESS_KEY_ID=]
--s3-bucket <S3_BUCKET> [env: LIBSQL_BOTTOMLESS_BUCKET=]
--s3-region-id <S3_REGION_ID> [env: LIBSQL_BOTTOMLESS_AWS_DEFAULT_REGION=]
-h, --help Print help
Usage
The compactor is designed to run as a separate process. Namespaces are registered with the compactor with the monitor command. Once a namespace is monitored, you can call sync to retrieve metadata about segments for this namespace stored on the remote storage. Once the compactor is synced, it can start to operate. All operations are performed on cached metadata.
Analyze
Analyze segments for a namespaces
Usage: compactor analyze [OPTIONS] <NAMESPACE>
Arguments:
<NAMESPACE>
Options:
--list-all list all segments
-h, --help Print help
The analyze command returns general information about the segments stored for that namespace. The --list-all also list all the segments.
e.g:
The shortest restore path len tells us how many segments (at most) we would need to fetch to restore a db to the most recent state.
Compact
Compact segments into bigger segments
Usage: compactor compact [OPTIONS] --strategy <STRATEGY> <NAMESPACE>
Arguments:
<NAMESPACE>
Options:
-s, --strategy <STRATEGY> [possible values: logarithmic, compact-all]
--dry-run prints the compaction plan, but doesn't perform it
-h, --help Print help
The compact subcommand will attempt to compact a set of segments according to a compaction strategy.
Compaction's first step is to find the smallest subset of segments that covers the whole frame space. This is the minimal restore path (MRP). The goal of compaction is to reduce the length of the MPR. Different strategies can be used to achieve this.
Currently, there are two supported strategies for compaction:
logarithmic: given a set of n segments, the strategy will generate log(n) segments of exponentially decreasing size
compact-all: compact all segments into single segments
The restore command will restore the database to the given path.
Design
The compactor uses an SQLite database to store metadata about monitored namespaces. The sync call calls the underlying backend to get a list of all segments and store them in the SQLite database.
The analyze call builds a graph of the segments. A shortest path algorithm can be run on this graph to find the MRP.
We introduce a SegmentSet structure that represents a continuous and non-overlapping set of segments.
The PartitionStrategy trait expresses how a SegmentSet can be partitioned into multiple SegmentSet.
The compact method will take a SegmentSet and compact them into a single segment and upload that segment using the configured storage Backend
This PR adds an MVP for the compactor process in the form of a shell. It's not feature-complete but can perform compaction according to basic strategies. We'll expand strategies and functionality later on
It is exposed as a new cli bin:
Usage
The compactor is designed to run as a separate process. Namespaces are registered with the compactor with the
monitor
command. Once a namespace is monitored, you can callsync
to retrieve metadata about segments for this namespace stored on the remote storage. Once the compactor is synced, it can start to operate. All operations are performed on cached metadata.Analyze
The
analyze
command returns general information about the segments stored for that namespace. The--list-all
also list all the segments. e.g:The
shortest restore path len
tells us how many segments (at most) we would need to fetch to restore a db to the most recent state.Compact
The compact subcommand will attempt to compact a set of segments according to a compaction strategy. Compaction's first step is to find the smallest subset of segments that covers the whole frame space. This is the minimal restore path (MRP). The goal of compaction is to reduce the length of the MPR. Different strategies can be used to achieve this. Currently, there are two supported strategies for compaction:
logarithmic
: given a set ofn
segments, the strategy will generate log(n) segments of exponentially decreasing sizecompact-all
: compact all segments into single segmentsRestore
The restore command will restore the database to the given path.
Design
The compactor uses an SQLite database to store metadata about monitored namespaces. The
sync
call calls the underlying backend to get a list of all segments and store them in the SQLite database.The analyze call builds a graph of the segments. A shortest path algorithm can be run on this graph to find the MRP.
We introduce a
SegmentSet
structure that represents a continuous and non-overlapping set of segments. ThePartitionStrategy
trait expresses how aSegmentSet
can be partitioned into multipleSegmentSet
.The
compact
method will take aSegmentSet
and compact them into a single segment and upload that segment using the configured storageBackend