Enhance(compaction): Optimize the compaction task based on the state of the LSM

Hummock picks compaction tasks with an almost fixed “Rule” and deviates from it only in extreme cases (write-stop). This approach is reliable in most cases.

However, it is not always efficient. Hummock's compaction Rule is write-friendly. For example, it aims to:

- Batch according to configuration.
- Minimize write amplification.
- Choose an appropriate compaction task size and target_file size to improve parallelism.

In extreme cases, the above rules are not efficient, such as:

- Read-sensitive workloads (need more timely L0 compaction).
- Large L0 stacks due to a huge checkpoint (need more aggressive L0 batching).
- Space-sensitive workloads (need more timely high-level compaction).

In fact, the above scenarios can be divided into read problems and write problems. Read performance, however, is affected by more factors, such as the operator cache, block cache, file cache, and the cache refill/evict policy. Therefore, I would like to focus on optimizing the write problem first.

For the write scenario, we can optimize the known cases by selecting the task through a Rule:

1. Optimize the task's output SST partition by write throughput (already implemented).
2. Adjust the selection rule and output of the L0 task by the number of L0 stacks.
   - More aggressive batch parameters: select more level count / SST count / max compaction size.
   - Adjust the output SST size to reduce the SST count and reduce the pressure on the meta.
3. Optimize trivial-move commits: commit more trivial-move tasks in one commit.
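
To make the L0 and trivial-move items concrete, here is a minimal sketch of how the picker parameters could scale with the number of L0 stacks and how trivial-move tasks could be grouped per commit. Everything in it is an assumption for illustration: `L0PickerParams`, `adjust_for_l0_stack`, `group_trivial_moves`, the threshold, and the scale cap are hypothetical names and values, not Hummock's actual types or configuration.

```rust
/// Hypothetical knobs for picking an L0 compaction task.
/// Field names are illustrative, not RisingWave's real config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct L0PickerParams {
    /// Maximum number of L0 sub-levels selected into one task.
    pub max_level_count: usize,
    /// Maximum number of SSTs selected into one task.
    pub max_sst_count: usize,
    /// Maximum total input size of one task, in bytes.
    pub max_compaction_bytes: u64,
    /// Target size of each output SST, in bytes.
    pub target_file_size: u64,
}

/// Scale the batch parameters up as the L0 stack grows.
///
/// The scale is ceil(l0_stack_count / normal_threshold), clamped to
/// [1, max_scale], so a healthy LSM keeps the base parameters and a
/// deep L0 stack gets larger batches and larger output files.
pub fn adjust_for_l0_stack(
    base: L0PickerParams,
    l0_stack_count: usize,
    normal_threshold: usize,
    max_scale: u64,
) -> L0PickerParams {
    let threshold = normal_threshold.max(1) as u64;
    let raw = (l0_stack_count as u64 + threshold - 1) / threshold;
    let scale = raw.clamp(1, max_scale.max(1));
    L0PickerParams {
        max_level_count: base.max_level_count.saturating_mul(scale as usize),
        max_sst_count: base.max_sst_count.saturating_mul(scale as usize),
        max_compaction_bytes: base.max_compaction_bytes.saturating_mul(scale),
        // Larger output files mean fewer SSTs for the meta node to track.
        target_file_size: base.target_file_size.saturating_mul(scale),
    }
}

/// Group trivial-move task ids so that several are reported in one commit.
pub fn group_trivial_moves(task_ids: &[u64], max_per_commit: usize) -> Vec<Vec<u64>> {
    task_ids
        .chunks(max_per_commit.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With a threshold of 32 stacks and a cap of 4, an L0 with 10 stacks keeps the base parameters, while an L0 with 100 stacks gets 4x larger batches and output files. A linear, capped scale is only one possible policy; the real rule may well use discrete states of the LSM instead.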

Enhance(compaction): Optimize the compaction task based on the state of the LSM #19531