perf(compaction): Consider future write amplification in compaction

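In Hummock, compaction tasks fall into two basic types:

- Trivial task: applied on the meta node alone (the SST is moved to the next level without rewriting its data).
- Normal task: executed collaboratively by the meta node and the compactor.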

A compaction task selects its inputs with a picker algorithm and produces its outputs through meta or compactor operations, depending on the task type. However, the picker only considers the SSTs in Ln and Ln+1 (their size, count, and write amplification).

- Trivial task: does not limit the write amplification of the task, since the task contains only a single SST.
- Normal task: reduces write amplification by selecting the SSTs with the least overlap.
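For illustration, min-overlap selection for a normal task can be sketched as below. This is a minimal sketch, not Hummock's picker: the `Sst` type, `pick_min_overlap`, and the overlapped-bytes-to-input-bytes score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sst:
    """Hypothetical SST summary: inclusive key range and size in bytes."""
    lo: int
    hi: int
    size: int


def overlaps(a: Sst, b: Sst) -> bool:
    # Inclusive ranges overlap unless one ends before the other begins.
    return a.lo <= b.hi and b.lo <= a.hi


def pick_min_overlap(ln: list[Sst], ln1: list[Sst]) -> Sst:
    """Return the Ln SST whose overlapping Ln+1 bytes are smallest relative to its size."""
    def score(s: Sst) -> float:
        overlapped = sum(t.size for t in ln1 if overlaps(s, t))
        return overlapped / s.size
    return min(ln, key=score)


ln = [Sst(1, 3, 10), Sst(5, 6, 10)]
ln1 = [Sst(2, 4, 100), Sst(7, 8, 5)]
print(pick_min_overlap(ln, ln1))  # Sst(lo=5, hi=6, size=10)
```

Note that the score looks only at Ln and Ln+1; nothing in it accounts for Ln+2.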

Although we try to produce tasks with low write amplification for the current input levels (Ln, Ln+1), we ignore what happens after the SSTs reach the output level (future write amplification). For example:

Trivial task: moving L1 SST [5, 6] to L2 may later cause a task with larger write amplification (from L2 to L3). In L3, [5, 6] overlaps both [1, 5] and [6, 7], so an L2 -> L3 task that picks it must rewrite both SSTs.

L1 [5, 6]

L2            [7, 8]

L3 [1, 5], [6, 7]
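The overlap in this example can be checked with a small sketch, assuming SSTs are represented only by inclusive `(smallest, largest)` key ranges; the `overlapping` helper is hypothetical.

```python
def overlapping(rng, level):
    """Return the SSTs in `level` whose inclusive key range overlaps `rng`."""
    lo, hi = rng
    return [s for s in level if s[0] <= hi and lo <= s[1]]


# Key ranges from the trivial-task example above (inclusive).
candidate = (5, 6)
l3 = [(1, 5), (6, 7)]

# Once [5, 6] sits in L2, an L2 -> L3 task picking it must include both L3 SSTs.
print(overlapping(candidate, l3))  # [(1, 5), (6, 7)]
```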

Normal task: compacting L1 [4, 17] into L2 may later cause a task with larger write amplification (from L2 to L3):
- The task produces a single L2 SST [4, 6, 17, 19], whose key range covers L3 SST [8, 9] even though it contains no keys in that range. If [8, 9] holds a large amount of data, the next L2 -> L3 task must rewrite it, causing larger write amplification.
- If the output is instead organized into two L2 SSTs, [4, 6] and [17, 19], the next compaction task will not include [8, 9], which alleviates write amplification.

L1 [4, 17]

L2      [6, 19]

L3 [1, 5], [8, 9], [17, 19]
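One way to organize the output, sketched below under simplifying assumptions, is to start a new output SST whenever an Ln+2 SST lies entirely in the gap between two consecutive output keys. The `split_by_next_level` helper and this cut rule are hypothetical; a real compactor would also have to respect target SST sizes.

```python
def split_by_next_level(keys, next_level):
    """Group sorted output keys into (first, last) ranges, cutting between two
    consecutive keys whenever an Ln+2 SST lies strictly inside the gap."""
    if not keys:
        return []
    groups = [[keys[0]]]
    for prev, cur in zip(keys, keys[1:]):
        # An Ln+2 SST strictly between prev and cur holds no output keys;
        # keeping both keys in one SST would drag it into the next compaction.
        if any(prev < lo and hi < cur for lo, hi in next_level):
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return [(g[0], g[-1]) for g in groups]


l3 = [(1, 5), (8, 9), (17, 19)]
merged_keys = [4, 6, 17, 19]  # merge of L1 [4, 17] with L2 [6, 19]
print(split_by_next_level(merged_keys, l3))  # [(4, 6), (17, 19)]
```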

Fortunately, in leveled compaction a non-L0 SST in Ln only ever moves to Ln+1. Therefore, we can estimate how an SST will affect the write amplification of future compaction tasks from the data distribution in Ln+2, and use this information to organize SSTs so as to reduce future write amplification.
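As a rough sketch of this estimate, the bytes of Ln+2 overlapped by a key range can be turned into a write-amplification figure for the future Ln+1 -> Ln+2 task. The formula `(SST bytes + overlapped Ln+2 bytes) / SST bytes`, the helper names, and the sizes in the usage lines are assumptions for illustration only.

```python
def overlapped_bytes(lo, hi, ln2):
    """Total size of Ln+2 SSTs whose inclusive key range overlaps [lo, hi]."""
    return sum(size for s_lo, s_hi, size in ln2 if s_lo <= hi and lo <= s_hi)


def future_write_amp(lo, hi, size, ln2):
    """Estimated write amplification of a later Ln+1 -> Ln+2 task on this SST:
    bytes written (its own bytes plus overlapped Ln+2 bytes) per byte moved down."""
    return (size + overlapped_bytes(lo, hi, ln2)) / size


# L3 from the normal-task example, with hypothetical sizes in bytes.
l3 = [(1, 5, 64), (8, 9, 256), (17, 19, 64)]
print(future_write_amp(4, 19, 32, l3))   # 13.0: the merged SST also drags in [8, 9]
print(future_write_amp(4, 6, 16, l3))    # 5.0
print(future_write_amp(17, 19, 16, l3))  # 5.0
```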

- Trivial task: disable trivial move when it would result in a task with larger write amplification.
- Normal task: on top of the current rules, further organize the output SSTs according to the information from Ln+2, thereby reducing future write amplification.
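The trivial-task rule could, for example, take the form of a threshold check like the one below. This is a hedged sketch: `allow_trivial_move` and its `max_ratio` threshold are hypothetical, and measuring "larger write amplification" as overlapped Ln+2 bytes relative to the SST's own size is an assumption.

```python
def allow_trivial_move(sst, ln2, max_ratio=2.0):
    """Allow moving `sst` from Ln to Ln+1 without rewriting only if a later
    Ln+1 -> Ln+2 task on it would overlap at most `max_ratio` times its size.

    sst: (lo, hi, size_bytes); ln2: Ln+2 SSTs as (lo, hi, size_bytes),
    all with inclusive key ranges. max_ratio is a hypothetical threshold.
    """
    lo, hi, size = sst
    overlapped = sum(s_size for s_lo, s_hi, s_size in ln2 if s_lo <= hi and lo <= s_hi)
    return overlapped <= max_ratio * size


# L3 from the trivial-task example, with hypothetical sizes in bytes.
l3 = [(1, 5, 64), (6, 7, 64)]
print(allow_trivial_move((5, 6, 16), l3))    # False: 128 overlapped bytes > 32
print(allow_trivial_move((10, 12, 16), l3))  # True: no overlap in L3
```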

risingwavelabs/risingwave #14242