risingwavelabs / risingwave


perf(compaction): Consider future write amplification in compaction #14242

Open Li0k opened 8 months ago

Li0k commented 8 months ago

In Hummock, compaction tasks fall into two basic types, referred to below as trivial-task and normal-task.

A compaction task selects its inputs with the picker algorithm and produces its outputs through meta or compactor operations, depending on the type. However, the picker only looks at the SSTs in the input level Ln and the next level Ln+1 (size, count, amplification).

So although we try to build a task with low write amplification for the current input levels (Ln, Ln+1), we ignore what happens once the SST reaches the output level, i.e. the future write amplification. For example:

  1. trivial-task: moving L1 SST [5, 6] -> L2 may lead to a later task (L2 -> L3) with larger write amplification, because [5, 6] overlaps both [1, 5] and [6, 7] in L3.
L1 [5, 6]

L2            [7, 8]

L3 [1, 5], [6, 7]
  2. normal-task: compacting L1 [4, 17] -> L2 may lead to a later task (L2 -> L3) with larger write amplification.
    • The task produces SST L2 [4, 6, 17, 19]. If SST [8, 9] in L3 contains a large amount of data, the next compaction task will incur larger write amplification, because the output spans [8, 9].
    • If the output is instead split into L2 [4, 6] and [17, 19], the next compaction task no longer includes [8, 9], which alleviates write amplification.
L1 [4, 17]

L2      [6, 19]

L3 [1, 5], [8, 9], [17, 19]
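The two examples above can be checked with a small sketch. It is not Hummock code: it assumes each SST is reduced to a closed key range (smallest key, largest key), and the helper name `overlapping` is hypothetical. It lists which L3 SSTs a given L2 SST would pull into the next L2 -> L3 task.

```python
from typing import List, Tuple

# Assumption: an SST is modelled only by its closed key range.
KeyRange = Tuple[int, int]


def overlapping(sst: KeyRange, level: List[KeyRange]) -> List[KeyRange]:
    """Return the SSTs in `level` whose key range intersects `sst`."""
    lo, hi = sst
    return [(l, h) for (l, h) in level if l <= hi and lo <= h]


# Example 1: after the trivial move, [5, 6] sits in L2 above L3 [1, 5], [6, 7].
l3_example1 = [(1, 5), (6, 7)]
moved = overlapping((5, 6), l3_example1)

# Example 2: one merged output spanning keys 4..19 versus two split outputs.
l3_example2 = [(1, 5), (8, 9), (17, 19)]
merged = overlapping((4, 19), l3_example2)
split = overlapping((4, 6), l3_example2) + overlapping((17, 19), l3_example2)

print(moved)   # L3 SSTs touched by the moved SST
print(merged)  # L3 SSTs touched by the single merged output
print(split)   # L3 SSTs touched by the two split outputs
```

In this model the merged output touches all three L3 SSTs, while the split outputs leave [8, 9] out of the next task.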

Fortunately, in leveled compaction a non-L0 SST in Ln only ever moves to Ln+1. We can therefore estimate an SST's impact on the write amplification of future compaction tasks from the data distribution of Ln+2, and use that information to organize the output SSTs so that future write amplification is reduced.
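One possible way to use the Ln+2 distribution is sketched below. This is an illustration under assumptions, not the proposed implementation: the function `split_by_grandparent` is hypothetical, keys are plain integers, and the cut rule (start a new output SST whenever a whole Ln+2 SST lies in the gap between two consecutive output keys) is only one of several rules that could be chosen.

```python
from typing import List, Tuple

# Assumption: an Ln+2 SST is modelled only by its closed key range.
KeyRange = Tuple[int, int]


def split_by_grandparent(
    keys: List[int], grandparent: List[KeyRange]
) -> List[List[int]]:
    """Group sorted output keys into SSTs, cutting at gaps that hide an Ln+2 SST.

    `keys` must be sorted ascending. A cut is made before `key` when some
    Ln+2 SST lies strictly between the previous key and `key`, so that no
    output SST spans an Ln+2 SST it holds no data for.
    """
    outputs: List[List[int]] = []
    current: List[int] = []
    for key in keys:
        if current and any(current[-1] < lo and hi < key for lo, hi in grandparent):
            outputs.append(current)
            current = []
        current.append(key)
    if current:
        outputs.append(current)
    return outputs


# Keys from the normal-task example, with L3 acting as Ln+2 for the L1 input.
result = split_by_grandparent([4, 6, 17, 19], [(1, 5), (8, 9), (17, 19)])
print(result)
```

For the normal-task example this yields the [4, 6] and [17, 19] grouping described earlier. A real implementation would likely also weigh SST sizes, since many small output SSTs carry their own cost.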

github-actions[bot] commented 3 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.