risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
Apache License 2.0

Enhance(compaction): Optimize the compaction task based on the state of the LSM #19531

Open Li0k opened 13 hours ago


Hummock picks compaction tasks according to an almost fixed “Rule” and deviates from it only in extreme cases (write stop). This approach is reliable in most cases.
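The issue does not spell out the Rule itself. As a minimal sketch of "follow a fixed rule, break it only at write stop", the code below assumes a score-based picker (each level's size relative to a target size) and assumes write stop is triggered by the number of stacked L0 sub-levels. `LevelState`, `compaction_score`, `pick_level`, and `write_stop_sub_levels` are hypothetical names, not Hummock's API.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical sketch of a rule-based picker; not Hummock's actual implementation.
@dataclass(frozen=True)
class LevelState:
    level: int         # 0 for L0, 1.. for the lower levels
    size_bytes: int    # current size of the level
    target_bytes: int  # size the level may reach before it needs compaction


def compaction_score(state: LevelState) -> float:
    # A score above 1.0 means the level has outgrown its target size.
    return state.size_bytes / max(state.target_bytes, 1)


def pick_level(levels: List[LevelState], l0_sub_levels: int, write_stop_sub_levels: int) -> int:
    """Return the level to compact next.

    Normal path: follow the fixed rule (the highest-scoring level wins).
    Extreme path: once L0 has stacked up to the assumed write-stop threshold,
    break the rule and compact L0 regardless of the other scores.
    """
    if l0_sub_levels >= write_stop_sub_levels:
        return 0
    return max(levels, key=compaction_score).level
```

With levels scored 0.5, 5.0, and 0.1, the picker normally chooses level 1, but it switches to L0 as soon as the L0 sub-level count reaches the threshold.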

However, it is not always efficient. Hummock's compaction Rule is write-friendly, for example:

In extreme cases, these rules become inefficient, for example:

These scenarios fall into two categories: read and write. Read performance depends on more factors, such as the operator cache, block cache, file cache, and the cache refill/eviction policy. Therefore, I would like to focus on optimizing the write path first.

For the write scenario, we can optimize the known cases by refining the Rule that selects tasks:

  1. Partition the task's output SSTs based on write throughput (already implemented).
  2. Adjust the selection rule and output of L0 tasks based on the number of L0 stacks.
    • Use more aggressive batching parameters: allow a larger level count, SST count, and max compaction size per task.
    • Adjust the output SST size to reduce the SST count and relieve pressure on the meta node.
  3. Optimize trivial-move commits by committing more trivial-move tasks in a single commit.
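Item 1 (partitioning output SSTs by write throughput) is already implemented, and the issue does not describe how. A minimal sketch of the general idea, assuming the partition count grows with throughput up to a cap and that partitions are contiguous vnode (virtual node) ranges: `partition_count`, `split_vnode_range`, and all parameters are hypothetical and are not RisingWave's actual code or configuration.

```python
import math
from typing import List, Tuple


# Hypothetical sketch; the real partitioning logic in Hummock may differ.
def partition_count(write_bytes_per_sec: float, bytes_per_sec_per_partition: float,
                    max_partitions: int) -> int:
    """Number of output partitions for a task, growing with write throughput."""
    needed = math.ceil(write_bytes_per_sec / bytes_per_sec_per_partition)
    # Always at least one partition, never more than the configured cap.
    return max(1, min(needed, max_partitions))


def split_vnode_range(vnode_count: int, partitions: int) -> List[Tuple[int, int]]:
    """Split [0, vnode_count) into contiguous half-open ranges of near-equal size."""
    base, extra = divmod(vnode_count, partitions)
    ranges = []
    start = 0
    for i in range(partitions):
        # The first `extra` ranges each absorb one vnode of the remainder.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

Cutting output SSTs along partition boundaries keeps hot key ranges in separate files, so later tasks can compact them independently.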
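Item 2 can be sketched as a function that keeps the default L0 task limits while L0 is healthy and scales them up as L0 stacks grow. This is a minimal sketch under stated assumptions: the field names, the default `normal_stack_count` of 16, the cap `max_factor`, and the linear scaling are hypothetical choices for illustration, not Hummock's compaction config.

```python
from dataclasses import dataclass, replace


# Hypothetical parameters; not the names or defaults of Hummock's compaction config.
@dataclass(frozen=True)
class L0TaskParams:
    max_level_count: int          # L0 sub-levels one task may merge
    max_sst_count: int            # input SSTs one task may take
    max_compaction_bytes: int     # upper bound on one task's input size
    target_output_sst_bytes: int  # size at which output SSTs are cut


def l0_params_for(stack_count: int, base: L0TaskParams,
                  normal_stack_count: int = 16, max_factor: int = 4) -> L0TaskParams:
    """Scale the L0 task limits with the number of stacked L0 sub-levels."""
    if stack_count <= normal_stack_count:
        return base  # healthy L0: keep the default, write-friendly rule
    # Ceiling of stack_count / normal_stack_count, capped at max_factor.
    factor = min((stack_count + normal_stack_count - 1) // normal_stack_count, max_factor)
    return replace(
        base,
        # More aggressive batching: merge more sub-levels and SSTs per task.
        max_level_count=base.max_level_count * factor,
        max_sst_count=base.max_sst_count * factor,
        max_compaction_bytes=base.max_compaction_bytes * factor,
        # Larger output SSTs mean fewer SSTs for the meta node to track.
        target_output_sst_bytes=base.target_output_sst_bytes * factor,
    )
```

For example, with a base of 10 sub-levels, 100 SSTs, 2 GiB input, and 128 MiB outputs, 33 stacked sub-levels yield a factor of 3 (30 sub-levels, 300 SSTs, 6 GiB, 384 MiB outputs). The cap keeps a single task from growing without bound when L0 is badly backed up.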
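Item 3 amounts to grouping pending trivial-move tasks so that one commit applies several of them. A minimal sketch, assuming each commit has a roughly fixed cost on the meta side; `TrivialMoveTask`, `batch_trivial_moves`, and `max_tasks_per_commit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical sketch; does not model Hummock's real commit path.
@dataclass(frozen=True)
class TrivialMoveTask:
    task_id: int
    sst_ids: Tuple[int, ...]  # SSTs moved to the target level without being rewritten


def batch_trivial_moves(tasks: Iterable[TrivialMoveTask],
                        max_tasks_per_commit: int) -> List[List[TrivialMoveTask]]:
    """Group pending trivial-move tasks so each group is applied in a single commit."""
    if max_tasks_per_commit < 1:
        raise ValueError("max_tasks_per_commit must be at least 1")
    batches: List[List[TrivialMoveTask]] = []
    current: List[TrivialMoveTask] = []
    for task in tasks:
        current.append(task)
        if len(current) == max_tasks_per_commit:
            batches.append(current)
            current = []
    if current:  # flush the final partial batch
        batches.append(current)
    return batches
```

Five pending tasks committed one by one cost five commits; with `max_tasks_per_commit=2` they cost three. A real implementation would also have to ensure that tasks in the same batch do not touch the same SSTs, which this sketch ignores.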