
perf(storage): Improve data alignment for multi-table compaction groups #13037

Open · Li0k opened 1 year ago

Li0k commented 1 year ago

After https://github.com/risingwavelabs/risingwave/pull/11826, we no longer split tables that are still being created into dedicated compaction groups.

Backfill snapshot reads can give tables high write throughput during mv creation, causing excessive compaction groups to be created, even though the streaming throughput can be low once mv creation completes.

We have not yet implemented compaction group merge, so the above scenario may waste IOPS. However, keeping the high write-throughput tables in the default compaction group also prevents us from using parallel base compaction to make compaction more efficient, because their key ranges are not aligned, as the sketch below illustrates.
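
To make the key-range point concrete, here is a minimal Rust sketch (the types and the `parallel_task_groups` helper are hypothetical, not RisingWave internals, and real base compaction involves sub-levels and more state). Tasks can only run in parallel on disjoint key ranges, so overlapping SSTs collapse into a single task, while aligned SSTs split into several.

```rust
/// Hypothetical key range of one SST, with inclusive bounds.
#[derive(Debug, Clone, Copy)]
struct KeyRange {
    left: u64,
    right: u64,
}

/// Greedily groups SSTs into maximal runs of mutually overlapping ranges;
/// each resulting group must be handled by a single compaction task, so the
/// number of groups bounds the achievable parallelism.
fn parallel_task_groups(mut ssts: Vec<KeyRange>) -> Vec<Vec<KeyRange>> {
    ssts.sort_by_key(|r| r.left);
    let mut groups: Vec<Vec<KeyRange>> = Vec::new();
    let mut run_right: Option<u64> = None;
    for sst in ssts {
        if run_right.map_or(false, |right| sst.left <= right) {
            // Overlaps the current run: must join the same task.
            run_right = Some(run_right.unwrap().max(sst.right));
            groups.last_mut().unwrap().push(sst);
        } else {
            // Disjoint from everything so far: an independently runnable task.
            run_right = Some(sst.right);
            groups.push(vec![sst]);
        }
    }
    groups
}

fn main() {
    // Unaligned: each SST overlaps its neighbor, so only one task can run.
    let unaligned = vec![
        KeyRange { left: 0, right: 60 },
        KeyRange { left: 50, right: 120 },
        KeyRange { left: 110, right: 200 },
    ];
    // Aligned on table/vnode boundaries: disjoint ranges, three parallel tasks.
    let aligned = vec![
        KeyRange { left: 0, right: 49 },
        KeyRange { left: 50, right: 109 },
        KeyRange { left: 110, right: 200 },
    ];
    assert_eq!(parallel_task_groups(unaligned).len(), 1);
    assert_eq!(parallel_task_groups(aligned).len(), 3);
    println!("aligned key ranges admit more parallel base compaction tasks");
}
```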

To solve the data alignment problem, we propose a simple way to improve compaction parallelism by performing data alignment operations on the default compaction group, which may improve backfill performance, reduce L0 stacking, and make compaction more efficient:

  1. In the default compaction group, track each table's write throughput during the creating phase (this logic already exists).
  2. Cut the SSTs of high-throughput tables by table_id and vnode to achieve data alignment, like a dedicated compaction group does (see the sketch after this list).
  3. After backfill completes, restore the default logic to reduce the IOPS of the default compaction group.
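
A minimal sketch of step 2, assuming keys are ordered by a (table_id, vnode) prefix. The `KeyPrefix` type and the `should_cut` helper are hypothetical, not the actual compactor interface: the idea is to seal the current output SST whenever the next key crosses a table boundary, or a vnode boundary within a table flagged as high-throughput.

```rust
use std::collections::HashSet;

/// Hypothetical key prefix: table id plus virtual-node id. Real keys carry
/// more data; only this prefix matters for deciding cut points.
#[derive(PartialEq, Eq, Clone, Copy, Debug)]
struct KeyPrefix {
    table_id: u32,
    vnode: u16,
}

/// Returns true if the compactor should seal the current output SST before
/// writing `next`. `high_throughput_tables` holds the tables flagged by the
/// existing throughput statistics (step 1).
fn should_cut(current: KeyPrefix, next: KeyPrefix, high_throughput_tables: &HashSet<u32>) -> bool {
    // Always cut on a table boundary so tables never share an output SST.
    if current.table_id != next.table_id {
        return true;
    }
    // For high-throughput tables, additionally cut on vnode boundaries so
    // base compaction can later pick disjoint key ranges in parallel.
    high_throughput_tables.contains(&next.table_id) && current.vnode != next.vnode
}

fn main() {
    let mut hot = HashSet::new();
    hot.insert(42); // table 42 was observed with high write throughput

    let a = KeyPrefix { table_id: 42, vnode: 0 };
    let b = KeyPrefix { table_id: 42, vnode: 1 };
    let c = KeyPrefix { table_id: 43, vnode: 1 };

    assert!(should_cut(a, b, &hot)); // vnode boundary within a hot table
    assert!(should_cut(b, c, &hot)); // table boundary
    assert!(!should_cut(a, a, &hot)); // same prefix: keep writing
    println!("cut decisions behave as expected");
}
```

After backfill (step 3), the vnode check would simply be disabled for that table, so the default group goes back to producing fewer, larger SSTs and fewer IOPS.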
hzxa21 commented 1 year ago

#13075

Li0k commented 1 year ago

Backfill Test

Resource

Background

We test the behavior of compaction and backfill under different policies by creating mvs on a mirror cluster. The mvs contain multiple state tables, a few of which have high write throughput. We compare the old and new policies:

Result

CPU

The compactor CPU utilization of the branch has increased, which indirectly indicates an increase in parallelism.

Barrier Latency

Read Duration - iter

SSTable Count

SSTable Size

cg2 and cg3 have less stacked L0 and base-level data.

Compaction Skip Count

cg2 and cg3 have fewer compactions skipped because of pending files.

Compaction Task

Analyzing the properties of the CompactTasks, we find that the branch's tasks can eliminate more sub_levels, while each task's size is kept at around 2 GB and its file count below 100 (sketched below). We can therefore maintain a stable running task count and improve compactor CPU utilization.
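
For illustration, a minimal sketch of how such per-task bounds can be enforced when picking input files; the `SstMeta` type, the `pick_task_input` helper, and the exact constants are hypothetical, chosen only to match the ~2 GB and <100-file behavior observed above.

```rust
const MAX_TASK_SIZE: u64 = 2 * 1024 * 1024 * 1024; // ~2 GB of input per task
const MAX_TASK_FILE_COUNT: usize = 100; // keep the file count below 100

/// Hypothetical SST metadata: just an id and an on-disk size.
#[derive(Debug, Clone, Copy)]
struct SstMeta {
    id: u64,
    file_size: u64,
}

/// Takes input files for one task from `candidates` until a limit is hit;
/// the files left over stay available for subsequent (possibly parallel) tasks.
fn pick_task_input(candidates: &[SstMeta]) -> (Vec<SstMeta>, &[SstMeta]) {
    let mut picked = Vec::new();
    let mut total_size = 0u64;
    for (i, sst) in candidates.iter().enumerate() {
        if picked.len() >= MAX_TASK_FILE_COUNT || total_size + sst.file_size > MAX_TASK_SIZE {
            return (picked, &candidates[i..]);
        }
        total_size += sst.file_size;
        picked.push(*sst);
    }
    (picked, &[])
}

fn main() {
    // 150 SSTs of 64 MB each: the 2 GB size cap binds after 32 files,
    // well before the 100-file cap.
    let candidates: Vec<SstMeta> = (0..150)
        .map(|id| SstMeta { id, file_size: 64 * 1024 * 1024 })
        .collect();
    let (task, rest) = pick_task_input(&candidates);
    assert_eq!(task.len(), 32);
    assert_eq!(rest.len(), 118);
    println!("task starts at sst {} with {} files; {} files left", task[0].id, task.len(), rest.len());
}
```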

Compacting Task count

It is intuitively clear that the branch's base-level compaction tasks run with higher parallelism.

Lsm Compact Pending Bytes

Conclusion

Data alignment does bring some compaction benefits: it improves compactor utilization and therefore alleviates data buildup in the LSM tree. However, in the current tests the backfill times are too short for a significant end-to-end improvement, and the barrier latency is somewhat jittery due to more frequent compactions.

Li0k commented 7 months ago

Related to https://github.com/risingwavelabs/risingwave/issues/15291. We will introduce new strategies to perform data alignment and splitting.

github-actions[bot] commented 5 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.