Li0k commented 1 year ago

We avoid the split of creating table after https://github.com/risingwavelabs/risingwave/pull/11826.

Backfill snapshot read can cause tables to have a large write throughput during mv creation and cause excessive compaction groups to be created. But the streaming throughput can be low after mv creation completes.

Currently, we have not implemented compaction group merge, In the above scenario, it may cause us to waste more IOPS. However, placing high write-throughput tabless from in the default compaction group will not allow to utilization of parallel base compaction to improve the efficiency of the compaction, due to the key range not being aligned.

To solve the data alignment problem, propose a simple solution to improve the parallelism of compaction by performing some data alignment operations on the default compaction group, which may improve the performance of the backfill, reduce the stacking of l0s, and get more efficient compaction.

In the default compaction group, count the table write throughput in the creating phase (logic already exists).
cut the table high throughput by table_id and vnode to achieve data alignment (like the dedicated compaction group)
After backfill, restore the default logic to reduce the iops of the default compaction group.

hzxa21 commented 1 year ago

13075

Li0k commented 1 year ago

Backfill Test

Resource

compute node = 8c_32g * 3
compactor node = 16c_4g * 1

Background

Test the behavior of compaction and backfill under different policies by creating mvs on a mirror cluster, The mvs created contain multiple state tables, and there are a few state tables with high write throughput. Comparing the old and new policies:

During Backfill, we do not split any compaction group and perform compaction in the default compaction group without data alignment.
During Backfill, data alignment is performed in the default compaction group.
main https://grafana.prod.risingwave.cloud/d/EpkBw5W4k/risingwave-dev-dashboard?from=1699416722067&orgId=1&to=1699417335729&var-component=All&var-datasource=P7AEF1E9BD6C1839A&var-instance=risingwave&var-namespace=rwc-g1hc96c0vserua5v7ohnjr7ash-monocle-troubleshooting&var-pod=All&var-table=All
branch https://grafana.prod.risingwave.cloud/d/EpkBw5W4k/risingwave-dev-dashboard?from=1699527970000&orgId=1&to=1699528590000&var-component=All&var-datasource=P7AEF1E9BD6C1839A&var-instance=risingwave&var-namespace=rwc-g1hc96c0vserua5v7ohnjr7ash-monocle-troubleshooting&var-pod=All&var-table=All

Result

CPU

main
branch

The compactor cpu utilization of branch has increased, which indirectly indicates an increase in parallelism.

Barrier Latency

main
branch

Read Duration - iter

main
branch

SStable Count

main
branch

SStable Size

main
branch

cg2 and cg3 have less stacked l0 and base level data.

cg2 75g vs 65g
cg3 32g vs 24g

Compaction Skip Count

main
branch

cg2 and cg3 have fewer skip counts due to pending-files

Compaction Task

main
branch

From the analysis of CompactTask's properties, we can find that the branch's task can eliminate more sub_levels, and the size of each task is controlled to be around 2g, and the number of files is kept below 100. Therefore, we can maintain a stable running task count and improve the compactor cpu utilization.

Compacting Task count

main
branch

It is intuitively obvious that the branch's base level compaction task has a higher parallelism.

Lsm Compact Pending Bytes

main
branch

Conclusion

Data alignment does bring some compaction benefits. It improves compactor utilization and therefore alleviates data buildup in lsm. However, in the current tests, the short backfill times do not result in significant time optimization, and the barrier latency is somewhat jittery due to more frequent compactions.

Li0k commented 7 months ago

related to https://github.com/risingwavelabs/risingwave/issues/15291 , We will introduce new strategies to perform data alignment and split.

github-actions[bot] commented 5 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

risingwavelabs / risingwave

perf(storage): Improve data alignment for multi-table compaction groups #13037