risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Further optimization of WriteLimit conditions #10974

Open Li0k opened 1 year ago

Li0k commented 1 year ago

Is your feature request related to a problem? Please describe.

In Hummock, an excessive number of SSTs degrades read performance and compactor efficiency, so we introduced the WriteLimiter concept to impose a write stall on excessive writes.

#[derive(Default)]
pub struct WriteLimiter {
    limits: ArcSwap<(
        HashMap<CompactionGroupId, WriteLimit>,
        HashMap<TableId, CompactionGroupId>,
    )>,
    notify: tokio::sync::Notify,
}

The system calls wait_permission at each flush to determine whether the flush condition is satisfied. When the write limit is in effect, it causes a write stall in Hummock that propagates to upstream operators through backpressure, relieving the pressure of SST pile-up in Hummock.
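The permission check underneath wait_permission can be reduced to a simple lookup: a table is stalled if its compaction group currently has an active limit. The sketch below is a simplified, std-only illustration of that lookup (the real WriteLimiter wraps the two maps in ArcSwap and blocks on a tokio Notify; WriteLimit here is a hypothetical stub):

```rust
use std::collections::HashMap;

type CompactionGroupId = u64;
type TableId = u32;

// Hypothetical stub; the real WriteLimit carries more state.
#[derive(Clone)]
struct WriteLimit {
    reason: String,
}

/// A table's writes are stalled iff its compaction group has an active limit.
fn is_write_stalled(
    limits: &HashMap<CompactionGroupId, WriteLimit>,
    table_to_group: &HashMap<TableId, CompactionGroupId>,
    table_id: TableId,
) -> bool {
    table_to_group
        .get(&table_id)
        .map_or(false, |group| limits.contains_key(group))
}

/// Demo: group 2 is limited, group 3 is not.
fn demo() -> (bool, bool) {
    let mut limits = HashMap::new();
    limits.insert(
        2u64,
        WriteLimit {
            reason: "too many L0 sub-levels".to_string(),
        },
    );
    let mut table_to_group = HashMap::new();
    table_to_group.insert(10u32, 2u64); // table 10 -> limited group 2
    table_to_group.insert(11u32, 3u64); // table 11 -> unlimited group 3
    (
        is_write_stalled(&limits, &table_to_group, 10),
        is_write_stalled(&limits, &table_to_group, 11),
    )
}
```

In the real implementation, a blocked flush then waits on the Notify and re-checks after each limit update, rather than polling.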

In the current implementation, we use only level0_stop_write_threshold_sub_level_number as the write limit condition, and its default value of 1000 is quite lenient. In some recent scenarios, we have found that this limit reacts with a certain lag and cannot throttle writes in time, leading to further deterioration of the LSM state.

Describe the solution you'd like

The purpose of introducing WriteLimit is to prevent the LSM tree from taking an abnormal shape, which in turn reduces read performance and compactor efficiency. For the reasons above, we can introduce more constraints:

  1. L0 overlapping SST count (every read must merge all related overlapping SST files, so the more SST files there are, the more pressure on the merge iterator)
  2. L0 non-overlapping sub-level count (for non-overlapping sub-levels, pruning selects one SST per level to add to the merge iterator, so the more levels there are, the higher the pressure on the merge iterator)
  3. (Optional) lsm_pending_bytes (growth in lsm_pending_bytes indicates that compactor capacity is insufficient, and further writes will result in an abnormal LSM shape)
  4. (Optional) base level size / SST count (an indirect reflection of an L0 anomaly; a base level anomaly degrades base-level compaction tasks and further aggravates L0 stacking)
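The proposed conditions could be evaluated together, stalling writes as soon as any one threshold is exceeded. A minimal sketch, where all struct names, field names, and threshold values are illustrative rather than the actual RisingWave config:

```rust
// Hypothetical thresholds; names and defaults are illustrative only.
struct StallThresholds {
    max_sub_level_number: u64,           // existing condition
    max_overlapping_sst_count: u64,      // proposed condition 1
    max_non_overlapping_sub_levels: u64, // proposed condition 2
    max_pending_compact_bytes: u64,      // proposed condition 3 (optional)
}

// Hypothetical snapshot of L0 state at flush time.
struct Level0Stats {
    sub_level_number: u64,
    overlapping_sst_count: u64,
    non_overlapping_sub_levels: u64,
    pending_compact_bytes: u64,
}

/// A write stall triggers if *any* condition exceeds its threshold;
/// returns the first matching reason, or None if writes may proceed.
fn should_stall(stats: &Level0Stats, t: &StallThresholds) -> Option<&'static str> {
    if stats.sub_level_number >= t.max_sub_level_number {
        Some("too many L0 sub-levels")
    } else if stats.overlapping_sst_count >= t.max_overlapping_sst_count {
        Some("too many overlapping SSTs in L0")
    } else if stats.non_overlapping_sub_levels >= t.max_non_overlapping_sub_levels {
        Some("too many non-overlapping sub-levels")
    } else if stats.pending_compact_bytes >= t.max_pending_compact_bytes {
        Some("pending compaction bytes too high")
    } else {
        None
    }
}
```

With such a disjunction, an overlapping-SST pile-up would trigger a stall even while the sub-level count stays far below the lenient 1000 default.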

Describe alternatives you've considered

No response

Additional context

No response

MrCroxx commented 1 year ago

How will you implement write stall? By adding lags?

Li0k commented 1 year ago

How will you implement write stall? By adding lags?

We have already implemented WriteLimit, and we achieve the write stall by blocking flush. This issue is for further discussion of its limiting conditions.

fuyufjh commented 1 year ago

@Li0k Any updates?

Li0k commented 1 year ago

@Li0k Any updates?

No, I'm considering suspending this PR.

hzxa21 commented 10 months ago

I think we can bring this issue back and see how we can optimize the write limit conditions. We have seen several cases where writes are not stalled even though there are many overlapping sub-levels. This can happen when base compaction and tier compaction are stuck while intra-L0 compaction can still proceed.

github-actions[bot] commented 4 months ago

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄