risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.8k stars 564 forks

feat(storage): maintain some L0 data as memtable #6847

Open Little-Wallace opened 1 year ago

Little-Wallace commented 1 year ago

Is your feature request related to a problem? Please describe.

In some scenarios, when the flush volume is much smaller than the source throughput, there may be a large number of requests to Hummock (gets or scans), and a large number of small files accumulate in L0. Merging the read results of the iterators over all these files is enough on its own to slow the system down.

But this is unnecessary. For files in the overlapping level, we cache their data in the block cache, and every time a checkpoint barrier completes, the compute node (CN) only adds the files it flushed itself to its local HummockVersion. So we can merge the data of these files directly, rather than merging the read results.
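A minimal Rust sketch of the idea in the title, assuming the decoded entries of each L0 file are available in memory: the CN folds every file it flushed itself into one sorted in-memory map that keeps only the newest version of each key. A get is then a single lookup and a scan walks one ordered map, instead of merging one iterator per file. `L0Memtable`, `FileEntries`, and the `(key, epoch, value)` layout are hypothetical names for illustration, not Hummock's actual types.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical decoded contents of one L0 SST file: (user key, epoch, value),
/// where a `None` value is a delete tombstone.
type FileEntries = Vec<(Vec<u8>, u64, Option<Vec<u8>>)>;

/// Hypothetical in-memory view of the L0 files this compute node flushed itself.
/// Only the newest version of each key is kept, so reads need no per-file merge.
#[derive(Default)]
struct L0Memtable {
    // user key -> (epoch of the newest version, value or tombstone)
    entries: BTreeMap<Vec<u8>, (u64, Option<Vec<u8>>)>,
}

impl L0Memtable {
    /// Fold a file flushed at a checkpoint barrier into the map.
    /// A version replaces the stored one only if its epoch is newer.
    fn add_flushed_file(&mut self, file: &FileEntries) {
        for (key, epoch, value) in file {
            let is_newer = self
                .entries
                .get(key)
                .map_or(true, |(existing, _)| *epoch > *existing);
            if is_newer {
                self.entries.insert(key.clone(), (*epoch, value.clone()));
            }
        }
    }

    /// Point get: one map lookup; tombstoned keys read as absent.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries.get(key).and_then(|(_, v)| v.as_deref())
    }

    /// Range scan over [start, end) in key order, skipping tombstones.
    fn scan<'a>(
        &'a self,
        start: &[u8],
        end: &[u8],
    ) -> impl Iterator<Item = (&'a [u8], &'a [u8])> + 'a {
        self.entries
            .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
            .filter_map(|(k, (_, v))| v.as_deref().map(move |v| (k.as_slice(), v)))
    }
}
```

Keeping only the CN's own flushed files in the map matches the observation that those are the only files added to the local HummockVersion at each checkpoint barrier. A real implementation would presumably also need to bound the map's memory and evict entries once their files are compacted out of L0.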

Describe the solution you'd like

Previously, we added a new sub-level for every checkpoint barrier. Now, we would keep using the original overlapping level and not compact it into a non-overlapping level unless its size grows too large.

For example, we could set the size limit of the overlapping level to 256 MB, while each checkpoint barrier flushes only 32 MB of data to S3.
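The proposed trigger can be sketched as a simple size check. This is a hypothetical Rust illustration, not RisingWave's compaction code; `OverlappingLevel`, `on_checkpoint_flush`, and the choice to compact once the accumulated size reaches (rather than exceeds) the limit are assumptions.

```rust
const MB: u64 = 1024 * 1024;

/// Hypothetical bookkeeping for the single L0 overlapping level.
struct OverlappingLevel {
    size_limit: u64,
    current_size: u64,
    files: usize,
}

impl OverlappingLevel {
    fn new(size_limit: u64) -> Self {
        Self { size_limit, current_size: 0, files: 0 }
    }

    /// Append a file flushed at a checkpoint barrier to the same overlapping
    /// level (instead of opening a new sub-level). Returns true once the level
    /// has reached its size limit and should be compacted into a
    /// non-overlapping level.
    fn on_checkpoint_flush(&mut self, flushed_bytes: u64) -> bool {
        self.current_size += flushed_bytes;
        self.files += 1;
        self.current_size >= self.size_limit
    }

    /// Reset after compaction has moved the data out of the overlapping level.
    fn on_compacted(&mut self) {
        self.current_size = 0;
        self.files = 0;
    }
}

/// How many checkpoint flushes of a fixed size fit before compaction triggers.
fn checkpoints_until_compaction(limit: u64, per_checkpoint: u64) -> usize {
    assert!(per_checkpoint > 0, "flush size must be positive");
    let mut level = OverlappingLevel::new(limit);
    let mut n = 0;
    loop {
        n += 1;
        if level.on_checkpoint_flush(per_checkpoint) {
            return n;
        }
    }
}
```

With the example numbers, `checkpoints_until_compaction(256 * MB, 32 * MB)` returns 8: the level absorbs 256 MB / 32 MB = 8 checkpoints' worth of files before compaction is triggered, whereas adding one sub-level per checkpoint barrier would have produced 8 separate sub-levels over the same period.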

Describe alternatives you've considered

Additional context

No response

hzxa21 commented 1 year ago

@Little-Wallace Feel free to reassign if needed.

fuyufjh commented 1 year ago

Seems to be postponed. Removed from the milestone for now.