Open hzxa21 opened 2 weeks ago
The general idea LGTM. I think we need a more detailed design to ensure that the amount of data in the log store can converge to 0.
The work has overlap with unaligned join (log store executor can be used for both). Will write a design doc for this.
Actually why just during backfill? Shouldn't sink_decouple always let the downstream sink be decoupled from upstream?
Outside of the backfilling period, the downstream MV will wait for the upstream barrier to align, and there is no way to make the downstream progress faster.
Why? If we use kv_log_store, it will just buffer the changes, and the barrier can pass once these changes have been written to the log store.
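To make the buffering argument concrete, here is a toy sketch in Python. It is not RisingWave code: `ToyLogStore`, `upstream_writer`, and `downstream_reader` are hypothetical names, and an in-memory queue stands in for a persistent log store. It only illustrates the claim that the upstream barrier can pass as soon as the changes are recorded, regardless of how far the downstream reader has progressed.

```python
from collections import deque


class ToyLogStore:
    """Hypothetical in-memory stand-in for a persistent log store."""

    def __init__(self):
        # Buffered ("chunk", rows) and ("barrier", epoch) entries in arrival order.
        self._entries = deque()

    def write_chunk(self, rows):
        self._entries.append(("chunk", list(rows)))

    def write_barrier(self, epoch):
        # In this toy model, recording the barrier means everything before it
        # is stored, so the upstream side may let the barrier pass.
        self._entries.append(("barrier", epoch))
        return epoch

    def read_next(self):
        return self._entries.popleft() if self._entries else None

    def pending(self):
        return len(self._entries)


def upstream_writer(log, epochs):
    """Write each epoch's rows and barrier; return the epochs whose barrier passed."""
    passed = []
    for epoch, rows in epochs:
        log.write_chunk(rows)
        passed.append(log.write_barrier(epoch))
    return passed


def downstream_reader(log, max_items):
    """Consume at most max_items entries (a slow sink); return the delivered rows."""
    delivered = []
    for _ in range(max_items):
        item = log.read_next()
        if item is None:
            break
        kind, payload = item
        if kind == "chunk":
            delivered.extend(payload)
    return delivered


log = ToyLogStore()
passed = upstream_writer(log, [(1, ["a", "b"]), (2, ["c"]), (3, ["d", "e"])])
delivered = downstream_reader(log, max_items=2)  # the slow reader has only handled 2 entries
```

All three barriers pass on the upstream side while the reader has delivered only the first chunk. The remaining entries in `log.pending()` are the backlog that has to drain eventually, which is the convergence concern raised in the first comment.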
Is your feature request related to a problem? Please describe.
Backfilling can backpressure upstream, causing the existing streaming jobs to be slower or even stuck. There are three cases where backfilling can happen:
The current ways to mitigate the effect of backfilling on upstream:
- SET BACKFILL_RATE_LIMIT to xxx. Supported for 1, 2, 3.
- SET sink_decouple to true (default on). Supported for 2.
- SET streaming_use_snapshot_backfill to true (default off, experimental now). Supported for 1.
The only effective way for 3 is to use a rate limit, which requires manual operation and an understanding of the workload before a good value can be determined. Therefore, I think we should support sink decoupling for sink into table as well. This is also a prerequisite for doing serverless backfill for sink into table.
Describe the solution you'd like
There are two ways to implement sink decoupling for sink into table:
Describe alternatives you've considered
No response
Additional context
No response