risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.06k stars 581 forks source link

fix: fix potential data loss for shared source #19443

Closed xxchan closed 2 days ago

xxchan commented 3 days ago

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Set up: Create a shared kafka source, and 1 MV on the source.

Data loss happens when:

  1. alter source set rate limit to 0
  2. push some new data to the kafka topic
  3. resume the source. Then we can find the MV don't get the data. If we push more data to the topic, only new data comes.

Reason: In #16626 we introduced an optimization to let shared SourceExecutor start from latest, but the implementation is problematic. Specifically, hack_seek_to_latest will not only take effect at the beginning, but will also when rebuilding the source reader (which happens when rate limit is applied).

The new implementation in this PR:

Checklist

Documentation

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

xxchan commented 3 days ago

This stack of pull requests is managed by Graphite. Learn more about stacking.

tabVersion commented 3 days ago

Specifically, hack_seek_to_latest will not only take effect at the beginning, but will also when rebuilding the source reader (which happens when rate limit is applied).

So when receiving a new mutation on rate_limit, the source exec refreshes the high watermark to hw_1 but the source backfill exec keeps the original high watermark hw_0 as the end position of backfill. Is my understanding correct?

xxchan commented 3 days ago

It's not related with backfill's position. We may assume backfill already finished, and it's just forwarding messages now.

Rebuilding will make source exec jump from hw_0 to hw_1.

xxchan commented 2 days ago

Want to wait a while for more reviews. Just in case.

xxchan commented 2 days ago

Merge activity

xxchan commented 2 days ago

will cherry pick the whole stack together