Closed xxchan closed 2 days ago
Specifically, `hack_seek_to_latest` takes effect not only at the beginning, but also when the source reader is rebuilt (which happens when a rate limit is applied).
So when receiving a new mutation on `rate_limit`, the source exec refreshes the high watermark to `hw_1`, but the source backfill exec keeps the original high watermark `hw_0` as the end position of backfill. Is my understanding correct?
It's not related to the backfill's position. We may assume backfill has already finished, and it's just forwarding messages now.
Rebuilding will make the source exec jump from `hw_0` to `hw_1`.
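
To make the jump from `hw_0` to `hw_1` concrete, here is a minimal sketch of the two patterns discussed in this PR. It is a toy model, not RisingWave's actual code: `Partition`, `Reader`, `build_reader_with_flag`, `build_reader`, and the offsets `HW_0`, `HW_1`, `CONSUMED` are all hypothetical names and values chosen for illustration. It assumes a rebuild should resume from the last consumed offset.

```rust
/// Toy model of one Kafka partition (hypothetical, for illustration only).
struct Partition {
    high_watermark: u64,
}

/// Toy reader that only tracks the next offset it will read.
struct Reader {
    next_offset: u64,
}

// Illustrative offsets, not taken from the PR.
const HW_0: u64 = 100; // high watermark when the source is created
const HW_1: u64 = 150; // high watermark when the reader is rebuilt
const CONSUMED: u64 = 120; // offset the executor has consumed up to before the rebuild

/// Flag-based pattern: the flag is consulted on EVERY build, including rebuilds.
fn build_reader_with_flag(p: &Partition, resume_from: u64, seek_to_latest: bool) -> Reader {
    let next_offset = if seek_to_latest { p.high_watermark } else { resume_from };
    Reader { next_offset }
}

/// Explicit pattern: building always resumes from the given offset.
fn build_reader(resume_from: u64) -> Reader {
    Reader { next_offset: resume_from }
}

impl Reader {
    /// One-time explicit seek. Returns the latest offset so the caller can
    /// record it in its own state and keep state and reader consistent.
    fn seek_to_latest(&mut self, p: &Partition) -> u64 {
        self.next_offset = p.high_watermark;
        self.next_offset
    }
}

/// Number of messages skipped by a rebuild under the flag-based pattern.
fn lost_with_flag() -> u64 {
    let mut partition = Partition { high_watermark: HW_0 };
    let initial = build_reader_with_flag(&partition, 0, true);
    assert_eq!(initial.next_offset, HW_0);

    // New data arrives; the executor has consumed up to CONSUMED.
    partition.high_watermark = HW_1;

    // A rate limit change triggers a rebuild; the flag is still set.
    let rebuilt = build_reader_with_flag(&partition, CONSUMED, true);
    rebuilt.next_offset - CONSUMED
}

/// Number of messages skipped by a rebuild under the explicit pattern.
fn lost_with_explicit_seek() -> u64 {
    let mut partition = Partition { high_watermark: HW_0 };
    let mut reader = build_reader(0);
    let start = reader.seek_to_latest(&partition);
    assert_eq!(start, HW_0);

    partition.high_watermark = HW_1;

    // The rebuild resumes from the consumed offset; no seek happens here.
    let rebuilt = build_reader(CONSUMED);
    rebuilt.next_offset - CONSUMED
}
```

In this toy model, `lost_with_flag()` returns 30 (offsets 120 through 149 are skipped), while `lost_with_explicit_seek()` returns 0.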
Want to wait a while for more reviews. Just in case.
will cherry pick the whole stack together
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Set up: Create a shared kafka source, and 1 MV on the source.
Data loss happens when:
Reason: In #16626 we introduced an optimization to let the shared SourceExecutor start from the latest offset, but the implementation is problematic. Specifically, `hack_seek_to_latest` takes effect not only at the beginning, but also when the source reader is rebuilt (which happens when a rate limit is applied).

The new implementation in this PR:
- `hack_seek_to_latest` flag, which is error prone.
- `seek_to_latest` call. At the same time, we also get the latest offsets, to make sure `SplitImpl` and `SourceReader` are consistent.

Checklist
`./risedev check` (or alias, `./risedev c`)

Documentation
Release note