Open fuyufjh opened 1 year ago
Summarizing some internal / offline discussion: there are 3 levels of fixes, from simple to more complex, with increasing generality:
Is this completed?
With `WatermarkCache`, I think the underlying issue is somewhat mitigated: https://github.com/risingwavelabs/risingwave/issues/11320.
Linking the Slack discussion here for further context: https://risingwave-labs.slack.com/archives/C04NK8HD44R/p1690340513855229?thread_ts=1690331890.873379&cid=C04NK8HD44R.
Still planned, but I haven't worked on it recently.
Is your feature request related to a problem? Please describe.
See https://github.com/risingwavelabs/rfcs/blob/main/rfcs/0033-dynamic-filter.md
Currently, the outer side rows are not cached, which results in a table scan on every barrier.
As we discussed today, once the time needed to process one barrier exceeds the barrier interval, the whole streaming graph fills up with barriers and no actual data can be processed. Without caching, this can easily happen.
Describe the solution you'd like
Add a cache for outer-side rows.
A very primitive idea is to distinguish monotonically increasing variables (such as `NOW()` from `NowExecutor`) from all others. For monotonically increasing variables, only values larger than the current one need to be cached. Otherwise, values around the current one need to be cached, because the value can go either up or down.

The caching policy seems complicated and may need an additional RFC.
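To make the two cases concrete, here is a rough sketch of what such a policy could look like. It is not the actual executor code: `CachePolicy`, `OuterSideCache`, the `radius` parameter and the `i64` key / `String` row types are all hypothetical names chosen for illustration, and it assumes a filter condition of the form `outer.col > value`.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

/// Hypothetical caching policy (illustration only, not a RisingWave type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CachePolicy {
    /// The compared value only increases (e.g. NOW()):
    /// keep only keys strictly larger than the current value.
    Monotonic,
    /// The value may go up or down: keep keys within `radius` of it.
    Around { radius: u64 },
}

/// Hypothetical in-memory cache of outer-side rows, ordered by the compared column.
pub struct OuterSideCache {
    policy: CachePolicy,
    rows: BTreeMap<i64, Vec<String>>,
}

impl OuterSideCache {
    pub fn new(policy: CachePolicy) -> Self {
        Self { policy, rows: BTreeMap::new() }
    }

    pub fn insert(&mut self, key: i64, row: String) {
        self.rows.entry(key).or_default().push(row);
    }

    /// Rows whose key lies in (min(a, b), max(a, b)], i.e. the rows whose
    /// filter result flips when the compared value moves between a and b.
    pub fn rows_between(&self, a: i64, b: i64) -> Vec<&str> {
        let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
        self.rows
            .range((Excluded(lo), Included(hi)))
            .flat_map(|(_, v)| v.iter().map(|s| s.as_str()))
            .collect()
    }

    /// Drop entries the policy no longer needs once the value is `current`.
    pub fn evict(&mut self, current: i64) {
        match self.policy {
            CachePolicy::Monotonic => self.rows.retain(|k, _| *k > current),
            CachePolicy::Around { radius } => {
                self.rows.retain(|k, _| (*k).abs_diff(current) <= radius)
            }
        }
    }

    pub fn cached_keys(&self) -> Vec<i64> {
        self.rows.keys().copied().collect()
    }
}
```

With keys 1, 5, 10 and 20 cached, moving the value from 1 to 10 touches the rows at 5 and 10. After that, the monotonic policy would keep only key 20, while an `Around { radius: 5 }` policy centered on 8 would keep 5 and 10. How to pick the radius, and what to do on a cache miss (presumably fall back to a table scan), is exactly the part that would need the RFC.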
Describe alternatives you've considered
No response
Additional context
No response