risingwavelabs / risingwave

batch(hash join): remove the `filter(|(start_row_id, end_row_id)| start_row_id < end_row_id)` when processing `first_output_row_ids` #15854

Open st1page opened 6 months ago

st1page commented 6 months ago

In BatchHashJoin there are several places with logic like https://github.com/risingwavelabs/risingwave/blob/75ebd1ea43b211b16321b444b6d47c0ed039a539/src/batch/src/executor/join/hash_join.rs#L1580-L1584

But I suspect the filter is not needed and exists only to tolerate wrongly inserted row_id values, such as in https://github.com/risingwavelabs/risingwave/blob/75ebd1ea43b211b16321b444b6d47c0ed039a539/src/batch/src/executor/join/hash_join.rs#L497-L508. The push logic should be inside the matching branch (`if let Some(first_matched_build_row_id) = hash_map.get(probe_key) {`).

I think we should fix the push logic and then remove the filter to prevent further potential issues.
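
To illustrate the point, here is a minimal, self-contained sketch. It is a simplified model and not the actual RisingWave code: the functions `probe` and `ranges`, the `push_inside_match` and `drop_empty` flags, and the `HashMap<i32, Vec<usize>>` layout are all hypothetical. It assumes that every key present in the map has at least one build row. Under that assumption, pushing the start offset outside the matching branch produces empty `(start, end)` ranges for unmatched probe rows, which is what the `start < end` filter papers over; pushing inside the branch yields only non-empty ranges, so the filter would no longer be needed.

```rust
use std::collections::HashMap;

/// Hypothetical model of the probe loop (not the real executor).
/// Returns the recorded start offsets and the total number of output rows.
fn probe(
    hash_map: &HashMap<i32, Vec<usize>>,
    probe_keys: &[i32],
    push_inside_match: bool,
) -> (Vec<usize>, usize) {
    let mut first_output_row_ids = Vec::new();
    let mut output_len = 0usize;
    for probe_key in probe_keys {
        if !push_inside_match {
            // Records an offset even if this probe row emits nothing.
            first_output_row_ids.push(output_len);
        }
        if let Some(build_row_ids) = hash_map.get(probe_key) {
            if push_inside_match {
                // Records an offset only for probe rows that emit output.
                first_output_row_ids.push(output_len);
            }
            // Assumption: a present key always has at least one build row.
            output_len += build_row_ids.len();
        }
    }
    (first_output_row_ids, output_len)
}

/// Turns start offsets into (start, end) ranges, optionally dropping empty ones.
fn ranges(
    first_output_row_ids: &[usize],
    output_len: usize,
    drop_empty: bool,
) -> Vec<(usize, usize)> {
    let mut bounds = first_output_row_ids.to_vec();
    bounds.push(output_len);
    bounds
        .windows(2)
        .map(|w| (w[0], w[1]))
        .filter(|(start, end)| !drop_empty || start < end)
        .collect()
}
```

With build rows for keys `1` (two rows) and `3` (one row) and probe keys `[1, 2, 3]`, pushing outside the branch records offsets `[0, 2, 2]`, so the unfiltered ranges include the empty `(2, 2)`. Pushing inside the branch records `[0, 2]`, and the unfiltered ranges already match the filtered ones.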

github-actions[bot] commented 1 month ago

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄