risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

investigate: any faster ways of `get_hash_values` #15731

Open fuyufjh opened 8 months ago

fuyufjh commented 8 months ago

A few alternative approaches to make it more "vectorized", which I think are worth trying:

  1. Replace the visibility `Bitmap` with a visibility `Vec<usize>` of visible row indices. A `for idx in vec` loop has no per-row visibility branch, so it is friendlier to the CPU pipeline and easier to vectorize.
  2. Same as 1, but only for low-selectivity cases.
  3. Do compaction before exchange, because we know exchange always produces low-selectivity results.
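
To make the comparison concrete, here is a minimal sketch of the idea, not RisingWave's actual `get_hash_values`. It assumes a plain `&[bool]` as a stand-in for the visibility `Bitmap` and a plain `&[i64]` as a stand-in for a column; `hash_row`, `hash_visible_bitmap`, `visible_indices`, `hash_visible_indices`, and `compact` are hypothetical names used only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-row hash; a stand-in for whatever hasher the real code uses.
fn hash_row(value: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Bitmap-style loop: every row is visited and a visibility branch is taken per row.
fn hash_visible_bitmap(column: &[i64], visibility: &[bool]) -> Vec<u64> {
    let mut out = Vec::new();
    for (i, &value) in column.iter().enumerate() {
        if visibility[i] {
            out.push(hash_row(value));
        }
    }
    out
}

/// Convert the visibility flags into a list of visible row indices (approach 1).
fn visible_indices(visibility: &[bool]) -> Vec<usize> {
    visibility
        .iter()
        .enumerate()
        .filter_map(|(i, &vis)| if vis { Some(i) } else { None })
        .collect()
}

/// Index-style loop: only visible rows are visited, with no visibility branch.
fn hash_visible_indices(column: &[i64], indices: &[usize]) -> Vec<u64> {
    indices.iter().map(|&i| hash_row(column[i])).collect()
}

/// Compaction (approach 3): materialize only the visible rows into a dense
/// column, so later operators no longer need any visibility information.
fn compact(column: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| column[i]).collect()
}
```

Both loops produce the same hashes for the visible rows. The index-based variant trades the per-row branch for an indirect load, so whether it is actually faster would need to be benchmarked, which is presumably why approach 2 restricts it to low-selectivity chunks.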

Originally posted by @fuyufjh in https://github.com/risingwavelabs/risingwave/issues/15696#issuecomment-2002777893

github-actions[bot] commented 5 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.