risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Tracking: improve TPC-H performance #15036

Open · lmatz opened 10 months ago

lmatz commented 10 months ago

See performance numbers at https://www.notion.so/risingwave-labs/TPCH-Performance-Numbers-Table-e098ef82884546949333409f0513ada7?pvs=4#8de0bf4bda51444c8381f3b0c10ddfe1

Improvements needed:

lmatz commented 9 months ago

2024-02-22

A temporary summary:

  1. q1 and q6 require #14815
  2. q17's plan is still suboptimal
  3. ~q4's bottleneck is in LeftSemiJoin~
  4. ~q20's bottleneck is also likely to be LeftSemiJoin, as it has 2 LeftSemiJoins out of 4 joins in total~

2024-02-26

After some investigation:

> q4's bottleneck is in LeftSemiJoin

After setting the barrier interval of both systems to 10s and using the latest image, the two systems perform about the same.

> q20's bottleneck is also likely to be LeftSemiJoin as it has 2 LeftSemiJoin out of total 4 joins

The bottleneck is in the subquery; see #14797 for details.

Items 1 and 2 remain open.

2024-02-28

q20 reveals a potential problem in the cache eviction strategy: #15305

2024-02-29

q4 shows a very similar symptom, also caused by the cache eviction strategy: https://github.com/risingwavelabs/risingwave/issues/14811#issuecomment-1970434260. In q4, LocalStoreIter::{Closure} takes a substantial amount of CPU time. See the discussion in the q4 issue (#14811) for details.

github-actions[bot] commented 5 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.