risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Investigate what is the actual bottleneck in hash agg processing for dirty groups #18748

Open kwannoel opened 1 month ago

kwannoel commented 1 month ago

It doesn't seem to be a heap or CPU bottleneck. So what is the actual bottleneck? Is it I/O cost due to state lookups? If so, we need a metric for it.

Or is it skew? In some scenarios the workload peaks at 1600% CPU (about 16 cores' worth), but we have 32 cores.
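To make the skew question concrete, here is a minimal sketch of how per-actor CPU samples could be summarized. The function name `cpu_summary`, the per-actor breakdown, and the sample numbers are all hypothetical and chosen only to reproduce the 1600% on 32 cores figure; they are not taken from a real run.

```python
from statistics import mean

def cpu_summary(per_actor_cpu_percent, total_cores):
    """Summarize CPU usage from per-actor percentages (100 == one full core)."""
    total = sum(per_actor_cpu_percent)
    capacity = total_cores * 100
    avg = mean(per_actor_cpu_percent)
    return {
        "total_percent": total,
        "fraction_of_capacity": total / capacity,
        # 1.0 means perfectly even load; larger values mean a few actors dominate.
        "skew_ratio": max(per_actor_cpu_percent) / avg if avg > 0 else 0.0,
    }

# Hypothetical samples: 32 actors, half saturated and half idle.
skewed = cpu_summary([100] * 16 + [0] * 16, total_cores=32)
# Hypothetical samples: the same total, spread evenly across all actors.
even = cpu_summary([50] * 32, total_cores=32)

print(skewed)
print(even)
```

Both cases total 1600% and use half of the machine, so the aggregate number alone cannot tell them apart. Only the per-actor distribution (here the `skew_ratio`) separates skew from an evenly throttled pipeline.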

Needs further investigation.

kwannoel commented 1 month ago

Some workloads to test:

  1. What happens when a large number of existing agg groups get updated?
  2. What happens when a large number of new agg groups are created?
  3. Does the behavior change with different cache configurations?
  4. Test the first_value agg.
  5. Make sure to use the MinIO rate limit configuration to simulate the latency of fetching from AWS S3.
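The first two workloads differ only in whether incoming keys hit existing groups or create new ones. A minimal sketch of a synthetic event generator that can produce either case, or a mix, is shown below. The function `gen_events` and its parameters are hypothetical helpers for illustration, not part of any existing benchmark tooling.

```python
import random

def gen_events(num_events, existing_groups, new_group_ratio, seed=0):
    """Yield (group_key, value) pairs.

    new_group_ratio = 0.0: every event updates one of the existing groups (workload 1).
    new_group_ratio = 1.0: every event creates a brand-new group (workload 2).
    """
    rng = random.Random(seed)
    next_new_key = existing_groups  # keys >= existing_groups have never been seen
    for _ in range(num_events):
        if rng.random() < new_group_ratio:
            key = next_new_key
            next_new_key += 1
        else:
            key = rng.randrange(existing_groups)
        yield key, rng.randint(0, 1000)

# Workload 1: only updates to the 100 pre-existing groups.
updates_only = list(gen_events(1000, existing_groups=100, new_group_ratio=0.0))
# Workload 2: every event lands in a group that did not exist before.
inserts_only = list(gen_events(1000, existing_groups=100, new_group_ratio=1.0))
```

Using a fixed seed keeps runs comparable across cache configurations, so differences in the measurements can be attributed to the configuration rather than to the input.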

Measurements:

  1. CPU usage.
  2. Heap usage.
  3. Cache misses.
  4. Actor idle time.
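If these measurements are exposed as monotonically increasing counters, they can be compared per test interval by taking the difference between two snapshots. The sketch below assumes exactly that; the `Snapshot` fields are illustrative names, not actual RisingWave metric names.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical counter snapshot; field names are illustrative only."""
    cache_lookups: int
    cache_misses: int
    actor_idle_seconds: float
    wall_clock_seconds: float

def interval_stats(before, after):
    """Turn two cumulative snapshots into ratios for the interval between them."""
    lookups = after.cache_lookups - before.cache_lookups
    misses = after.cache_misses - before.cache_misses
    idle = after.actor_idle_seconds - before.actor_idle_seconds
    elapsed = after.wall_clock_seconds - before.wall_clock_seconds
    return {
        "cache_miss_ratio": misses / lookups if lookups else 0.0,
        "actor_idle_fraction": idle / elapsed if elapsed else 0.0,
    }

# Made-up numbers, for illustration only.
before = Snapshot(cache_lookups=1000, cache_misses=100,
                  actor_idle_seconds=10.0, wall_clock_seconds=60.0)
after = Snapshot(cache_lookups=3000, cache_misses=1100,
                 actor_idle_seconds=55.0, wall_clock_seconds=120.0)
stats = interval_stats(before, after)
print(stats)
```

A high miss ratio combined with a high idle fraction and low CPU usage would be consistent with the I/O hypothesis raised above, though it would not prove it on its own.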