Tracking: improve scaling up performance

lmatz commented 8 months ago

The dashboard includes RW's 1cn baseline, 1cn (4X resources), 4cn (each cn 1X resource) and other systems: http://metabase.risingwave-cloud.xyz/question/9549-nexmark-rw-vs-flink-avg-source-throughput-all-testbeds?rw_tag=nightly-20240127&flink_tag=v1.16.0&flink_label=flink-medium-1tm-test-20230104,flink-4x-medium-1tm-test-20240104&flink_metrics=avg-job-throughput-per-second

To access the dashboard, please refer to: https://www.notion.so/Performance-Test-Dashboard-Manual-e33b26eb188e48379a7b714a01a4fc2c

4X 1cn performance tests are executed weekly.

[ ] #14986
[ ] #15251
[x] #14985
[x] #14987
[x] #15004
[ ] q0, q1, q2, q3, q22, q7-rewrite (stateless queries, or throughput very close to stateless queries)

Improvements needed:

[ ] #14815 will improve a few stateless and computation-lightweight queries, e.g. q0, q1, q2, q3, q22, q7-rewrite
[ ] #15705 and #15731 will improve queries that have aggregation and join with intensive hash computation, e.g. q15, q16
[x] #15419 enable more shards for cache to reduce the lock contention
[ ] ......

lmatz commented 7 months ago

One common phenomenon is that three metrics of join on Grafana are all much worse when scaling up: q8: https://github.com/risingwavelabs/risingwave/issues/14986#issuecomment-1925823694 q101: https://github.com/risingwavelabs/risingwave/issues/14987#issuecomment-1925843780 q102: https://github.com/risingwavelabs/risingwave/issues/15004#issuecomment-1926902368

Join Executor Barrie Align Grafana: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/grafana/risingwave-dev-dashboard.dashboard.py#L1194-L1211 Code: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/src/stream/src/executor/barrier_align.rs#L116-L141
Join Actor Input Blocking Ratio: Grafana: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/grafana/risingwave-dev-dashboard.dashboard.py#L1212-L1225 Code: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/src/stream/src/executor/hash_join.rs#L733
Join Actor Match Duration Per Second Grafana: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/grafana/risingwave-dev-dashboard.dashboard.py#L1226-L1238 Code: https://github.com/risingwavelabs/risingwave/blob/v1.6.1/src/stream/src/executor/hash_join.rs#L770 https://github.com/risingwavelabs/risingwave/blob/v1.6.1/src/stream/src/executor/hash_join.rs#L794 https://github.com/risingwavelabs/risingwave/blob/v1.6.1/src/stream/src/executor/hash_join.rs#L824-L825

It turns out that these metrics are summing over all the actors that belong to the same fragment, which seems meaningless.

lmatz commented 6 months ago

Query q15, q16 and q17 are similar but different: https://github.com/risingwavelabs/kube-bench/blob/main/manifests/nexmark/nexmark-sinks.template.yaml#L700C5-L713C87

Q17's plan:

 StreamSink { type: append-only, columns: [auction, day, total_bids, rank1_bids, rank2_bids, rank3_bids, min_price, max_price, avg_price, sum_price] }
 └─StreamProject { exprs: [$expr3, $expr2, count, count filter(($expr4 < 10000:Int32)), count filter(($expr4 >= 10000:Int32) AND ($expr4 < 1000000:Int32)), count filter(($expr4 >= 1000000:Int32)), min($expr4), max($expr4), (sum($expr4) / count($expr4)::Decimal) as $expr5, sum($expr4)] }
   └─StreamHashAgg [append_only] { group_key: [$expr2, $expr3], aggs: [count, count filter(($expr4 < 10000:Int32)), count filter(($expr4 >= 10000:Int32) AND ($expr4 < 1000000:Int32)), count filter(($expr4 >= 1000000:Int32)), min($expr4), max($expr4), sum($expr4), count($expr4)] }
     └─StreamExchange { dist: HashShard($expr2, $expr3) }
       └─StreamProject { exprs: [ToChar($expr1, 'YYYY-MM-DD':Varchar) as $expr2, Field(bid, 0:Int32) as $expr3, Field(bid, 2:Int32) as $expr4, _row_id] }
         └─StreamFilter { predicate: (event_type = 2:Int32) }
           └─StreamRowIdGen { row_id_index: 6 }
             └─StreamWatermarkFilter { watermark_descs: [Desc { column: $expr1, expr: ($expr1 - '00:00:04':Interval) }], output_watermarks: [$expr1] }
               └─StreamProject { exprs: [event_type, person, auction, bid, Case((event_type = 0:Int32), Field(person, 6:Int32), (event_type = 1:Int32), Field(auction, 5:Int32), Field(bid, 5:Int32)) as $expr1, _rw_kafka_timestamp, _row_id] }
                 └─StreamSource { source: nexmark, columns: [event_type, person, auction, bid, _rw_kafka_timestamp, _row_id] }
(10 rows)

Although q15 and q16 do not scale well at the moment, q17 DOES scale quite well.

Can refer to the peak number at http://metabase.risingwave-cloud.xyz/question/9270-nexmark-q17-blackhole-4x-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-2767?start_date=2024-01-04

Can also check Flink's number at http://metabase.risingwave-cloud.xyz/question/9732-flink-nexmark-q17-flink-4x-medium-1tm-avg-job-throughput-per-second-records-s-history-thtb-2922?start_date=2023-12-05

Both 4X above.

Reasons: https://github.com/risingwavelabs/risingwave/issues/15705 and https://github.com/risingwavelabs/risingwave/issues/15731

github-actions[bot] commented 3 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

risingwavelabs / risingwave

Tracking: improve scaling up performance #14448