Open cyliu0 opened 1 month ago
For nexmark-q7-rewrite-blackhole-4x-medium-1cn-affinity, it seems to be the same issue as https://github.com/risingwavelabs/risingwave/issues/15142; the tpch-q8 degradation could be due to other issues.
The imbalance is greater on 5.18.
5.12:
5.18:
cc @lmatz
For nexmark q7, the network bandwidth between RW and Kafka is not the same:
Previously:
This time:
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3708
Name: benchmark-kafka-0
Namespace: nexmark-ht-4x-1cn-affinity-weekly-20240518
Command:
  /scripts/setup.sh
State: Running
  Started: Sat, 18 May 2024 17:04:06 +0000
Ready: True
Restart Count: 0
Limits:
  cpu: 8
  memory: 13Gi
Requests:
  cpu: 7
  memory: 13Gi
Hmmm, there is a slight chance that Kafka does not have enough resources, although Kafka should be I/O bound rather than CPU bound.
Or it is because the machine is not large enough, so only an "unstable" "up to 12.5Gbps" bandwidth can be achieved; see https://github.com/risingwavelabs/risingwave/issues/15142#issuecomment-1956060259
No conclusion has been reached regarding the reason for the performance degradation of TPCH Q8. The current phenomena:
- From the perspective of backpressure, the bottleneck occurs at a certain append-only hash join
- Both operator cache and block cache show higher miss rates on the 18th compared to the 12th.
rerun 0518 (slow): https://buildkite.com/risingwave-test/tpch-benchmark/builds/1075
test 0514 (fast): https://buildkite.com/risingwave-test/tpch-benchmark/builds/1076
test 0515 ( ): https://buildkite.com/risingwave-test/tpch-benchmark/builds/1077
test 0516 ( ): https://buildkite.com/risingwave-test/tpch-benchmark/builds/1078
Overall, the images from the 15th and 16th show some performance fluctuation, but I believe the main performance drop is due to a change in nightly-20240517: https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20240517
Unfortunately, the degradation appears randomly: it can happen with the 0516 image but not with the 0517 image. http://metabase.risingwave-cloud.xyz/question/4966-tpch-q8-bs-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-371?start_date=2024-05-20
The unstable degradation of q8 happens on nightly-20240512 too
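One way to tell a random degradation apart from a regression tied to a single nightly is to rerun each image several times and check whether the same image produces both fast and slow results. Below is a minimal sketch of that check, not the benchmark tooling itself. The image names, the throughput numbers, and the 0.8 slow threshold are all hypothetical placeholders, not measurements from the runs linked above.

```python
from statistics import median


def classify_runs(runs_by_image, baseline, slow_ratio=0.8):
    """Label each run 'slow' if its throughput is below slow_ratio * baseline."""
    return {
        image: ["slow" if t < slow_ratio * baseline else "fast" for t in throughputs]
        for image, throughputs in runs_by_image.items()
    }


def unstable_images(labels):
    """Return images whose reruns include both fast and slow results."""
    return sorted(image for image, ls in labels.items() if len(set(ls)) > 1)


# Hypothetical placeholder throughputs (rows/s), NOT real benchmark data.
runs = {
    "image-a": [100.0, 60.0, 98.0],
    "image-b": [99.0, 58.0],
    "image-c": [101.0, 97.0],
}

# Assumption: use the median over all runs as the "normal" throughput.
baseline = median(t for ts in runs.values() for t in ts)
labels = classify_runs(runs, baseline)
print(unstable_images(labels))
```

If an image appears in the unstable list, the slowdown cannot be attributed to a commit in that image alone, which matches the observation that bisecting by nightly gives inconsistent answers.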
Describe the bug
Perf degradation in the weekly test. These two SKUs run only in the weekly test now.
https://buildkite.com/risingwave-test/tpch-benchmark/builds/1074
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3708
http://metabase.risingwave-cloud.xyz/question/4966-tpch-q8-bs-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-371?start_date=2023-09-24
http://metabase.risingwave-cloud.xyz/question/9112-nexmark-q7-rewrite-blackhole-4x-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-2808?start_date=2023-12-21
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response