Still, maybe this is useful to expose directly here so that `xrt-smi` isn't required in the env.
Using the added test I got these results:
Default
-----------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
BM_matmul_64x64_64xbf16_/process_time/real_time 1.61 ms 0.671 ms 456 items_per_second=620.987/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.56 ms 0.641 ms 456 items_per_second=643.073/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.59 ms 0.648 ms 456 items_per_second=630.323/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.62 ms 0.653 ms 456 items_per_second=616.069/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.59 ms 0.646 ms 456 items_per_second=629.755/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.644 ms 456 items_per_second=635.695/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.58 ms 0.641 ms 456 items_per_second=633.842/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.639 ms 456 items_per_second=636.084/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.59 ms 0.642 ms 456 items_per_second=630.571/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.58 ms 0.648 ms 456 items_per_second=633/s
BM_matmul_64x64_64xbf16_/process_time/real_time_mean 1.59 ms 0.648 ms 10 items_per_second=630.94/s
BM_matmul_64x64_64xbf16_/process_time/real_time_median 1.58 ms 0.645 ms 10 items_per_second=631.786/s
BM_matmul_64x64_64xbf16_/process_time/real_time_stddev 0.019 ms 0.009 ms 10 items_per_second=7.68176/s
BM_matmul_64x64_64xbf16_/process_time/real_time_cv 1.23 % 1.42 % 10 items_per_second=1.22%
Turbo
-----------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.652 ms 433 items_per_second=638.857/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.55 ms 0.651 ms 433 items_per_second=644.931/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.650 ms 433 items_per_second=638.939/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.644 ms 433 items_per_second=638.037/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.57 ms 0.664 ms 433 items_per_second=635.318/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.58 ms 0.663 ms 433 items_per_second=631.421/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.54 ms 0.648 ms 433 items_per_second=650.474/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.54 ms 0.646 ms 433 items_per_second=649.22/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.56 ms 0.669 ms 433 items_per_second=642.177/s
BM_matmul_64x64_64xbf16_/process_time/real_time 1.60 ms 0.660 ms 433 items_per_second=623.584/s
BM_matmul_64x64_64xbf16_/process_time/real_time_mean 1.56 ms 0.655 ms 10 items_per_second=639.296/s
BM_matmul_64x64_64xbf16_/process_time/real_time_median 1.57 ms 0.652 ms 10 items_per_second=638.898/s
BM_matmul_64x64_64xbf16_/process_time/real_time_stddev 0.020 ms 0.009 ms 10 items_per_second=8.09723/s
BM_matmul_64x64_64xbf16_/process_time/real_time_cv 1.27 % 1.31 % 10 items_per_second=1.27%
Higher `items_per_second` is better (I'm pretty sure?).
So for `BM_matmul_64x64_64xbf16_/process_time/real_time_mean` we get `630.94/s` under `default` vs. `639.296/s` under `turbo`. That gap of about 8.4/s is roughly one per-repetition stddev (`7.68176/s` for default, `8.09723/s` for turbo), so it's basically the same. So I'm not sure what the effect should be :shrug:.
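One way to sanity-check whether that gap is just noise is to compare it against the standard error of the two means rather than the per-repetition stddev. Below is a minimal sketch that computes Welch's t statistic from the ten `items_per_second` values of each run above. The helper name `welch_t` is made up for illustration, and treating the ten repetitions as independent samples is an assumption.

```python
import math
import statistics

# items_per_second for the 10 repetitions of each run, copied from the tables above
default = [620.987, 643.073, 630.323, 616.069, 629.755,
           635.695, 633.842, 636.084, 630.571, 633.0]
turbo = [638.857, 644.931, 638.939, 638.037, 635.318,
         631.421, 650.474, 649.22, 642.177, 623.584]


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a), allowing unequal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


if __name__ == "__main__":
    print(f"default: mean={statistics.mean(default):.3f} stddev={statistics.stdev(default):.3f}")
    print(f"turbo:   mean={statistics.mean(turbo):.3f} stddev={statistics.stdev(turbo):.3f}")
    print(f"Welch t: {welch_t(default, turbo):.2f}")
```

With only ten repetitions per mode and a debug build this is far from conclusive; more repetitions (or a release build) would give a clearer answer either way.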
Note at least one of the things it's doing is enabling/disabling clock gating:
[13486.742867] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 0 to 4
[13486.742869] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Disable clock gating, 1 type(s)
...
[13493.313651] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 4 to 0
[13493.313653] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Enable clock gating, 1 type(s)
(via dmesg).
EDIT:
I did this test with a debug build - maybe in a release build there's a difference 🤷‍♂️
Notes
`sudo`;
`run_matmul_test.sh` script under `sudo` and you have env variables you need to do `sudo -E`;
`xrt-smi` with something like