Describe the bug
Running the faer-bench benchmark on a machine with 2x AMD EPYC 7V13 64-Core Processor
is surprisingly slow, both in absolute numbers and in the speedup of faer(par) over faer(seq).
This does not seem to be the case on other, larger AMD server CPUs.
To Reproduce
Just to keep track for myself:
CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer
Expected behavior
Well, don't be slow.
Desktop (please complete the following information):
➜ ~ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
➜ ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
Additional context
Our admin just got back to me: the university admins probably won't adjust the perf settings for us, since the machine is too busy and changing them would be a performance risk. I did get access to two other AMD machines, though, so maybe we can use those to pin the issue down.
f64
## Matrix multiplication
Multiplication of two square matrices of dimension `n`.

| n    | faer   | faer(par) | ndarray | nalgebra | eigen |
|------|--------|-----------|---------|----------|-------|
| 8192 | 27.09s | 1.06s     | 926.3ms | -        | 2.14s |
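
To put a number on the faer(par) over faer(seq) speedup, here is a minimal sketch that computes speedup and parallel efficiency from the f64 timings for n = 8192 above. The helper names `speedup` and `parallel_efficiency` are hypothetical and not part of faer or faer-bench. The worker count of 128 is an assumption (2 sockets x 64 cores, one thread per physical core); the report does not state how many threads faer(par) actually used.

```rust
/// Ratio of sequential to parallel wall time (hypothetical helper).
fn speedup(seq_secs: f64, par_secs: f64) -> f64 {
    seq_secs / par_secs
}

/// Speedup divided by the worker count; 1.0 would be perfect scaling.
fn parallel_efficiency(seq_secs: f64, par_secs: f64, workers: u32) -> f64 {
    speedup(seq_secs, par_secs) / f64::from(workers)
}

fn main() {
    // Timings from the n = 8192 row: faer = 27.09s, faer(par) = 1.06s.
    let (seq, par) = (27.09, 1.06);
    // Assumption: 2 x 64 physical cores, one worker per core.
    let workers = 128;

    println!("speedup: {:.1}x", speedup(seq, par));
    println!(
        "efficiency: {:.1}%",
        100.0 * parallel_efficiency(seq, par, workers)
    );
}
```

Under that assumption, the speedup is about 25.6x, which is roughly 20% efficiency on 128 cores. Running the same calculation on the timings from the other AMD machines would give a like-for-like comparison.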