sarah-quinones / faer-rs

Linear algebra foundation for the Rust programming language
https://faer-rs.github.io
MIT License
Gemm slow on specific AMD EPYC CPU #27

Closed by ZuseZ4 1 year ago

ZuseZ4 commented 1 year ago

Describe the bug Running the faer-bench benchmark on 2x AMD EPYC 7V13 64-Core Processor is surprisingly slow, both in absolute numbers and in the speedup of faer(par) over faer(seq). This does not seem to be the case on other, larger AMD server CPUs.

To Reproduce Just to keep track for myself: `CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer`

Expected behavior Well, don't be slow.

Desktop (please complete the following information):

➜  ~ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

➜  ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)

Additional context

Our admin just got back to me: the university admins probably won't adjust the perf settings for us, because the machine is too busy and changing them would be a perf risk. But I got access to two other AMD machines, so maybe we can use those to pin the issue down.

ZuseZ4 commented 1 year ago

solved by https://github.com/sarah-ek/gemm/commit/ab51c49534176f8a2823a01fd1d24b2f2aa8ccc7

ZuseZ4 commented 1 year ago
f64

## Matrix multiplication

Multiplication of two square matrices of dimension `n`.

    n       faer  faer(par)    ndarray   nalgebra      eigen
 8192     27.09s      1.06s    926.3ms          -      2.14s
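To make the row above easier to compare, here is a rough sketch that converts the timings into throughput and a parallel speedup. It assumes the conventional `2 * n^3` operation count for a dense `n x n` product; that count and the helper names (`matmul_flops`, `gflops`, `speedup`) are illustrative assumptions and are not part of faer-bench.

```rust
/// Approximate floating-point operations for an n x n by n x n product,
/// using the conventional 2 * n^3 count (an assumption, not taken from faer-bench).
fn matmul_flops(n: u64) -> f64 {
    2.0 * (n as f64).powi(3)
}

/// Throughput in GFLOP/s for a run that took `seconds`.
fn gflops(n: u64, seconds: f64) -> f64 {
    matmul_flops(n) / seconds / 1e9
}

/// Speedup of the parallel run over the sequential one.
fn speedup(seq_seconds: f64, par_seconds: f64) -> f64 {
    seq_seconds / par_seconds
}

fn main() {
    let n = 8192;
    // Timings copied from the n = 8192 row of the table, converted to seconds.
    let timings = [
        ("faer", 27.09),
        ("faer(par)", 1.06),
        ("ndarray", 0.9263),
        ("eigen", 2.14),
    ];
    for (name, secs) in timings {
        println!("{name:>10}: {:8.1} GFLOP/s", gflops(n, secs));
    }
    println!("faer(par) speedup over faer: {:.1}x", speedup(27.09, 1.06));
}
```

Under these assumptions faer(par) comes out roughly 25.6x faster than faer(seq) for this row. If the row was measured on the 2x 64-core machine described in the report, that is well below the physical core count.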