Open japaric opened 9 years ago
Doesn't a decent BLAS/LAPACK implementation use SIMD?
Just looked it up: OpenBLAS and ATLAS support vectorization and multi-threading.
@vks By `Mat *= Mat` I meant element-wise matrix multiplication. Do BLAS libraries provide a routine for that?
It seems like only MKL supports it. There are workarounds, see http://stackoverflow.com/questions/7621520/element-wise-vector-vector-multiplication-in-blas .
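For reference, the workaround discussed in that StackOverflow thread relies on the identity that the element-wise product of `a` and `b` equals the matrix-vector product `diag(a) * b`, which standard BLAS can compute as a banded multiply (e.g. `dsbmv` with bandwidth 0). A dependency-free Rust sketch of that identity (the function name is hypothetical; a real implementation would call a banded BLAS routine instead of materializing the matrix):

```rust
// Element-wise product via the diagonal-matrix identity used by the
// BLAS workaround: (diag(a) * b)[i] == a[i] * b[i]. This sketch builds
// diag(a) densely just to demonstrate the identity.
fn diag_times_vec(a: &[f64], b: &[f64]) -> Vec<f64> {
    let n = a.len();
    assert_eq!(n, b.len());
    let mut y = vec![0.0; n];
    for i in 0..n {
        for j in 0..n {
            // Row i of diag(a) is zero except for a[i] at column i.
            let d = if i == j { a[i] } else { 0.0 };
            y[i] += d * b[j];
        }
    }
    y
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // diag(a) * b equals the element-wise product of a and b.
    assert_eq!(diag_times_vec(&a, &b), vec![4.0, 10.0, 18.0]);
}
```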
@vks The BLAS trick is interesting. It does more operations per element than the current implementation, but because it's vectorized and multithreaded it will likely be faster for sufficiently large inputs. I think it could also be used to evaluate the expression `alpha * A % B + beta * C` (where `%` denotes element-wise multiplication).
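To make the intended evaluation concrete, here is a minimal sketch of computing `alpha * A % B + beta * C` in a single fused pass over flattened matrix storage, avoiding temporaries for the sub-expressions. The function name and slice-based signature are illustrative, not the crate's actual API:

```rust
// Fused evaluation of C = alpha * (A % B) + beta * C, where % is
// element-wise multiplication and matrices are stored as flat slices.
fn fused_scaled_hadamard(alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    assert!(a.len() == b.len() && b.len() == c.len());
    // One pass: no temporary buffer for alpha * A % B is allocated.
    for ((&x, &y), z) in a.iter().zip(b.iter()).zip(c.iter_mut()) {
        *z = alpha * x * y + beta * *z;
    }
}

fn main() {
    let a = [1.0, 2.0];
    let b = [3.0, 4.0];
    let mut c = [10.0, 20.0];
    fused_scaled_hadamard(2.0, &a, &b, 0.5, &mut c);
    // 2*1*3 + 0.5*10 = 11, 2*2*4 + 0.5*20 = 26
    assert_eq!(c, [11.0, 26.0]);
}
```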
Re MKL: we could use it behind an opt-in Cargo feature, but I'd like to focus on standard BLAS routines for the time being.
There is no BLAS routine for this operation, and right now it's implemented as a single-threaded for loop.
At the very least the operation should be SIMD accelerated, and perhaps multi-threaded for "big" inputs.
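As a sketch of what a SIMD-friendly version of that loop could look like (names are hypothetical, not the crate's actual API): processing the slices in fixed-size chunks keeps the hot loop free of bounds checks and gives the compiler a straight-line body it can auto-vectorize, with a scalar tail for leftover elements.

```rust
// Element-wise multiply-assign written to encourage auto-vectorization:
// fixed-size chunks in the hot loop, scalar cleanup for the remainder.
fn mul_assign_elementwise(dst: &mut [f64], src: &[f64]) {
    assert_eq!(dst.len(), src.len());
    const LANES: usize = 4;
    let mut d = dst.chunks_exact_mut(LANES);
    let mut s = src.chunks_exact(LANES);
    for (dc, sc) in (&mut d).zip(&mut s) {
        for i in 0..LANES {
            dc[i] *= sc[i]; // straight-line body, SIMD friendly
        }
    }
    // Scalar tail for lengths not divisible by LANES.
    for (x, &y) in d.into_remainder().iter_mut().zip(s.remainder()) {
        *x *= y;
    }
}

fn main() {
    let mut dst = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let src = vec![2.0; 5];
    mul_assign_elementwise(&mut dst, &src);
    assert_eq!(dst, vec![2.0, 4.0, 6.0, 8.0, 10.0]);
}
```

Multithreading for "big" inputs would then just split the slices into per-thread ranges before applying the same kernel.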