japaric-archived / linalg.rs

[INACTIVE]
Apache License 2.0

Element-wise matrix multiplication should be vectorized/parallelized #72

Open japaric opened 9 years ago

japaric commented 9 years ago

There is no BLAS routine for this operation, and right now it's implemented as a single-threaded for loop.

At the very least the operation should be SIMD accelerated, and perhaps multi-threaded for "big" inputs.
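For reference, the current single-threaded implementation is equivalent to a plain loop like the following (a hedged sketch with hypothetical names, not the crate's actual code), operating on the matrices' flat backing storage:

```rust
// Hypothetical sketch of the existing scalar implementation:
// in-place element-wise multiply of two equally-sized matrices
// viewed as flat slices over their backing storage.
fn mul_assign_elementwise(a: &mut [f64], b: &[f64]) {
    assert_eq!(a.len(), b.len());
    for (x, &y) in a.iter_mut().zip(b.iter()) {
        *x *= y;
    }
}

fn main() {
    let mut a = vec![1.0, 2.0, 3.0, 4.0];
    let b = vec![2.0, 2.0, 0.5, 0.5];
    mul_assign_elementwise(&mut a, &b);
    assert_eq!(a, vec![2.0, 4.0, 1.5, 2.0]);
}
```

A loop in this shape is a candidate for auto-vectorization, but nothing guarantees SIMD code generation, and it never uses more than one thread.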

japaric commented 9 years ago

yeppp has C routines for this operation.

vks commented 9 years ago

Doesn't a decent BLAS/LAPACK implementation use SIMD?

vks commented 9 years ago

Just looked it up: OpenBLAS and ATLAS support vectorization and multi-threading.

japaric commented 9 years ago

@vks By Mat *= Mat I meant element-wise matrix multiplication. Do BLAS libraries provide a routine for that?

vks commented 9 years ago

It seems like only MKL supports it. There are workarounds, see http://stackoverflow.com/questions/7621520/element-wise-vector-vector-multiplication-in-blas .
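The workaround from that Stack Overflow thread is to treat one vector as a diagonal matrix and call a banded matrix-vector routine such as `dsbmv` with bandwidth 0, which computes `y := alpha*A*x + beta*y`. A plain-Rust model of what that call computes (just the math, not an actual BLAS binding):

```rust
// Model of what BLAS dsbmv computes when the bandwidth k is 0,
// i.e. when A degenerates to the diagonal matrix diag(d):
//     y := alpha * diag(d) * x + beta * y
// With alpha = 1 and beta = 0 this is exactly the element-wise
// product of d and x.
fn dsbmv_k0(alpha: f64, d: &[f64], x: &[f64], beta: f64, y: &mut [f64]) {
    assert!(d.len() == x.len() && x.len() == y.len());
    for i in 0..y.len() {
        y[i] = alpha * d[i] * x[i] + beta * y[i];
    }
}

fn main() {
    let d = [1.0, 2.0, 3.0];
    let x = [4.0, 5.0, 6.0];
    let mut y = [0.0; 3];
    dsbmv_k0(1.0, &d, &x, 0.0, &mut y);
    assert_eq!(y, [4.0, 10.0, 18.0]);
}
```

The real routine also walks the band storage of A, so it does more work per element than a bare loop, but a tuned BLAS does it with SIMD and threads.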


japaric commented 9 years ago

@vks The BLAS trick is interesting: it does more operations per element than the current implementation, but since the BLAS approach is vectorized and multithreaded it will likely yield faster execution times for sufficiently large inputs. I think it could also be used to evaluate the expression alpha * A % B + beta * C (where % denotes element-wise multiplication).
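The fused expression mentioned above maps onto the same `y := alpha*diag(b)*x + beta*y` shape once the matrices are flattened into vectors. A hedged sketch of the single pass such a routine would perform (hypothetical names, plain Rust rather than a BLAS call):

```rust
// Sketch: evaluating C := alpha * (A % B) + beta * C in one pass,
// where % is element-wise multiplication and a, b, c are the
// matrices flattened into equally-sized slices.
fn fused_elementwise(alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    assert!(a.len() == b.len() && b.len() == c.len());
    for ((ci, &ai), &bi) in c.iter_mut().zip(a).zip(b) {
        *ci = alpha * ai * bi + beta * *ci;
    }
}

fn main() {
    let a = [1.0, 2.0];
    let b = [3.0, 4.0];
    let mut c = [10.0, 10.0];
    fused_elementwise(2.0, &a, &b, 1.0, &mut c);
    // 2*1*3 + 10 = 16, 2*2*4 + 10 = 26
    assert_eq!(c, [16.0, 26.0]);
}
```

Evaluating the whole expression in one pass avoids materializing the intermediate A % B, which is the main appeal of mapping it onto a single BLAS call.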

re MKL: we could use it behind an opt-in Cargo feature, but I'd like to focus on using standard BLAS routines for the time being.