This effort should be undertaken only if linking to the optimized library versions investigated in issue #66 does not reduce runtimes.
Most calls to matmul() in the computationally intensive kernels are for small matrices. These may not benefit from the compiler's implementation of matmul() or from linking to and calling BLAS or other library versions.
matmul() using loops
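
As a rough illustration of what a loop-based version for small matrices might look like, here is a minimal sketch. It assumes the code is Fortran (inferred from the `matmul()` intrinsic) and double-precision reals. The names `small_matmul_mod`, `small_matmul`, and `dp` are hypothetical and do not come from the codebase. The test program only checks agreement with the intrinsic; whether the loops are actually faster would have to be measured in the real kernels.

```fortran
! Hypothetical sketch: explicit-loop matrix product for small matrices.
! Module and routine names are illustrative, not taken from the project.
module small_matmul_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)  ! assumed working precision
contains
  ! Computes c = a * b with explicit loops instead of the matmul() intrinsic.
  pure subroutine small_matmul(a, b, c)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(size(a,1), size(b,2))
    integer :: i, j, k
    c = 0.0_dp
    ! Loop order j-k-i keeps the innermost loop running down columns of
    ! a and c, which matches Fortran's column-major storage.
    do j = 1, size(b,2)
      do k = 1, size(a,2)
        do i = 1, size(a,1)
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end subroutine small_matmul
end module small_matmul_mod

program check_small_matmul
  use small_matmul_mod
  implicit none
  real(dp) :: a(3,4), b(4,2), c(3,2)
  call random_number(a)
  call random_number(b)
  call small_matmul(a, b, c)
  ! Compare against the intrinsic to confirm both give the same result.
  if (maxval(abs(c - matmul(a, b))) > 1.0e-12_dp) error stop "mismatch"
  print *, "ok"
end program check_small_matmul
```

If the matrix dimensions in the kernels are known at compile time, declaring them as constants may let the compiler unroll the loops fully. That is an assumption worth testing rather than a guaranteed gain.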