Compute A B -> C:
We require for BLAS compatibility that: A, B, C are
"weakly" contiguous (stride=1) in their fastest dimension, but it can be
either first or second axis (either rowmajor/"c" or colmajor/"f").
The "normal case" is CblasRowMajor for cblas. Select CblasRowMajor /
CblasColMajor to fit C's memory order.
Apply transpose to A, B as needed if they differ from row major. If C
is CblasColMajor then transpose both A, B (again!)
(Weakly = contiguous with stride=1 on that fastest axis, but stride for the
other axis can be arbitrary large; to differentiate from strictly whole
array contiguous.)
A first commit simplified and corrected the logic, while still using
ndarray's reversed axes. But a further commit simplified it even further, to
a satisfying little function in mat_mul_impl as the final result.
I have kept both states (both commits) because I think the first version is
a useful guide if we would ever go to use plain BLAS instead of CBLAS(?).
With the blas (cblas) interface it supports matrices that adhere to certain criteria. They should be contiguous on one dimension (stride=1).
We glance a little at how numpy does this to try to catch all cases.
Compute A B -> C: We require for BLAS compatibility that: A, B, C are "weakly" contiguous (stride=1) in their fastest dimension, but it can be either first or second axis (either rowmajor/"c" or colmajor/"f").
The "normal case" is CblasRowMajor for cblas. Select CblasRowMajor / CblasColMajor to fit C's memory order.
Apply transpose to A, B as needed if they differ from row major. If C is CblasColMajor then transpose both A, B (again!)
(Weakly = contiguous with stride=1 on that fastest axis, but stride for the other axis can be arbitrary large; to differentiate from strictly whole array contiguous.)
A first commit simplified and corrected the logic, while still using ndarray's reversed axes. But a further commit simplified it even further, to a satisfying little function in
mat_mul_impl
as the final result.I have kept both states (both commits) because I think the first version is a useful guide if we would ever go to use plain BLAS instead of CBLAS(?).
Fixes #1278