rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0
3.63k stars 307 forks source link

Fix using BLAS for all compatible cases of memory layout #1419

Closed bluss closed 3 months ago

bluss commented 3 months ago

With the blas (cblas) interface it supports matrices that adhere to certain criteria. They should be contiguous on one dimension (stride=1).

We glance a little at how numpy does this to try to catch all cases.

Compute A B -> C: We require for BLAS compatibility that: A, B, C are "weakly" contiguous (stride=1) in their fastest dimension, but it can be either first or second axis (either rowmajor/"c" or colmajor/"f").

The "normal case" is CblasRowMajor for cblas. Select CblasRowMajor / CblasColMajor to fit C's memory order.

Apply transpose to A, B as needed if they differ from row major. If C is CblasColMajor then transpose both A, B (again!)

(Weakly = contiguous with stride=1 on that fastest axis, but stride for the other axis can be arbitrary large; to differentiate from strictly whole array contiguous.)

A first commit simplified and corrected the logic, while still using ndarray's reversed axes. But a further commit simplified it even further, to a satisfying little function in mat_mul_impl as the final result.

I have kept both states (both commits) because I think the first version is a useful guide if we would ever go to use plain BLAS instead of CBLAS(?).

Fixes #1278