Improving performance through matrix operations (?) #11

It seems unlikely that this would help for a single matrix multiplication. Think about how the transpose has to be implemented: you need the matrix B and a second matrix that holds its transpose (call it BT). While copying, you have to either read B row-wise and write BT column-wise, or vice versa, so the copy itself is not any more cache-friendly, and it is not easily vectorized with SIMD either.

However, if you know that you will want to access B column-wise many times, it may make sense to take the transpose right away and reuse that copy.
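To make the trade-off concrete, here is a minimal sketch in C, assuming square n-by-n matrices stored row-major in flat arrays of doubles. The function names `transpose` and `matmul_bt` are made up for this illustration and are not from the lecture code.

```c
#include <stddef.h>

/* Write the transpose of the n-by-n row-major matrix B into BT.
   B is read with unit stride, but BT is written with stride n,
   so one of the two matrices is always accessed column-wise. */
void transpose(size_t n, const double *B, double *BT)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            BT[j * n + i] = B[i * n + j];
}

/* Compute C = A * B, given BT = transpose(B).
   Column j of B is row j of BT, so both reads in the
   innermost loop walk memory with unit stride. */
void matmul_bt(size_t n, const double *A, const double *BT, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * BT[j * n + k];
            C[i * n + j] = sum;
        }
    }
}
```

The idea is to call `transpose` once and then pass the same BT to every later `matmul_bt` call that uses B, so the strided copy is paid for only once. Whether that is a net win for a given problem size is something to measure rather than assume.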