kokkos / stdBLAS

Reference Implementation for stdBLAS
Other
117 stars 21 forks source link

Rank-k and Rank-2k preconditions are wrong and other problems #274

Open rasolca opened 2 months ago

rasolca commented 2 months ago

Preconditions rank-k updates written in the proposal document are wrong as A must NOT be a square matrix: https://eel.is/c++draft/linalg#algs.blas3.rankk-3.2 https://eel.is/c++draft/linalg#algs.blas3.rankk-4.1

Similarly, preconditions of rank-2k are wrong. https://eel.is/c++draft/linalg#algs.blas3.rank2k-4.1 and https://eel.is/c++draft/linalg#algs.blas3.rank2k-4.1 should read multipliable instead of addable,

https://eel.is/c++draft/linalg#algs.blas3.rank2k-3.3 and https://eel.is/c++draft/linalg#algs.blas3.rank2k-4.2 C should be the square matrix, not A.

Which is the correct way to ask for fixing those?

mhoemmen commented 2 months ago

Thanks for reporting! We'll take a look as soon as we can.

rasolca commented 1 month ago

I had an extra look to the proposal and found other problems. I summarize all the problems I found here:

GEMV

algorithm operation matrix sizes complexity
algs.blas2.gemv y=Ax A(m,n), x(m), y(n) O(m n)
wrong statement correct statement link
possibly-addable<decltype(x), decltype(y), decltype(z)>() is true possibly-addable<decltype(y), decltype(y), decltype(z)>() is true 3.2
addable(x,y,z) is true addable(y,y,z) is true 4.2

RANKK

algorithm operation matrix sizes complexity
algs.blas3.rankk C <- C + A A^H C(n,n), A(n,k) O(n n k)
wrong statement correct statement link
compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true <delete> 3.2
A.extent(0) equals A.extent(1) <delete> 4.1

RANK2K

algorithm operation matrix sizes complexity
algs.blas3.rank2k C <- C + A B^H + B A^H C(n,n), A(n,k), B(n,k) O(n n k)
wrong statement correct statement link
possibly-addable<decltype(A), decltype(B), decltype(C)>() is true possibly-multipliable<decltype(C), decltype(A), decltype(B)>() is true 3.2
compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true 3.3
addable(A, B, C) is true multipliable(C, A, B) is true 4.1
A.extent(0) equals A.extent(1) C.extent(0) equals C.extent(1) 4.2

TRMM in-place

algorithm operation matrix sizes complexity
~linalg.algs.blas3.inplacetrsm (left)~
linalg.algs.blas3.trmm (left)
~B = A X~
C <- A C
~A(n,n), X(m,n) B(m,n)~
A(m,m), C(m,n)
O(m m n)
wrong statement correct statement link
O(A.extent(0)×A.extent(1)×C.extent(0)) O(A.extent(0)×A.extent(1)×C.extent(1)) 5

TRSM

algorithm operation matrix sizes complexity
linalg.algs.blas3.trsm (left) Solve A X = B A(m,m), X(m,n) B(m,n) O(m m n)
wrong statement correct statement link
O(A.extent(0)×X.extent(1)×X.extent(1)) O(A.extent(0)×X.extent(0)×X.extent(1)) 6

TRSM in-place

algorithm operation matrix sizes complexity
linalg.algs.blas3.inplacetrsm (right) Solve X A = B and store X in B A(n,n), X(m,n) B(m,n) O(m n n)
wrong statement correct statement link
O(A.extent(0)×A.extent(1)×B.extent(1)) O(B.extent(0)×A.extent(0)×A.extent(1)) 13
mhoemmen commented 1 month ago

Hi @rasolca ! Thanks for reporting this! We plan to report this as an LWG issue. If you would like credit for this in the public record, please contact me offline, using my e-mail address on the proposal. Thanks!

[linalg.algs.blas2.gemv]

I agree with your suggestion.

[linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]

I agree.

[linalg.algs.blas3.inplacetrsm]

Hold on, let me look at this (will update)

[linalg.algs.blas3.trsm] 6

I agree. (Note to self: This refers only to the left solve. The complexity of the right solve in (13) looks OK.)

[linalg.algs.blas3.inplacetrsm] 13

I agree. (Note to self: This refers only to the right solve. The complexity of the left solve in (6) looks OK.)

rasolca commented 1 month ago

Sorry, for the third case (the one you said you will have a look), I made a couple of copy-paste typos. I updated the table above.

mhoemmen commented 1 month ago

@rasolca Thanks for the update! Here are my updated corrections: https://github.com/ORNL/cpp-proposals-pub/issues/464#issuecomment-2237118658 .

mhoemmen commented 3 weeks ago

@rasolca Thanks so much for your help! I've submitted an LWG issue for these: https://cplusplus.github.io/LWG/issue4137 . Please let me know if there are any further issues; thanks! : - )

Please see also https://isocpp.org/files/papers/P3371R0.html , "Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS," which will come out in the next mailing.