kokkos / stdBLAS

Reference Implementation for stdBLAS
Other
127 stars 22 forks source link

[ready] kokkos impl: `triangular_matrix_matrix_{left,right}_solve()` #205

Closed mzuzek closed 2 years ago

mzuzek commented 2 years ago

This PR introduces Kokkos implementation for triangular_matrix_matrix_{left,right}_solve() - based on KokkosBLAS::trsm() - and unit tests.

Model

Let me present P1673R7-based TRSM model, so it's easier for us to catch my bugs and track down discrepancies.

Triangles

![triangles](https://user-images.githubusercontent.com/29018159/167638026-0063c517-9016-43c2-9b63-6a4d866cd5ce.png)

Note: I avoid naming rows and columns to abstract out the matrix layout and use row-major indexing on the pictures below so XMxN means X has M rows and N columns, and Ai,j means i-th row and j-th column.

TRSM

For left TRSM we compute i-th element of k-th solution vector accumulating with j (dots): Xi,k = Bi,k – ( ∑j≠i Ai,j Xj,k ) / Ai,i

![trsm_left](https://user-images.githubusercontent.com/29018159/167644081-ceaa1729-61a2-402f-829e-1c59e5b3ad29.png)

For right TRSM we compute k-th element of i-th solution vector accumulating with j (dots): Xi,k = Bi,k – ( ∑j≠k Xi,j Aj,k ) / Ak,k

![trsm_right](https://user-images.githubusercontent.com/29018159/167644106-a7ded8b6-c603-4eba-bf10-0d0abd964813.png)

Solving order

Solving starts at the corner that yields the Xi,i = Bi,i equation, which makes i iterate over elements in reversed order in the lower triangle of matrix A and in normal order in the upper triangle.

![solving_order png](https://user-images.githubusercontent.com/29018159/167647932-6f77fb68-0252-4f14-8d68-68bb811f2d4c.png)
mzuzek commented 2 years ago

@mhoemmen @fnrizzi

In triangular_matrix_matrix_right_solve() spec we have: trsm-right_formula (similar in in-place and implicit diagonal, so it occurs four times)

Should it be s / A[k,k] and "j not equal to k" ? In right TRSM - i indexes the solution vector in X/B and pretty much falls outside A's domain... The model I've described above already takes this correction into account, so you can compare against the specification.

Alternatively, wouldn't it be less tricky if we weren't flipping the semantics of i and k indexes between left- and right TRSM ? We could, for instance, switch to B[k,i] in right TRSM: with k indexing the solution vector and i it's element in both scenarios (and A being indexed with i and j only).

mhoemmen commented 2 years ago

Oh oof, I'm not sure if I fixed the wording issue in the latest update of P1673. I'll file a proposal issue for that.

mhoemmen commented 2 years ago

@MikolajZuzek I should clarify also: "mathematical expression of the algorithm" is a phrase of power in this proposal that constrains the input and output. The actual indices in the expression don't matter so much. What matters is that the expression needs to compile.

mhoemmen commented 2 years ago

@MikolajZuzek Excellent diagrams, btw! Would it be OK for us to use them elsewhere, e.g., in the proposal or slides? We'll be happy to credit you if you like!

mzuzek commented 2 years ago

@mhoemmen Thank you for the review!

Feel free to use the diagrams as you please. They were rendered with spreadsheet with Unicode math chars - here's my source file: TRSM_stdBLAS.ods (also to be used without any restrictions). I'll be happy to help if you ever need more.

mhoemmen commented 2 years ago

This wording issue will be fixed in R9: https://github.com/ORNL/cpp-proposals-pub/pull/258

Thanks for reporting!