huttered40 / capital

Distributed-memory implementations of novel Cholesky and QR matrix factorizations
BSD 2-Clause "Simplified" License

Parameterize cholinv #2

Closed huttered40 closed 4 years ago

huttered40 commented 5 years ago

A few design decisions deserve to be parameterized instead of forced, most notably the use of gemm vs. trmm and the corresponding serializations.

Any policy classes must have default types.

Note that validate classes and all code that references choleskyqr must also follow these rules.
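To make the idea concrete, here is a minimal sketch of a policy class with a default type. It is an assumption-laden illustration and not capital's actual interface: `GemmPolicy`, `TrmmPolicy`, `TriangularProduct`, and the row-major `Matrix` alias are all hypothetical names, and the loops stand in for the real BLAS calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major n x n matrix; not capital's matrix type.
using Matrix = std::vector<double>;

// Policy that treats the lower-triangular factor L as dense (GEMM-like).
struct GemmPolicy {
  // Computes C = L * B and returns the number of multiply-adds performed.
  static std::size_t multiply(const Matrix& L, const Matrix& B, Matrix& C,
                              std::size_t n) {
    std::size_t fma = 0;
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
          sum += L[i * n + k] * B[k * n + j];
          ++fma;
        }
        C[i * n + j] = sum;
      }
    }
    return fma;
  }
};

// Policy that exploits the lower-triangular structure of L (TRMM-like).
struct TrmmPolicy {
  // Computes C = L * B, skipping the zero entries above the diagonal of L.
  static std::size_t multiply(const Matrix& L, const Matrix& B, Matrix& C,
                              std::size_t n) {
    std::size_t fma = 0;
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k <= i; ++k) {  // L[i][k] == 0 for k > i
          sum += L[i * n + k] * B[k * n + j];
          ++fma;
        }
        C[i * n + j] = sum;
      }
    }
    return fma;
  }
};

// The policy is a template parameter with a default type, so callers who do
// not care about the choice can write TriangularProduct<>.
template <typename MultiplyPolicy = TrmmPolicy>
struct TriangularProduct {
  static std::size_t apply(const Matrix& L, const Matrix& B, Matrix& C,
                           std::size_t n) {
    assert(L.size() == n * n && B.size() == n * n && C.size() == n * n);
    return MultiplyPolicy::multiply(L, B, C, n);
  }
};
```

With this shape, `TriangularProduct<>::apply(...)` picks the default policy and `TriangularProduct<GemmPolicy>::apply(...)` overrides it; both produce the same product when L is lower triangular.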

huttered40 commented 5 years ago

I found another place in which we are not exploiting the MatrixType template parameters and could use another parameterization (perhaps in place of the dummy one I already set up): at the end, where we perform the last matrix products, we are not exploiting the triangular structure of MatrixRI, and then we serialize from square to square. What is that about?

huttered40 commented 5 years ago

See #21 for more information

huttered40 commented 4 years ago

Add a template parameter for OverlapGather, with the default being no overlap.

huttered40 commented 4 years ago

I removed the OverlapGather policy, as I was seeing no benefit (it actually made things worse) and also because it was not correct for num_chunks>1 (I don't want to debug it).

We do not need a GEMM vs. TRMM policy: I think it's safe to assume that TRMM's performance is nearly that of GEMM, and once TRMM's lower flop count is taken into account, there is no benefit in choosing GEMM.
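For reference, the flop counts behind that argument, assuming a local product of an $n \times n$ triangular matrix with an $n \times n$ dense matrix (the square shapes are an assumption for illustration, and each multiply-add is counted as two flops):

```latex
\begin{align*}
F_{\mathrm{GEMM}}(n) &= 2n^3, \\
F_{\mathrm{TRMM}}(n) &= n^2(n+1) \approx n^3, \\
\frac{F_{\mathrm{TRMM}}(n)}{F_{\mathrm{GEMM}}(n)} &= \frac{n+1}{2n} \to \frac{1}{2} \quad \text{as } n \to \infty.
\end{align*}
```

So if TRMM runs at close to GEMM's rate, it should finish in roughly half the time for large $n$.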