msimberg closed this issue 1 month ago
@msimberg FYI, NVIDIA has its own implementation in the HPC Toolkit. I can't speak for the other reference implementation authors about their support intentions, but I would certainly welcome updates here. I just don't have time to do much other than fix conformance or correctness issues.
Thanks @mhoemmen! That makes sense.
Now that P1673 has been approved, what is the plan for stdBLAS going forward? Is there a plan to continue improving the implementation here? Is there already activity to get this into standard libraries, whether based on this implementation or on new ones? (There's a lot here for implementers to implement.) Are you planning to improve the BLAS offloading? Currently it looks like what is there is simply disabled (e.g. https://github.com/kokkos/stdBLAS/blob/06e90a58b67c5adefff0f06904c8f8bc3371815b/include/experimental/__p1673_bits/blas3_matrix_product.hpp#L56-L57).
Thanks for any pointers that you may have! It'd be interesting to try this out for real, but it's hard to judge just how production-ready this is at the moment (or when one might expect it to be).