kokkos / stdBLAS

Reference Implementation for stdBLAS

Status of inclusion in standard library? #273

Closed: msimberg closed this issue 1 month ago

msimberg commented 2 months ago

Now that P1673 has been approved, what is the plan for stdBLAS going forward? Is there a plan to keep improving the implementation here? Is there already work underway to get this into standard libraries, either this implementation or new ones? (There's a lot here for implementers to implement.) Are you planning to improve the BLAS offloading? Currently it looks like the existing offloading code is simply disabled (e.g. https://github.com/kokkos/stdBLAS/blob/06e90a58b67c5adefff0f06904c8f8bc3371815b/include/experimental/__p1673_bits/blas3_matrix_product.hpp#L56-L57).

Thanks for any pointers you may have! It'd be interesting to try this out for real, but at the moment it's hard to judge how production-ready this is (or when one might expect it to be).

mhoemmen commented 2 months ago

@msimberg FYI, NVIDIA has its own implementation in the HPC Toolkit. I can't speak for the other reference implementation authors about their support intentions, but I would certainly welcome updates here. I just don't have time to do much other than fix conformance or correctness issues.

msimberg commented 2 months ago

Thanks @mhoemmen! That makes sense.