icl-utk-edu / blaspp

BLAS++ is a C++ wrapper around CPU and GPU BLAS (basic linear algebra subroutines), developed as part of the SLATE project.
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License

rocm: support rocBLAS 3.0 trmm with 3 matrices A, B, C #78

Closed mgates3 closed 8 months ago

mgates3 commented 8 months ago

rocBLAS 3.0 in ROCm 5.6.0 introduced a 3-matrix trmm, with separate input matrix B and output matrix C, which may be aliased. See https://rocblas.readthedocs.io/en/master/API_Reference_Guide.html#rocblas-xtrmm-batched-strided-batched. This PR updates BLAS++ to call the new interface when available.
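To illustrate what "separate B and C matrices that can be aliased" means, here is a minimal CPU-only sketch. It is not rocBLAS or BLAS++ code: the function names `ref_trmm_left_lower` and `ref_trmm_in_place` are hypothetical, and only one case is covered (A on the left, lower triangular, non-transposed, non-unit diagonal, column-major storage). The 3-matrix form writes C = alpha * A * B. Passing B's pointer as C recovers the original 2-matrix, in-place behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference routine (not a rocBLAS or BLAS++ function).
// Computes C = alpha * A * B, where A is m-by-m lower triangular with a
// non-unit diagonal, and B and C are m-by-n. All matrices are column-major.
// C may alias B: rows are processed bottom-up, so each entry of B is read
// before the aliased entry of C overwrites it.
void ref_trmm_left_lower(
    std::size_t m, std::size_t n, double alpha,
    const double* A, std::size_t lda,
    const double* B, std::size_t ldb,
    double* C, std::size_t ldc)
{
    assert(lda >= m && ldb >= m && ldc >= m);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = m; i-- > 0; ) {
            double sum = 0.0;
            // Row i of a lower triangular A has nonzeros only in columns 0..i.
            for (std::size_t k = 0; k <= i; ++k)
                sum += A[i + k*lda] * B[k + j*ldb];
            C[i + j*ldc] = alpha * sum;
        }
    }
}

// Hypothetical 2-matrix wrapper: B = alpha * A * B, expressed by
// aliasing C to B in the 3-matrix routine above.
void ref_trmm_in_place(
    std::size_t m, std::size_t n, double alpha,
    const double* A, std::size_t lda,
    double* B, std::size_t ldb)
{
    ref_trmm_left_lower(m, n, alpha, A, lda, B, ldb, B, ldb);
}
```

A caller that needs to keep B intact passes a distinct C. A caller that wants the older in-place semantics uses the wrapper, which is the aliased case.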

Also print the CPU and GPU BLAS versions, where known. Tested on various systems:

pangolin blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f,  OpenBLAS 0.3.21 

pangolin blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, Apple Accelerate

methane blaspp> ./test/tester 
BLAS++ version 2023.11.05, id 92ad3b4f, Intel MKL 2023.0.2, CUDA 11.0.0

dopamine blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, Intel MKL 2023.0.2, ROCm 5.7.1

mgates3 commented 8 months ago

Tested on Frontier. ROCm ≤ 5.5.1 uses the original 2-matrix trmm. ROCm ≥ 5.6.0 uses the new 3-matrix trmm.
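The version cutoff described above can be written as a small predicate. This is only a sketch of the rule as stated in this thread. The `RocmVersion` struct and `uses_three_matrix_trmm` function are hypothetical names, and the mechanism BLAS++ actually uses to detect the interface (for example, a compile-time check on rocBLAS version macros) is not shown here.

```cpp
// Hypothetical type and function, for illustration only.
struct RocmVersion {
    int rel_major;
    int rel_minor;
    int rel_patch;
};

// Returns true for ROCm >= 5.6.0, which per the comment above uses the
// 3-matrix trmm; earlier releases use the original 2-matrix trmm.
// The patch level does not affect the result because the cutoff is 5.6.0.
constexpr bool uses_three_matrix_trmm(RocmVersion v)
{
    return v.rel_major > 5 || (v.rel_major == 5 && v.rel_minor >= 6);
}
```

Because the predicate is `constexpr`, it can be evaluated at compile time, for example in a `static_assert` or an `if constexpr` branch.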