Also print the CPU and GPU BLAS versions, where known. Tested on various systems:
pangolin blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, OpenBLAS 0.3.21
pangolin blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, Apple Accelerate
methane blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, Intel MKL 2023.0.2, CUDA 11.0.0
dopamine blaspp> ./test/tester
BLAS++ version 2023.11.05, id 92ad3b4f, Intel MKL 2023.0.2, ROCm 5.7.1
rocBLAS 3.0 in ROCm 5.6.0 introduced a three-matrix trmm, with separate B and C matrices that can be aliased. See https://rocblas.readthedocs.io/en/master/API_Reference_Guide.html#rocblas-xtrmm-batched-strided-batched

This updates BLAS++ to call the new interface when it is available.