kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Full-BLAS support #420

Open crtrott opened 5 years ago

crtrott commented 5 years ago

This issue tracks full BLAS support. We will update it as new functionality is added. Only the single-precision BLAS calls are listed here, since Kokkos Kernels is scalar-type agnostic.
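As a rough illustration of what "scalar type agnostic" means in practice, the sketch below shows one templated routine covering what reference BLAS splits into SAXPY, DAXPY, CAXPY and ZAXPY. It is plain C++ over `std::vector`, not Kokkos Kernels code: the name `axpby_sketch` is hypothetical, and it assumes `x` holds at least as many entries as `y`.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the Kokkos Kernels API.
// Computes y = a*x + b*y element-wise for any scalar type that
// supports * and + (float, double, std::complex<T>, ...).
// Assumption: x.size() >= y.size().
template <typename Scalar>
void axpby_sketch(Scalar a, const std::vector<Scalar>& x,
                  Scalar b, std::vector<Scalar>& y) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] = a * x[i] + b * y[i];
  }
}
```

The same template instantiates for `float`, `double` and `std::complex<double>`, which is why the tables only need one row per operation rather than one per precision.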

BLAS 1

| BLAS Call | Kokkos Kernels Call | Reference | TPL BLAS | TPL CUBLAS | TPL ROCBLAS | TPL oneMKL | Complex Special |
|---|---|---|---|---|---|---|---|
| SROTG | rotg(a, b, c, s) | done | done | done | done | -- | -- |
| SROTMG | rotmg(d1, d2, x1, y1, param) | done | done | done | done | -- | NC |
| SROT | rot(X, Y, c, s) | done | done | done | done | -- | -- |
| SROTM | rotm(X, Y, param) | done | done | done | -- | -- | NC |
| SSWAP | swap(X, Y) | done | done | done | done | -- | N/A |
| SSCAL | scal(y,a,x) | done | -- | -- | -- | -- | N/A |
| CSSCAL | scal(y,a,x) | done | -- | -- | -- | -- | OC |
| SCOPY | deep_copy(y,x) | done | -- | -- | -- | -- | N/A |
| SAXPY | axpby(a,x,b,y) | done | -- | -- | -- | -- | -- |
| SDOT* | dot(x,y) | done | -- | -- | -- | -- | -- |
| SDSDOT* | dot(x,y) | done | -- | -- | -- | -- | NC |
| CDOTU | -- | -- | -- | -- | -- | -- | OC |
| CDOTC* | dot(x,y) | done | -- | -- | -- | -- | OC |
| SNRM2 | nrm2(x) | done | -- | -- | -- | -- | NC |
| SCNRM2 | nrm2(x) | done | -- | -- | -- | -- | OC |
| SASUM | asum(x) | done | done | done | done | done | -- |
| ISAMAX | iamax(x) | done | -- | -- | -- | -- | -- |

*Kokkos Kernels dot() behaves slightly differently depending on whether the result is passed back as a return value or written to an output Kokkos::View. In the former case, the dot product is always accumulated in double; in the latter, it is accumulated in a scalar of the same type as the value_type of the output view.
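To show why the accumulation type in the note above matters, here is a minimal standalone sketch in plain C++. It does not call Kokkos Kernels: `dot_returned` and `dot_into` are hypothetical stand-ins that only mimic the two behaviors described (accumulate in double versus accumulate in the type of the output).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "return value" form:
// the sum is accumulated in double regardless of the input type.
inline double dot_returned(const std::vector<float>& x,
                           const std::vector<float>& y) {
  double acc = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    acc += static_cast<double>(x[i]) * static_cast<double>(y[i]);
  }
  return acc;
}

// Hypothetical stand-in for the "output View" form:
// the sum is accumulated in the type of the output variable.
template <typename Result>
void dot_into(Result& result, const std::vector<float>& x,
              const std::vector<float>& y) {
  Result acc = Result(0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    acc += static_cast<Result>(x[i]) * static_cast<Result>(y[i]);
  }
  result = acc;
}
```

With IEEE 754 single precision, taking `x = {1.0e8f, 1.0f, -1.0e8f}` and `y = {1.0f, 1.0f, 1.0f}` makes the difference visible: the float accumulator cannot represent 1e8 + 1, so the `1.0f` term is lost in `dot_into` with a `float` result, while the double accumulator in `dot_returned` keeps it.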

BLAS 2

Note: for complex scalars, BLAS has Hermitian calls instead of the symmetric calls.

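| BLAS Call | std::blas | Kokkos Kernels Call | Reference | TPL BLAS | TPL CUBLAS | TPL ROCBLAS | TPL oneMKL | Complex Special |
|---|---|---|---|---|---|---|---|---|
| SGEMV | y | gemv(trans,a,A,x,b,y) | done | -- | -- | -- | -- | -- |
| SGBMV | n | -- | -- | -- | -- | -- | -- | -- |
| SSYMV | y | -- | -- | -- | -- | -- | -- | -- |
| SSBMV | n | -- | -- | -- | -- | -- | -- | -- |
| SSPMV | y | -- | -- | -- | -- | -- | -- | -- |
| STRMV | y | derive from trmm | done | -- | -- | -- | -- | -- |
| STBMV | n | -- | -- | -- | -- | -- | -- | -- |
| STPMV | y | -- | -- | -- | -- | -- | -- | -- |
| STRSV | y | derive from trmv | done | -- | -- | -- | -- | -- |
| STBSV | n | -- | -- | -- | -- | -- | -- | -- |
| STPSV | y | -- | -- | -- | -- | -- | -- | -- |
| SGER | y | ger(trans,a,x,y,A) | done | X | X | X | -- | NC |
| CGERU | y | ger(trans,a,x,y,A) | done | X | X | X | -- | OC |
| CGERC | y | ger(trans,a,x,y,A) | done | X | X | X | -- | OC |
| SSYR | y | syr(trans,uplo,a,x,A) | done | X | X | X | -- | -- |
| SSPR | y | -- | -- | -- | -- | -- | -- | -- |
| SSYR2 | y | syr2(trans,uplo,a,x,y,A) | done | X | X | X | -- | -- |
| SSPR2 | y | -- | -- | -- | -- | -- | -- | -- |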

BLAS 3

| BLAS Call | std::blas | Kokkos Kernels Call | Reference | TPL BLAS | TPL CUBLAS | Complex Special |
|---|---|---|---|---|---|---|
| SGEMM | y | gemm(transA,transB,a,A,B,b,C) | -- | -- | -- | -- |
| SSYMM | y | -- | -- | -- | -- | -- |
| SSYRK | y | -- | -- | -- | -- | -- |
| SSYR2K | y | -- | -- | -- | -- | -- |
| CHEMM | y | -- | -- | -- | -- | OC |
| CHERK | y | -- | -- | -- | -- | OC |
| CHER2K | y | -- | -- | -- | -- | OC |
| STRMM | y | trmm(side, uplo, trans, diag, a, A, B) | -- | -- | -- | -- |
| STRSM | y | trsm(side, uplo, trans, diag, a, A, B) | -- | -- | -- | -- |

lshulen commented 5 years ago

For the qmcpack miniapp, it would be very useful to have gemv and ger. Typical problem sizes are matrices and vectors with dimensions of roughly 300 to 5000.