kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

blas(axpby): execution space instance semantics not honored #2434

Open romintomasetti opened 1 day ago

romintomasetti commented 1 day ago

When using KokkosBlas::axpby, it is easy to end up with broken code if both of the following conditions are met:

  1. use rank-0 views for the coefficients
  2. pass an execution space instance on which these coefficients are being modified by some preceding kernel

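The issue becomes clear when looking at, e.g., the following line, where the value of the rank-0 view is fetched without synchronizing with the execution space instance: https://github.com/kokkos/kokkos-kernels/blob/b3a4bdf6973dceed7715c2fdc5f9499af54af2d8/blas/src/KokkosBlas1_axpby.hpp#L109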

Adding an exec_space.fence() before fetching the value of the rank-0 view fixes it.

As a side note, I think there are potentially other issues, e.g. here https://github.com/kokkos/kokkos-kernels/blob/b3a4bdf6973dceed7715c2fdc5f9499af54af2d8/blas/src/KokkosBlas1_axpby.hpp#L137 where passing exec_space to Kokkos::deep_copy seems necessary.
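
To make the ordering problem concrete, here is a minimal sketch that uses only the C++ standard library. It is a toy model, not Kokkos or Kokkos Kernels code: `ToyExecSpace`, `ToyScalar`, and every function below are hypothetical names made up for illustration. The model assumes that work submitted to an execution space instance is only enqueued and runs, in submission order, once the instance is fenced. Under that assumption, a host-side read of the coefficient before the fence sees the old value, while fencing first, or enqueuing the copy on the same instance, sees the updated one.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for an execution space instance (NOT Kokkos):
// work is only enqueued, and runs in submission order when fence() is called.
class ToyExecSpace {
 public:
  void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

  // Run all pending work in submission order.
  void fence() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for a rank-0 view holding one coefficient.
struct ToyScalar {
  double value = 0.0;
};

// Pattern described in the issue: the coefficient is read on the host
// without synchronizing with the instance that is still writing it.
double scenario_unfenced() {
  ToyExecSpace exec;
  ToyScalar a;
  exec.enqueue([&a] { a.value = 2.0; });  // "preceding kernel" sets a
  const double seen = a.value;            // read before the kernel has run
  exec.fence();
  return seen;
}

// Fix suggested in the issue: fence the instance before fetching the value.
double scenario_fenced() {
  ToyExecSpace exec;
  ToyScalar a;
  exec.enqueue([&a] { a.value = 2.0; });
  exec.fence();    // the preceding kernel has completed
  return a.value;  // read after synchronization
}

// Alternative: enqueue the copy on the same instance, so it is ordered
// after the preceding kernel; the result is valid once the instance is fenced.
double scenario_ordered_copy() {
  ToyExecSpace exec;
  ToyScalar a;
  double host_copy = 0.0;
  exec.enqueue([&a] { a.value = 2.0; });
  exec.enqueue([&a, &host_copy] { host_copy = a.value; });
  exec.fence();
  return host_copy;
}
```

In this model, `scenario_unfenced()` returns the stale initial value 0.0, while `scenario_fenced()` and `scenario_ordered_copy()` both return 2.0. On a real backend the unfenced read is a race rather than a deterministic stale read, so the symptom may only show up intermittently.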