Open MalachiTimothyPhillips opened 1 week ago
@MalachiTimothyPhillips thanks a lot for reporting this. FYI, we hit this with Kokkos 4.4 integrations in Sierra. I would also add to the comments that there are several functions in Tpetra::Map
that call lazyPushToHost
; not limited to just getLocalElement
. It's easy enough to work around, so this is not super urgent, but would be nice to fix.
@MalachiTimothyPhillips @japlews We generally recommend avoiding calling Tpetra functions (excepting those with KOKKOS_FUNCTION decorators) in parallel regions, because that code will never work on a GPU.
As you note, the documentation about thread safety of that function is no longer correct. The change we neglected to document is an intentional one --- the lazyPushToHost stuff was designed to avoid adding a bunch of D2H calls in the middle of an otherwise 100% GPU calculations. Previously every Map construction involved those D2H calls.
If you're not building a GPU backend, we might be able to clean up the fences. But if you are building a GPU backend with a host-parallel front end, then those fences really do need to be there.
@csiefer2 This sounds good. I only advocate for improving the comments or otherwise enforcing somehow that we don't do this in threaded loops where a hang could occur!
Bug Report
@trilinos/tpetra @japlews @vbrunini
Tpetra::Map<>::getLocalElement
(which according to the doxygen should be thread safe) may callTpetra::Map<>::lazyPushToHost
.Tpetra::Map<>::lazyPushToHost
performs several operations (Kokkos::deep_copy
,Kokkos::fence
) that are "illegal" from within a parallel region.With the recent Kokkos 4.4 promotion, this may now cause host-side parallel regions to hang.
See:
https://github.com/trilinos/Trilinos/blob/b895aa6850943911c4b9388249a00a4d372f601e/packages/tpetra/core/src/Tpetra_Map_decl.hpp#L631
and
https://github.com/trilinos/Trilinos/blob/b895aa6850943911c4b9388249a00a4d372f601e/packages/tpetra/core/src/Tpetra_Map_def.hpp#L1251
Steps to Reproduce
Call
Tpetra::Map<>::lazyPushToHost
from within a host-sideKokkos::parallel_for
.