Closed mhoemmen closed 3 years ago
Christian assures me that it's OK to call Kokkos::finalize_all
multiple times, but it's not OK to use anything in Kokkos after calling it the first time.
Christian assures me that it's OK to call
Kokkos::finalize_all
multiple times, but it's not OK to use anything in Kokkos after calling it the first time.
Constructing the local Teuchos::GlobalMPISession object should be about the first command in main() so nothing in main() should run after that object is destroyed.
Can I promote the use of Kokkos::ScopeGuard
here? Could we not just make Kokkos::ScopeGuard
a member of Teuchos::GlobalMPISession
? It is very safe to mix with other calls to Kokkos::initialize
and Kokkos::finalize
. However, if we don't think Teuchos::GlobalMPISession
should be allowed to initialize Kokkos if no one else does, then I don't think it should be allowed finalize it either.
@ibaned If some other code calls Kokkos::initialize
afterwards, would that still be OK?
That change would be a big help, but this means that kokkos will startup for all trilinos tests that use the teuchos unit test harness, not just the subset that specifically called initialize. We might want to run some tests on atdm platforms just to make sure there is no performance impact when run with -j8 or -j16.
@mhoemmen yes
@rppawlo I'm honestly leaning towards removing that finalize
call in GlobalMPISession
. It doesn't look like it belongs there at all. Programs that are relying on GlobalMPISession
to finalize Kokkos for them should be fixed instead.
I'm honestly leaning towards removing that
finalize
call inGlobalMPISession
. It doesn't look like it belongs there at all. Programs that are relying onGlobalMPISession
to finalize Kokkos for them should be fixed instead.
@ibaned - I agree that it's the right thing. Just worried about the work it would take. Do we have to go through every single test and example in trilinos that uses GlobalMPISession and now add a finalize? If so, that's quite a bit of work to do it correctly. If I run this on Trilinos (including Drekar and Charon):
[rppawlo@gge Trilinos]$ find . -name '*pp' | xargs grep -is "GlobalMPISession" > count.txt
[rppawlo@gge Trilinos]$ wc -l count.txt
1883 count.txt
given that half of these instances are including the header, that leaves about 900 cases to check. Maybe it isn't so bad if we can eliminate entire packages based on not using kokkos...
Thats just the price we pay for the initial bad decision to put it in there. I also recommend we use Kokkos::ScopeGuard
, it may be that people were using GlobalMPISession
to solve the same lifetime issues that it solves.
Note also that Teuchos::GlobalMPISession
actually saves the command-line arguments, and exposes an interface to get them out. This means that if Teuchos::TimeMonitor
needs to initialize Kokkos for Kokkos Profiling to work, then TimeMonitor can pass all the right command-line arguments to Kokkos::initialize
. It can also schedule finalization as an atexit
hook (if not already finalized, then finalize). Oh wait, I don't think Kokkos has an is_finalized
, does it....
Isn't it as simple as is_finalized == !is_initialized
?
Still not a fan of GlobalMPISession
doing any of this though
Still not a fan of GlobalMPISession doing any of this though
@ibaned, what is the harm in having Teuchos::GlobalMPISession initialize and finalize Kokkos by default? What is the downside? The upsides seem pretty clear.
Just naming really, one wouldn't expect MPI to have anything to do with Kokkos. However if everyone is used to GlobalMPISession
being a misnamed Teuchos::Library
or Teuchos::ScopeGuard
, then it should be fine for it to internally use a Kokkos::ScopeGuard
to initialize and finalize Kokkos.
@ibaned, what if we changed the name to Teuchos::ProgramInitializerAndFinalizer
(and made Teuchos::GlobalMPISession
a deprecated typedef to that)? We can write a simple script to make those changes in all Trilinos code and the deprecated warnings would tell users to make the same changes (e.g. using the same refactoring script). Without knowing more of the downsides, that would be my vote I guess.
Sounds good to me, I do see now that the name for these types of objects may still require some thought, but if that is GlobalMPISessions
de-facto purpose then lets go ahead.
@ibaned said:
Sounds good to me, I do see now that the name for these types of objects may still require some thought, but if that is
GlobalMPISessions
de-facto purpose then lets go ahead.
If we do change the name to Teuchos::ProgramInitializerAndFinalizer
, we will have to think about how to explain the scope of that object. (I think that s a way to initialize and finalize MPI and Kokkos as a convenience I think that is a good enough scope.) Should we create a new Trilinos GitHub issue to propose Teuchos::ProgramInitializerAndFinalizer
(and perhaps find a better name) and then try to define its scope better and the justification? (I think we need to make it fairly clear that this class is just a FACADE for the things that it initializes and finalizes can customer programs can use or not based on preferences and instead they can just call raw MPI and Kokkos functions themselves. This should be completely transparent.)
This is a big deal because as @rppawlo points out, there are thousands of programs (mostly in Trilinos) that this will impact. It would be great for Trilinos to have a standard for all client programs to initialize and finalize things like this if desired. (But the main use case it to make it easy to maintain Trilinos tests and examples of we are being honest.)
Would this also be a pathway for ctest to manage process binding masks when multiple executables are launched as part of testing?
@bartlettroscoe Tpetra::ScopeGuard
is already a kind of façade for MPI and Kokkos initialization and finalization. I put a lot of logic in there so that it doesn't double-initialize MPI, etc. If you want that behavior in this Teuchos class, you could just copy and paste that code. The one issue is that Kokkos lacks a function analogous to MPI_Finalized
.
@bartlettroscoe Tpetra::ScopeGuard is already a kind of façade for MPI and Kokkos initialization and finalization.
@mhoemmen, with a name like "ScopeGuard", I would have never guessed that was its intent. I would have assumed that it was just a genertic template class that implements the RAII idiom.
Seems like it would be good to have a class that correctly initializes and finalizes MPI and Kokkos in Teuchos so that every Trilinos package could use it.
Kokkos::finalized()
is just ! Kokkos::initialized()
Kokkos::initialize()
and Kokkos::finalize()
, Tpetra::ScopeGuard
and Teuchos::Whatchamacallit
should just contain Kokkos::ScopeGuard
as members.@bartlettroscoe I agree the name is still up for debate. In my own code I called it Library
, but that name was vetoed by Kokkos leadership.
@rppawlo asked:
Would this also be a pathway for ctest to manage process binding masks when multiple executables are launched as part of testing?
Only indirectly by calling MPI and Kokkos initialization functions. I don't think this class should do any more than that.
FYI: I think we have a possible approach addressing that problem should not require any changes to Trilinos C++ code at all. I tried to outline this in the new comment https://github.com/trilinos/Trilinos/issues/2422#issuecomment-453251670. Would you be game to help experiment with this approach?
@bartlettroscoe wrote:
@mhoemmen, with a name like "ScopeGuard", I would have never guessed that was its intent. I would have assumed that it was just a genertic template class that implements the RAII idiom.
I had to give it the same name that Kokkos used. I'm just glad Kokkos has such a thing :-)
@ibaned wrote:
Kokkos::finalized()
is just! Kokkos::initialized()
Right, but that means something different. You're not allowed to call Kokkos::initialize
twice, just like you're not allowed to call MPI_Init
twice.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE
label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE
.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
This issue was closed due to inactivity for 395 days.
@trilinos/teuchos
Teuchos::GlobalMPISession
's constructor does not callKokkos::initialize
, but GlobalMPISession's destructor callsKokkos::finalize_all
. Teuchos should either do both, or do neither.https://github.com/trilinos/Trilinos/blob/5487fdcebedeb081477ec09dd3ad9efa9192948d/packages/teuchos/core/src/Teuchos_GlobalMPISession.cpp#L192