trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.21k stars 566 forks source link

Teuchos::GlobalMPISession destructor finalizes Kokkos, but constructor does not initialize Kokkos #4142

Closed mhoemmen closed 3 years ago

mhoemmen commented 5 years ago

@trilinos/teuchos

Teuchos::GlobalMPISession's constructor does not call Kokkos::initialize, but GlobalMPISession's destructor calls Kokkos::finalize_all. Teuchos should either do both, or do neither.

https://github.com/trilinos/Trilinos/blob/5487fdcebedeb081477ec09dd3ad9efa9192948d/packages/teuchos/core/src/Teuchos_GlobalMPISession.cpp#L192

mhoemmen commented 5 years ago

Christian assures me that it's OK to call Kokkos::finalize_all multiple times, but it's not OK to use anything in Kokkos after calling it the first time.

bartlettroscoe commented 5 years ago

Christian assures me that it's OK to call Kokkos::finalize_all multiple times, but it's not OK to use anything in Kokkos after calling it the first time.

Constructing the local Teuchos::GlobalMPISession object should be about the first command in main() so nothing in main() should run after that object is destroyed.

ibaned commented 5 years ago

Can I promote the use of Kokkos::ScopeGuard here? Could we not just make Kokkos::ScopeGuard a member of Teuchos::GlobalMPISession? It is very safe to mix with other calls to Kokkos::initialize and Kokkos::finalize. However, if we don't think Teuchos::GlobalMPISession should be allowed to initialize Kokkos if no one else does, then I don't think it should be allowed finalize it either.

mhoemmen commented 5 years ago

@ibaned If some other code calls Kokkos::initialize afterwards, would that still be OK?

rppawlo commented 5 years ago

That change would be a big help, but this means that kokkos will startup for all trilinos tests that use the teuchos unit test harness, not just the subset that specifically called initialize. We might want to run some tests on atdm platforms just to make sure there is no performance impact when run with -j8 or -j16.

ibaned commented 5 years ago

@mhoemmen yes

@rppawlo I'm honestly leaning towards removing that finalize call in GlobalMPISession. It doesn't look like it belongs there at all. Programs that are relying on GlobalMPISession to finalize Kokkos for them should be fixed instead.

rppawlo commented 5 years ago

I'm honestly leaning towards removing that finalize call in GlobalMPISession. It doesn't look like it belongs there at all. Programs that are relying on GlobalMPISession to finalize Kokkos for them should be fixed instead.

@ibaned - I agree that it's the right thing. Just worried about the work it would take. Do we have to go through every single test and example in trilinos that uses GlobalMPISession and now add a finalize? If so, that's quite a bit of work to do it correctly. If I run this on Trilinos (including Drekar and Charon):

[rppawlo@gge Trilinos]$ find . -name '*pp' | xargs grep -is "GlobalMPISession" > count.txt
[rppawlo@gge Trilinos]$ wc -l count.txt 
1883 count.txt

given that half of these instances are including the header, that leaves about 900 cases to check. Maybe it isn't so bad if we can eliminate entire packages based on not using kokkos...

ibaned commented 5 years ago

Thats just the price we pay for the initial bad decision to put it in there. I also recommend we use Kokkos::ScopeGuard, it may be that people were using GlobalMPISession to solve the same lifetime issues that it solves.

mhoemmen commented 5 years ago

Note also that Teuchos::GlobalMPISession actually saves the command-line arguments, and exposes an interface to get them out. This means that if Teuchos::TimeMonitor needs to initialize Kokkos for Kokkos Profiling to work, then TimeMonitor can pass all the right command-line arguments to Kokkos::initialize. It can also schedule finalization as an atexit hook (if not already finalized, then finalize). Oh wait, I don't think Kokkos has an is_finalized, does it....

ibaned commented 5 years ago

Isn't it as simple as is_finalized == !is_initialized?

ibaned commented 5 years ago

Still not a fan of GlobalMPISession doing any of this though

bartlettroscoe commented 5 years ago

Still not a fan of GlobalMPISession doing any of this though

@ibaned, what is the harm in having Teuchos::GlobalMPISession initialize and finalize Kokkos by default? What is the downside? The upsides seem pretty clear.

ibaned commented 5 years ago

Just naming really, one wouldn't expect MPI to have anything to do with Kokkos. However if everyone is used to GlobalMPISession being a misnamed Teuchos::Library or Teuchos::ScopeGuard, then it should be fine for it to internally use a Kokkos::ScopeGuard to initialize and finalize Kokkos.

bartlettroscoe commented 5 years ago

@ibaned, what if we changed the name to Teuchos::ProgramInitializerAndFinalizer (and made Teuchos::GlobalMPISession a deprecated typedef to that)? We can write a simple script to make those changes in all Trilinos code and the deprecated warnings would tell users to make the same changes (e.g. using the same refactoring script). Without knowing more of the downsides, that would be my vote I guess.

ibaned commented 5 years ago

Sounds good to me, I do see now that the name for these types of objects may still require some thought, but if that is GlobalMPISessions de-facto purpose then lets go ahead.

bartlettroscoe commented 5 years ago

@ibaned said:

Sounds good to me, I do see now that the name for these types of objects may still require some thought, but if that is GlobalMPISessions de-facto purpose then lets go ahead.

If we do change the name to Teuchos::ProgramInitializerAndFinalizer, we will have to think about how to explain the scope of that object. (I think that s a way to initialize and finalize MPI and Kokkos as a convenience I think that is a good enough scope.) Should we create a new Trilinos GitHub issue to propose Teuchos::ProgramInitializerAndFinalizer (and perhaps find a better name) and then try to define its scope better and the justification? (I think we need to make it fairly clear that this class is just a FACADE for the things that it initializes and finalizes can customer programs can use or not based on preferences and instead they can just call raw MPI and Kokkos functions themselves. This should be completely transparent.)

This is a big deal because as @rppawlo points out, there are thousands of programs (mostly in Trilinos) that this will impact. It would be great for Trilinos to have a standard for all client programs to initialize and finalize things like this if desired. (But the main use case it to make it easy to maintain Trilinos tests and examples of we are being honest.)

rppawlo commented 5 years ago

Would this also be a pathway for ctest to manage process binding masks when multiple executables are launched as part of testing?

mhoemmen commented 5 years ago

@bartlettroscoe Tpetra::ScopeGuard is already a kind of façade for MPI and Kokkos initialization and finalization. I put a lot of logic in there so that it doesn't double-initialize MPI, etc. If you want that behavior in this Teuchos class, you could just copy and paste that code. The one issue is that Kokkos lacks a function analogous to MPI_Finalized.

bartlettroscoe commented 5 years ago

@bartlettroscoe Tpetra::ScopeGuard is already a kind of façade for MPI and Kokkos initialization and finalization.

@mhoemmen, with a name like "ScopeGuard", I would have never guessed that was its intent. I would have assumed that it was just a genertic template class that implements the RAII idiom.

Seems like it would be good to have a class that correctly initializes and finalizes MPI and Kokkos in Teuchos so that every Trilinos package could use it.

ibaned commented 5 years ago
  1. Kokkos::finalized() is just ! Kokkos::initialized()
  2. Instead of calling Kokkos::initialize() and Kokkos::finalize(), Tpetra::ScopeGuard and Teuchos::Whatchamacallit should just contain Kokkos::ScopeGuard as members.
ibaned commented 5 years ago

@bartlettroscoe I agree the name is still up for debate. In my own code I called it Library, but that name was vetoed by Kokkos leadership.

bartlettroscoe commented 5 years ago

@rppawlo asked:

Would this also be a pathway for ctest to manage process binding masks when multiple executables are launched as part of testing?

Only indirectly by calling MPI and Kokkos initialization functions. I don't think this class should do any more than that.

FYI: I think we have a possible approach addressing that problem should not require any changes to Trilinos C++ code at all. I tried to outline this in the new comment https://github.com/trilinos/Trilinos/issues/2422#issuecomment-453251670. Would you be game to help experiment with this approach?

mhoemmen commented 5 years ago

@bartlettroscoe wrote:

@mhoemmen, with a name like "ScopeGuard", I would have never guessed that was its intent. I would have assumed that it was just a genertic template class that implements the RAII idiom.

I had to give it the same name that Kokkos used. I'm just glad Kokkos has such a thing :-)

mhoemmen commented 5 years ago

@ibaned wrote:

  1. Kokkos::finalized() is just ! Kokkos::initialized()

Right, but that means something different. You're not allowed to call Kokkos::initialize twice, just like you're not allowed to call MPI_Init twice.

github-actions[bot] commented 3 years ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] commented 3 years ago

This issue was closed due to inactivity for 395 days.