dssgabriel opened 1 month ago
Thank you very much for the first review, @devreal!
Regarding C++ destruction and the SPM finalization semantics: is `MPI_Comm_free` required to be called before `MPI_Session_finalize`? And if so, how can I destroy the communicator without having to call its dtor explicitly?
Just a PSA: OMPI in combination with UCX won't support Sessions until the next major release of OMPI (6.0): https://github.com/open-mpi/ompi/issues/12566#issuecomment-2127642471
It looks like Linux's OpenMPI is too old to support `MPI_Session`s (initial support was added in Open MPI 5.0), hence the failing CI tests.
@cwpearson Is there any way we can specifically install OpenMPI 5.x in the CI? The package from Ubuntu repos is simply not up-to-date.
> Regarding C++ destruction and the SPM finalization semantics: is `MPI_Comm_free` required to be called before `MPI_Session_finalize`? And if so, how can I destroy the communicator without having to call its dtor explicitly?
The standard says that the application is required to clean up its objects before `MPI_Session_finalize`:
> The call to `MPI_SESSION_FINALIZE` does not free objects created by MPI calls; these objects are freed using `MPI_XXX_FREE`, `MPI_COMM_DISCONNECT`, or `MPI_FILE_CLOSE` calls. Once `MPI_SESSION_FINALIZE` returns, no MPI procedure that is related to this session may be called in the Sessions Model (not even freeing objects that are derived from this session), except for those listed in Section 11.4.1.
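To make the required ordering concrete, here is a minimal sketch of a Sessions-Model lifecycle using only standard MPI 4.0 calls; error handling and the actual communication are omitted, and the pset/tag strings are just illustrative:

```cpp
#include <mpi.h>

int main() {
  MPI_Session session;
  MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

  // Derive a group and a communicator from the session.
  MPI_Group group;
  MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
  MPI_Comm comm;
  MPI_Comm_create_from_group(group, "example-tag", MPI_INFO_NULL,
                             MPI_ERRORS_RETURN, &comm);
  MPI_Group_free(&group);

  // ... use comm ...

  MPI_Comm_free(&comm);            // must happen BEFORE finalizing the session
  MPI_Session_finalize(&session);  // after this, comm must not be touched
  return 0;
}
```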
We had a similar issue in other projects and settled on a registry that references the objects needing destruction. There are two ways they can get destroyed:
1) Their destructor is called before the session is destroyed. The object then just removes itself from the registry.
2) They are released as part of the destruction of the session object (before `MPI_Session_finalize` is called), and the object becomes an empty shell whose destructor does nothing.
All of this must be thread-safe etc., but communicators are heavy objects that are not created and destroyed regularly, so I think the overhead involved in such a scheme is acceptable.
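A minimal sketch of that registry scheme, with both destruction paths; `Handle` and `free_handle()` are stand-ins for `MPI_Comm` and `MPI_Comm_free`, and the mutex a real implementation would need around the registry is omitted for brevity:

```cpp
#include <functional>
#include <unordered_map>
#include <utility>

using Handle = int;
static int freed_count = 0;                       // lets us observe releases
static void free_handle(Handle& /*h*/) { ++freed_count; }

class Registry {
  std::unordered_map<void*, std::function<void()>> entries_;
public:
  void add(void* key, std::function<void()> release) {
    entries_.emplace(key, std::move(release));
  }
  void remove(void* key) { entries_.erase(key); }
  // Called by the session wrapper just before MPI_Session_finalize.
  void release_all() {
    for (auto& [key, release] : entries_) release();
    entries_.clear();
  }
};

class Communicator {
  Registry* reg_;
  Handle handle_;
  bool live_ = true;
public:
  Communicator(Registry& reg, Handle h) : reg_(&reg), handle_(h) {
    reg_->add(this, [this] { release(); });
  }
  // Option 2: the session releases us first; we become an empty shell.
  void release() {
    if (live_) { free_handle(handle_); live_ = false; }
  }
  // Option 1: our destructor runs before the session is destroyed,
  // so we free the handle and drop out of the registry.
  ~Communicator() {
    if (live_) { free_handle(handle_); reg_->remove(this); }
  }
};
```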
This PR is a first attempt at initialization and finalization for kokkos-comm relying on `MPI_Session`s.
Some noteworthy additions brought by this PR:
- a `Communicator` class that wraps the `MPI_Comm` and a Kokkos execution space
- a `Universe` class that holds the handle to the MPI session, as well as a session-associated communicator
I expect lots of changes in the API, which I find kind of clunky at the moment. Reviews and comments on how to improve it are welcome.
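For context, a rough sketch of the shape this could take: the class names come from the PR, but every member and method below is an assumption for illustration, not the actual kokkos-comm API.

```cpp
#include <mpi.h>
#include <Kokkos_Core.hpp>

// Hypothetical sketch only; members and methods are assumptions.
class Communicator {
  MPI_Comm comm_ = MPI_COMM_NULL;
  Kokkos::DefaultExecutionSpace space_{};  // execution space paired with the comm
public:
  explicit Communicator(MPI_Comm comm) : comm_(comm) {}
  MPI_Comm mpi_comm() const noexcept { return comm_; }
  // Called by Universe before MPI_Session_finalize (see discussion above).
  void free() {
    if (comm_ != MPI_COMM_NULL) MPI_Comm_free(&comm_);
  }
};

class Universe {
  MPI_Session session_ = MPI_SESSION_NULL;
  Communicator comm_{MPI_COMM_NULL};
public:
  Universe() {
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session_);
    MPI_Group group;
    MPI_Group_from_session_pset(session_, "mpi://WORLD", &group);
    MPI_Comm comm;
    MPI_Comm_create_from_group(group, "kokkos-comm", MPI_INFO_NULL,
                               MPI_ERRORS_RETURN, &comm);
    MPI_Group_free(&group);
    comm_ = Communicator(comm);
  }
  Communicator& comm() noexcept { return comm_; }
  ~Universe() {
    comm_.free();                    // free the session's communicator first...
    MPI_Session_finalize(&session_); // ...then finalize the session
  }
};
```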