Closed hjelmn closed 2 years ago
One thing I want to understand is why MPI_Win_allocate_shared exists at all. Why not overload MPI_Win_allocate to handle the shared-memory case as well? We could allow MPI_Win_shared_query to work on all windows, as @jeffhammond wants, and we would have the exact same functionality. As such, the way I see to move forward with this issue is to deprecate MPI_Win_allocate_shared (without targeting it for removal) so we don't have to add MPI_Win_allocate_shared_from_group. Either that, or we can break consistency and just not add MPI_Win_allocate_shared_from_group.
@pavanbalaji Please comment.
I should also add that the topology working group has a similar issue with MPI_Dist_graph_create_adjacent_from_group, MPI_Dist_graph_create_from_group, etc., but they will probably change how topologies are implemented, which will eliminate the issue.
@hjelmn I think that should be fine, but we should carefully think about at least the following things:
1. When the user actually wants shared memory. For example, if the user doesn't actually care about shared memory, I can have all memory start at a symmetric location. This can be fixed with an info argument, with the default being the user does not need shared memory.
2. Contiguous or noncontiguous allocation by default. I believe Jeff's current proposal for updating MPI_Win_allocate uses noncontig by default (which is the same thing that MPICH does), whereas MPI_Win_allocate_shared uses contig by default.
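To make the contig/noncontig distinction concrete, here is a minimal sketch of how the per-process segment offsets inside one shared region could differ between the two layouts. It is not taken from MPICH or Open MPI: the function names, the `SKETCH_PAGE_SIZE` constant, and the choice of page alignment as the noncontiguous policy are all assumptions made for illustration. A real implementation is free to place noncontiguous segments however it likes.

```c
#include <stddef.h>

/* Hypothetical page size used for padding; real systems report their own. */
#define SKETCH_PAGE_SIZE 4096

/* Contiguous layout: each process's segment starts exactly where the
 * previous process's segment ends, so the whole window is one dense array. */
void contig_offsets(const size_t *sizes, int n, size_t *offsets)
{
    size_t next = 0;
    for (int i = 0; i < n; i++) {
        offsets[i] = next;
        next += sizes[i];
    }
}

/* Noncontiguous layout (one possible policy, assumed here): round each
 * segment start up to a page boundary, leaving gaps between segments. */
void noncontig_offsets(const size_t *sizes, int n, size_t *offsets)
{
    size_t next = 0;
    for (int i = 0; i < n; i++) {
        next = (next + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
        offsets[i] = next;
        next += sizes[i];
    }
}
```

With the contiguous layout a process can reach a neighbor's data by pointer arithmetic from its own base. With the noncontiguous layout it has to ask for each neighbor's base address, which is what MPI_Win_shared_query is for.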
@pavanbalaji Good things to keep in mind.
In Open MPI the two allocate calls are equivalent if all processes are local. If they are not, then we take a different path. We could qualify the defaults for the different cases and avoid requiring users to set info keys.
For example, if they do:
MPI_Comm_split_type (comm1, ..., MPI_COMM_TYPE_SHARED,..., &comm2);
MPI_Win_allocate (..., comm2, &base, &win);
we could require MPI implementations to make this equivalent to:
MPI_Win_allocate_shared (..., comm2, &base, &win);
Then use the current defaults of MPI_Win_allocate for windows that span nodes. I don't know if that is too complicated.
@hjelmn The decision to use contig or noncontig as the default comes from whether users need to see shared memory directly. If they don't need shared memory to be visible, then noncontig is better to have (for performance reasons). So simply requiring MPI implementations to have MPI_Win_allocate(node_comm) be the same as MPI_Win_allocate_shared(node_comm) might not be sufficient.
I'd recommend deprecating the alloc_shared_noncontig info key and instead defining a new key called shmem_alloc that can take the values contig, noncontig, and system. The default for MPI_Win_allocate can be system, so each implementation can do whatever it likes and the user cannot expect any particular behavior (which is the same as today). The user can also request a particular behavior by explicitly using contig or noncontig. The user would still need to verify that they got what was requested, because these are just hints and the implementation might ignore them anyway. The user can verify whether shared memory was allocated, and whether it is contiguous, using the query routines.
Thoughts?
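As a rough illustration of the semantics proposed above, the sketch below shows how an implementation might interpret a shmem_alloc value. Nothing here exists in any MPI standard or library: the `shmem_alloc_t` enum and `parse_shmem_alloc` function are hypothetical names, and falling back to system for a missing or unrecognized value is an assumption based on the key being only a hint.

```c
#include <string.h>

/* Hypothetical enum for the proposed shmem_alloc info key; not part of MPI. */
typedef enum {
    SHMEM_ALLOC_SYSTEM,    /* implementation chooses; the proposed default */
    SHMEM_ALLOC_CONTIG,    /* user asked for contiguous segments */
    SHMEM_ALLOC_NONCONTIG  /* user asked for noncontiguous segments */
} shmem_alloc_t;

/* Map the info value string to a policy. A missing key (NULL) or an
 * unrecognized value falls back to SYSTEM, since the key is only a hint. */
shmem_alloc_t parse_shmem_alloc(const char *value)
{
    if (value == NULL)
        return SHMEM_ALLOC_SYSTEM;
    if (strcmp(value, "contig") == 0)
        return SHMEM_ALLOC_CONTIG;
    if (strcmp(value, "noncontig") == 0)
        return SHMEM_ALLOC_NONCONTIG;
    return SHMEM_ALLOC_SYSTEM;
}
```

Enumerated string values keep the key extensible: a later value only needs one more comparison, and implementations that don't recognize it still fall through to the system default.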
@hjelmn You could use the symbol name MPI_Win_allocate_shared_group instead, should the forum decide that the Fortran 90 symbol limitation matters, but I assume that most of the relevant compilers already support the Fortran 2003 symbol limit of 63 characters.
Frankly, whether or not noncontig is the default or not is a really stupid justification for a separate function.
> Frankly, whether or not noncontig is the default or not is a really stupid justification for a separate function.
I agree. We can easily fix it with some info keys.
@jeffhammond The problem is consistency. We have MPI_Comm_create_group from MPI-3, which is why everything has from_group :(. The question is: does consistency matter?
MPI_Reduce_scatter_block says consistency doesn't matter 😝
@jeffhammond Indeed 🤣
@pavanbalaji I like it. By enumerating the possible values it gives us flexibility to add additional values in the future if needed.
Closing, as the initial issue of introducing MPI_Win_allocate_shared_from_group was dropped from the Sessions proposal. There is always the route through MPI_Comm_create_from_group to get a window from a group created from a Session.
This issue tracks a problem that we have to think about with the upcoming Sessions proposal.
The sessions proposal wants to add the following function:
Now at first glance this function doesn't look like an issue. The problem is that the function name is 35 characters in length, which will violate the (please die already Fortran) F90 standard. How does the RMA working group want to handle this?
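The name-length concern can be checked mechanically. The sketch below compares a candidate symbol name against the Fortran 90 limit of 31 characters and the Fortran 2003 limit of 63 characters. The helper `fits_name_limit` and the two macros are hypothetical names introduced for illustration. The function names to test come from elsewhere in this thread, not from the elided declaration above.

```c
#include <string.h>

/* Maximum identifier lengths in the two Fortran standards discussed here. */
#define F90_NAME_LIMIT   31
#define F2003_NAME_LIMIT 63

/* Return 1 if the symbol name fits within the given limit, 0 otherwise. */
int fits_name_limit(const char *name, size_t limit)
{
    return strlen(name) <= limit;
}
```

For example, `fits_name_limit("MPI_Win_allocate_shared_group", F90_NAME_LIMIT)` returns 1, while the same check on `"MPI_Win_allocate_shared_from_group"` returns 0. Both names fit under `F2003_NAME_LIMIT`.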