Potential issue with RMA and sessions

hjelmn commented 5 years ago

This issue is meant to track an issue that we have to think about with the upcoming sessions proposal.

The sessions proposal wants to add the following function:

MPI_Win_allocate_shared_from_group

Now at first glance this function doesn't look like an issue. The problem is that its name is 34 characters long, which violates the 31-character identifier limit of the (please die already, Fortran) F90 standard. How does the RMA working group want to handle this?

hjelmn commented 5 years ago

One thing I want to understand is why MPI_Win_allocate_shared exists at all. Why not overload MPI_Win_allocate to handle the shared-memory case as well? We could allow MPI_Win_shared_query to work on all windows, as @jeffhammond wants, and we would have the exact same functionality. As such, the way I see to move forward with this issue is to deprecate MPI_Win_allocate_shared (without targeting it for removal) so we don't have to add MPI_Win_allocate_shared_from_group. Either that, or we can break consistency and just not add MPI_Win_allocate_shared_from_group.

hjelmn commented 5 years ago

@pavanbalaji Please comment.

hjelmn commented 5 years ago

I should also add that the topology working group has a similar issue with MPI_Dist_graph_create_adjacent_from_group, MPI_Dist_graph_create_from_group, etc., but they will probably change how topologies are implemented, which will eliminate the issue.

pavanbalaji commented 5 years ago

@hjelmn I think that should be fine, but we should carefully think about at least the following things:

- When the user actually wants shared memory. For example, if the user doesn't actually care about shared memory, I can have all memory start at a symmetric location. This can be fixed with an info argument, with the default being that the user does not need shared memory.
- Contiguous or noncontiguous allocation by default. I believe Jeff's current proposal for updating MPI_Win_allocate uses noncontig by default (which is the same thing that MPICH does), whereas MPI_Win_allocate_shared uses contig by default.
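
To make the contig/noncontig distinction concrete, here is a minimal sketch of how the per-rank segment offsets could differ between the two layouts. It does not call MPI. The helper names, the fixed 4096-byte page size, and the choice of page alignment for the noncontiguous case are all assumptions for illustration; an implementation is free to place noncontiguous segments differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this illustration only; real code should query it. */
#define SKETCH_PAGE_SIZE 4096

/* Contiguous layout: the segment of rank i starts exactly where the
 * segment of rank i-1 ends. */
void contig_offsets(const size_t *sizes, int n, size_t *offsets)
{
    assert(sizes != NULL && offsets != NULL);
    size_t next = 0;
    for (int i = 0; i < n; i++) {
        offsets[i] = next;
        next += sizes[i];
    }
}

/* One possible noncontiguous layout (hypothetical): every segment starts
 * on a page boundary, so padding may separate neighboring segments. */
void noncontig_offsets(const size_t *sizes, int n, size_t *offsets)
{
    assert(sizes != NULL && offsets != NULL);
    size_t next = 0;
    for (int i = 0; i < n; i++) {
        offsets[i] = next;
        /* Round the end of this segment up to the next page boundary. */
        size_t end = next + sizes[i];
        next = (end + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
    }
}
```

With the contiguous layout a user can walk from one rank's segment straight into the next with pointer arithmetic. With the padded layout the user has to ask for each rank's base address separately, which is what MPI_Win_shared_query is for.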

hjelmn commented 5 years ago

@pavanbalaji Good things to keep in mind.

In Open MPI the two allocate calls are equivalent if all processes are local. If they are not, we take a different path. We could qualify the defaults for the different cases and avoid requiring users to set info keys.

For example if they do:

MPI_Comm_split_type (comm1, MPI_COMM_TYPE_SHARED, ..., &comm2);
MPI_Win_allocate (..., comm2, &base, &win);

we could require MPI implementations to make this equivalent to:

MPI_Win_allocate_shared (..., comm2, &base, &win);

Then use the current defaults of MPI_Win_allocate for windows that span nodes. I don't know if that is too complicated.

pavanbalaji commented 5 years ago

@hjelmn The decision to use contig or noncontig as the default comes from whether users need to see shared memory directly. If they don't need shared memory to be visible, then noncontig is better to have (for performance reasons). So simply requiring MPI implementations to have MPI_Win_allocate(node_comm) be the same as MPI_Win_allocate_shared(node_comm) might not be sufficient.

I'd recommend deprecating the alloc_shared_noncontig info key and instead defining a new key called shmem_alloc that can take the values contig, noncontig, and system. The default for MPI_Win_allocate can be system, so each implementation can do whatever it likes and the user cannot expect any particular behavior (which is the same as today). The user can also request a particular behavior by explicitly using contig or noncontig. The user would still need to verify that she got what was requested, because these are just hints and the implementation might ignore them anyway. The user can verify whether shared memory was allocated, and whether it is contiguous or not, using the query routines.
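
As a rough sketch of the proposal above, the value of the suggested shmem_alloc key could be mapped to an enumeration like this. The key exists only as a suggestion in this thread, so the enum, the function names, and the fallback to system for a missing or unrecognized value are all hypothetical, not part of any MPI standard or implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hint values, mirroring the three strings proposed above. */
typedef enum {
    SHMEM_ALLOC_SYSTEM,    /* implementation decides (proposed default) */
    SHMEM_ALLOC_CONTIG,    /* user asks for contiguous segments */
    SHMEM_ALLOC_NONCONTIG  /* user asks for noncontiguous segments */
} shmem_alloc_hint;

/* Map the string value of the proposed "shmem_alloc" key to a hint.
 * A missing (NULL) or unrecognized value falls back to "system"; that
 * fallback is an assumption of this sketch. */
shmem_alloc_hint parse_shmem_alloc(const char *value)
{
    if (value == NULL)
        return SHMEM_ALLOC_SYSTEM;
    if (strcmp(value, "contig") == 0)
        return SHMEM_ALLOC_CONTIG;
    if (strcmp(value, "noncontig") == 0)
        return SHMEM_ALLOC_NONCONTIG;
    return SHMEM_ALLOC_SYSTEM;
}

/* Inverse mapping, e.g. for reporting which behavior was actually granted. */
const char *shmem_alloc_name(shmem_alloc_hint hint)
{
    switch (hint) {
    case SHMEM_ALLOC_CONTIG:    return "contig";
    case SHMEM_ALLOC_NONCONTIG: return "noncontig";
    case SHMEM_ALLOC_SYSTEM:    return "system";
    }
    assert(0 && "unknown shmem_alloc_hint");
    return "system";
}
```

Because the values are only hints, a caller would still compare the requested hint against what the query routines report after the window is created.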

Thoughts?

jeffhammond commented 5 years ago

@hjelmn You could use the symbol name MPI_Win_allocate_shared_group instead, should the forum decide that the Fortran 90 symbol limitation matters, but I assume that most of the relevant compilers are already supporting the Fortran 2003 symbol limit of 63.
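
For reference, the name lengths under discussion can be checked mechanically. This is a small sketch with a hypothetical helper, fits_limit, that compares a symbol name against the 31-character Fortran 90 identifier limit and the 63-character Fortran 2003 limit.

```c
#include <assert.h>
#include <string.h>

#define F90_NAME_LIMIT   31  /* Fortran 90 maximum identifier length */
#define F2003_NAME_LIMIT 63  /* Fortran 2003 maximum identifier length */

/* Returns 1 if the symbol name fits within the given limit, 0 otherwise. */
int fits_limit(const char *name, size_t limit)
{
    assert(name != NULL);
    return strlen(name) <= limit;
}
```

Applied to the names in this thread, MPI_Win_allocate_shared_from_group and the two MPI_Dist_graph_create_*_from_group names exceed the Fortran 90 limit but fit the Fortran 2003 one, while MPI_Win_allocate_shared_group fits both.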

jeffhammond commented 5 years ago

Frankly, whether or not noncontig is the default or not is a really stupid justification for a separate function.

pavanbalaji commented 5 years ago

> Frankly, whether or not noncontig is the default or not is a really stupid justification for a separate function.

I agree. We can easily fix it with some info keys.

hjelmn commented 5 years ago

@jeffhammond The problem is consistency. We already have MPI_Comm_create_group from MPI-3, which is why everything has from_group :(. The question is: does consistency matter?

jeffhammond commented 5 years ago

MPI_Reduce_scatter_block says consistency doesn't matter 😝

hjelmn commented 5 years ago

@jeffhammond Indeed 🤣

hjelmn commented 5 years ago

@pavanbalaji I like it. By enumerating the possible values it gives us flexibility to add additional values in the future if needed.

devreal commented 2 years ago

Closing as the initial issue of introducing MPI_Win_allocate_shared_from_group was dropped from the Sessions proposal. There is always the route through MPI_Comm_create_from_group to get a window from a group created from a Session.

mpiwg-rma / rma-issues

Potential issue with RMA and sessions #9