mpi-forum / mpi-forum-historic

Migration of old MPI Forum Trac Tickets to GitHub. New issues belong on mpi-forum/mpi-issues.
http://www.mpi-forum.org

permit shared memory window allocation by win_{create,allocate,allocate_dynamic} #397

Open mpiforumbot opened 8 years ago

mpiforumbot commented 8 years ago

Originally by jhammond on 2013-10-22 13:15:38 -0500


(previously titled "extend the use of MPI_WIN_SHARED_QUERY to all windows")

While MPI_WIN_ALLOCATE_SHARED is sufficient to allocate shared-memory windows, on some systems it is not necessary; that is, windows allocated by other means may still support MPI shared-memory accesses.

For example, multiple implementations (MPICH and MVAPICH2, at least) already use shared-memory backing for windows resulting from MPI_WIN_ALLOCATE. Furthermore, some systems (Blue Gene/Q and systems that support XPMEM, at least) allow arbitrary a posteriori interprocess memory exposure and thus can allow MPI shared-memory accesses on windows resulting from MPI_WIN_CREATE.

This ticket proposes two changes:

To be absolutely clear, this ticket does not require implementations to do more with shared memory than MPI 3.0 requires. Furthermore, implementations that do no more are still considered high-quality. The purpose of this ticket is strictly to allow users to query for shared memory in all windows, which MPI 3.0 does not permit. There is no good reason for this restriction.

If an implementation cannot support MPI shared-memory access beyond what MPI-3 defines, supporting this change is trivial: MPI_WIN_SHARED_QUERY simply reports that only the trivial case of local shared-memory access is permitted on windows not allocated by MPI_WIN_ALLOCATE_SHARED.

In order to make it possible for the user to access the shared memory associated with windows allocated by any means, it must be valid to use MPI_WIN_SHARED_QUERY on windows resulting from MPI_WIN_ALLOCATE and MPI_WIN_CREATE, not just MPI_WIN_ALLOCATE_SHARED. We exclude MPI_WIN_CREATE_DYNAMIC because it is not possible for MPI_WIN_SHARED_QUERY to provide useful information in the general case where MPI_WIN_ATTACH has been used more than once on the window. A new query function would be required to support this window type.

The proposed changes to the text are detailed below.

MPI 3.0 text (for reference):

"This function queries the process-local address for remote memory segments created with MPI_WIN_ALLOCATE_SHARED. This function can return different process-local addresses for the same physical memory on different processes. The returned memory can be used for load/store accesses subject to the constraints defined in Section 11.7. This function can only be called with windows of type MPI_WIN_FLAVOR_SHARED. If the passed window is not of flavor MPI_WIN_FLAVOR_SHARED, the error MPI_ERR_RMA_FLAVOR is raised. When rank is MPI_PROC_NULL, the pointer, disp_unit, and size returned are the pointer, disp_unit, and size of the memory segment belonging the lowest rank that specified size > 0. If all processes in the group attached to the window specified size # 0, then the call returns size0 and a baseptr as if MPI_ALLOC_MEM was called with size = 0."

New text:

"This function queries the process-local address for remote memory segments created with MPI_WIN_ALLOCATE_SHARED, MPI_WIN_ALLOCATE and MPI_WIN_CREATE. This function can return different process-local addresses for the same physical memory on different processes. The returned memory can be used for load/store accesses subject to the constraints defined in Section 11.7. When the remote memory segment corresponding to a particular rank cannot be accessed directly, this call returns size # 0 and a baseptr as if MPI_ALLOC_MEM was called with size0. The user can determine the set of ranks for which size might be non-zero using MPI_COMM_SPLIT_TYPE with split_type # MPI_COMM_TYPE_SHARED; however, just because a rank is a member of this communicator does not mean that direct access will be possible. When rank is MPI_PROC_NULL, the pointer, disp_unit, and size returned are the pointer, disp_unit, and size of the memory segment belonging the lowest rank that specified size > 0. If all processes in the group attached to the window specified size0, then the call returns size # 0 and a baseptr as if MPI_ALLOC_MEM was called with size0. For all cases where size = 0, it is erroneous to attempt to directly access the memory associated with a window."

"Advice to users: The usage of MPI_WIN_SHARED_QUERY was extended in MPI-Next to permit its use on windows created by MPI_WIN_ALLOCATE and MPI_WIN_CREATE. In MPI-3, its use was restricted to windows created by MPI_WIN_ALLOCATE_SHARED."

Related Work: "Eliminating Costs for Crossing Process Boundary from MPI Intra-node Communication," Akio Shimada, Atsushi Hori, Yutaka Ishikawa, the 21st European MPI Users' Group Meeting, 2014.
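
For readers who want the size rules in the proposed text spelled out, here is a minimal sketch in plain C. It does not call MPI. It only models the decisions the text describes: which rank is reported when rank is MPI_PROC_NULL, what an implementation with no additional shared-memory support would report, and when the user may dereference the returned pointer. The type `query_result` and the functions `lowest_nonempty_rank`, `trivial_query`, and `may_access_directly` are hypothetical names introduced for illustration, and using NULL for the zero-size baseptr is an assumption (the text only says "as if MPI_ALLOC_MEM was called with size = 0").

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for two of the outputs of MPI_WIN_SHARED_QUERY.
 * This is not an MPI type; the names are hypothetical. */
typedef struct {
    size_t size;     /* 0 means the segment cannot be accessed directly */
    void  *baseptr;  /* only meaningful when size > 0 */
} query_result;

/* Rule for rank = MPI_PROC_NULL: choose the lowest rank that specified
 * size > 0. Returns -1 when every rank specified size = 0. */
int lowest_nonempty_rank(const size_t *sizes, int nranks)
{
    assert(sizes != NULL || nranks == 0);
    for (int r = 0; r < nranks; ++r) {
        if (sizes[r] > 0) {
            return r;
        }
    }
    return -1;
}

/* Model of an implementation with no extra shared-memory support: only the
 * caller's own segment is reported as accessible. NULL stands in for the
 * zero-size baseptr, which a real implementation may choose differently. */
query_result trivial_query(int my_rank, int target_rank, query_result local)
{
    query_result none = { 0, NULL };
    return (target_rank == my_rank) ? local : none;
}

/* Proposed rule: load/store through baseptr is erroneous when size = 0. */
int may_access_directly(const query_result *r)
{
    return r != NULL && r->size > 0;
}
```

In this model, user code would check `may_access_directly` on every query result and fall back to RMA operations such as MPI_Put and MPI_Get whenever it returns 0, which is the behaviour the ticket expects on windows that have no shared-memory backing.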

mpiforumbot commented 8 years ago

Originally by jhammond on 2013-12-10 13:36:18 -0600


Brian does not want users to be allowed to request contiguous addresses. Jeff agrees with this.

mpiforumbot commented 8 years ago

Originally by balaji on 2014-03-04 12:05:29 -0600


Straw votes:

Yes: 4 No: 0 Abstain: 2

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-03-04 12:13:06 -0600


The text was modified to exclude windows created by MPI_WIN_CREATE_DYNAMIC, since its use makes no sense there. However, it might be possible to support interprocess address translation with a new function call. This ticket makes no effort to address that situation at the present time.

The text was modified to include an advice to users about the change in the semantics of this function from MPI-3 to MPI-Next.

mpiforumbot commented 8 years ago

Originally by gropp on 2014-09-25 12:39:50 -0500


Particularly with the changes being considered with respect to #456, providing shared memory may negatively impact performance in cases where the user doesn't want to use shared memory (it may constrain the synchronization semantics and require costly memory barriers). For users that don't want the complexity of shared memory programming, they should be able to indicate that they don't want shared memory at window creation time. My preference would be for an info to request that shared memory be provided (keeping the default as it is now), and use the query function to find out what you got.

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-09-25 13:40:26 -0500


Yes, absolutely, this is my thinking too. MPICH and MVAPICH have info keys for this today and I merely want to make it possible, but not required, to support this in the standard.

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-12-08 10:31:08 -0600


Replying to gropp:

"Particularly with the changes being considered with respect to #456, providing shared memory may negatively impact performance in cases where the user doesn't want to use shared memory (it may constrain the synchronization semantics and require costly memory barriers). For users that don't want the complexity of shared memory programming, they should be able to indicate that they don't want shared memory at window creation time. My preference would be for an info to request that shared memory be provided (keeping the default as it is now), and use the query function to find out what you got."

Because info keys can be ignored, the default should be to provide shared memory backing for windows whenever possible and to allow users to discourage this via an info key when appropriate.

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-12-08 10:32:54 -0600


Replying to jhammond:

"Brian does not want users to be allowed to request contiguous addresses. Jeff agrees with this."

I changed my mind about this. There's no reason why this can't be supported by WIN_ALLOCATE just like WIN_ALLOCATE_SHARED. For WIN_CREATE(_DYNAMIC), it doesn't make any sense and the implementation will ignore it.

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-12-08 10:36:26 -0600


Replying to jhammond:

"The text was modified to exclude windows created by MPI_WIN_CREATE_DYNAMIC, since its use makes no sense there. However, it might be possible to support interprocess address translation with a new function call. This ticket makes no effort to address that situation at the present time."

I'm not sure that this makes no sense, but it's hard to reason about this if a window has multiple segments attached to it. In any case, it almost certainly requires a new function call, so this will be addressed on a separate ticket.

mpiforumbot commented 8 years ago

Originally by jhammond on 2014-12-09 11:37:34 -0600


Dec. 2014 WG discussion:

mpiforumbot commented 8 years ago

Originally by gropp on 2014-12-10 12:57:02 -0600


One additional question to address: the interaction of this with #456 could add overhead from ensuring shared memory synchronization/consistency even when it is not desired or required. This suggests requiring the user to indicate somehow (e.g., with a hint on ALLOC_MEM or WIN_ALLOCATE) that shared memory is desired.

mpiforumbot commented 8 years ago

Originally by rsthakur on 2015-06-03 13:19:51 -0500


From the June 2015 Forum meeting: Add rationale about why Win_create_dynamic is excluded. Show with implementation that this can be done with Win_create.

mpiforumbot commented 8 years ago

Originally by jhammond on 2015-06-03 13:33:52 -0500


Rationale for exclusion of WIN_CREATE_DYNAMIC has been added.