mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

RMA Notification #59

Open jdinan opened 8 years ago

jdinan commented 8 years ago

Problem Statement

In passive target mode, notifying the target that data has been transmitted is currently inefficient: the origin must first wait for the operations to be notified to complete remotely, and only then send an additional message to the target.

Window Counter Solution 1: Sync-and-Notify

Addition of new "synchronize-and-notify" routines:

int MPI_Win_flush_notify(int rank, MPI_Win win);
int MPI_Win_unlock_notify(int rank, MPI_Win win);
int MPI_Win_flush_all_notify(MPI_Win win);

int MPI_Win_get_notify(MPI_Win win, long *count);
int MPI_Win_set_notify(MPI_Win win, long count);
int MPI_Win_wait_notify(MPI_Win win, long geq_value);

A notification counter is associated with the window, and is incremented at the target after the given passive target epoch has completed at the target (i.e. data is visible to the target process). Get, set, and wait functions are provided to enable a process to query the number of notifications it has received.

Criticism: Because the notification is attached to the synchronization call rather than fused with the communication operation (as in put-and-notify), it can still require two separate operations, which will not improve performance.
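To make the counter semantics above concrete, here is a minimal single-process sketch in plain C. It is not MPI code: `model_win` and every `model_*` function are hypothetical names invented for illustration, and the "wait" is shown as a non-blocking predicate, assuming that MPI_Win_wait_notify would return once the counter is greater than or equal to `geq_value`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-window notification counter.
 * None of these names are part of MPI. */
typedef struct {
    long notifications; /* bumped once per completed notifying synchronization */
} model_win;

/* Models the target-side effect of one flush_notify/unlock_notify completing. */
void model_notify_arrive(model_win *w)
{
    assert(w != NULL);
    w->notifications++;
}

/* Models the get/set accessors on the counter. */
long model_get_notify(const model_win *w)
{
    assert(w != NULL);
    return w->notifications;
}

void model_set_notify(model_win *w, long count)
{
    assert(w != NULL);
    w->notifications = count;
}

/* Non-blocking form of the wait condition: true once the counter
 * has reached at least geq_value (assumed semantics). */
bool model_notify_reached(const model_win *w, long geq_value)
{
    assert(w != NULL);
    return w->notifications >= geq_value;
}
```

In this model a target expecting data from three origins would poll `model_notify_reached(&w, 3)` and then reset the counter with `model_set_notify(&w, 0)` before the next round.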

Window Counter Solution 2: Op-and-Notify

Addition of new "communicate-and-notify" routines:

int MPI_Put_notify(..., MPI_Win win); /* Same arguments as MPI_Put */
int MPI_Get_notify(..., MPI_Win win); /* Same arguments as MPI_Get */
int MPI_Accumulate_notify(..., MPI_Win win); /* Same arguments as MPI_Accumulate */

int MPI_Win_get_notify(MPI_Win win, long *count);
int MPI_Win_set_notify(MPI_Win win, long count);
int MPI_Win_wait_notify(MPI_Win win, long geq_value);

A notification counter is associated with the window, and is incremented at the target after the given RMA operation has completed at the target (i.e. data is visible to the target process). Get, set, and wait functions are provided to enable a process to query the number of notifications it has received.

Criticism: Only one counter per window.
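The single-counter limitation can be illustrated with a small plain-C model. This is only a sketch under the assumption that each notifying operation increments the counter by one and that the origin rank is not recorded anywhere; `model_target`, `model_op_notify_complete`, and `model_deliver_all` are hypothetical names, not MPI functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one notification counter per window at the target. */
typedef struct {
    long counter;
} model_target;

/* Models one op-and-notify operation completing at the target.
 * The origin rank is deliberately discarded, because a single
 * counter has nowhere to store it. */
void model_op_notify_complete(model_target *t, int origin_rank)
{
    assert(t != NULL);
    (void)origin_rank;
    t->counter++;
}

/* Delivers a sequence of operations and returns the final counter value. */
long model_deliver_all(const int *origin_ranks, size_t n_ops)
{
    model_target t = { 0 };
    for (size_t i = 0; i < n_ops; i++)
        model_op_notify_complete(&t, origin_ranks[i]);
    return t.counter;
}
```

Two operations from rank 0 and one operation each from ranks 0 and 1 both leave the counter at 2, so the target cannot tell which producers have finished without some additional convention.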

Matched Notifications

This adds a "tag" to RMA operations and introduces target-side synchronization operations that query for operations matching a particular tag. Communication routines look as follows:

int MPI_Put_notify(void *origin_addr, int origin_count,
        MPI_Datatype origin_type, int target_rank,
        MPI_Aint target_disp, int target_count,
        MPI_Datatype target_type, MPI_Win win, int tag);
int MPI_Get_notify(void *origin_addr, int origin_count,
        MPI_Datatype origin_type, int target_rank,
        MPI_Aint target_disp, int target_count,
        MPI_Datatype target_type, MPI_Win win, int tag);

Synchronization APIs are as follows:

int MPI_Notify_init(MPI_Win win, int src_rank, int tag,
        int expected_count, MPI_Request *request);
/* Functions already available in MPI */
int MPI_Start(MPI_Request *request);
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *request, MPI_Status *status);

Positives: Most general proposal; enables arbitrary synchronization DAGs. Negatives: Introduces tag matching to RMA and requires handling unexpected synchronization events. For past discussion, see: 09-2015 -- RMA Notified Access Implementation Discussion.pdf
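The matching behavior, including the unexpected-event case, can be sketched as a single-process model in plain C. Everything here is an assumption made for illustration: the `model_*` names are hypothetical, the request matches on exact (source rank, tag) pairs with no wildcards, and notifications that arrive before a matching request is started are parked in a bounded queue. The actual proposal may differ on each of these points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_UNEXPECTED 16

typedef struct { int src_rank; int tag; } model_event;

/* Hypothetical stand-in for the persistent request from MPI_Notify_init. */
typedef struct {
    int src_rank, tag, expected_count;
    int matched;   /* notifications matched since the last start */
    bool started;
} model_notify_req;

typedef struct {
    model_event ev[MODEL_MAX_UNEXPECTED];
    int n;
} model_unexpected_queue;

static bool model_matches(const model_notify_req *r, model_event e)
{
    return r->matched < r->expected_count &&
           e.src_rank == r->src_rank && e.tag == r->tag;
}

void model_notify_init(model_notify_req *r, int src_rank, int tag, int expected)
{
    assert(r != NULL && expected > 0);
    r->src_rank = src_rank; r->tag = tag; r->expected_count = expected;
    r->matched = 0; r->started = false;
}

/* Start the request and consume matching events that arrived earlier. */
void model_start(model_notify_req *r, model_unexpected_queue *q)
{
    int kept = 0;
    r->matched = 0;
    r->started = true;
    for (int i = 0; i < q->n; i++) {
        if (model_matches(r, q->ev[i])) r->matched++;
        else q->ev[kept++] = q->ev[i];
    }
    q->n = kept;
}

/* A notification arrives: match it or park it. False if the queue is full. */
bool model_arrive(model_notify_req *r, model_unexpected_queue *q, model_event e)
{
    if (r->started && model_matches(r, e)) { r->matched++; return true; }
    if (q->n == MODEL_MAX_UNEXPECTED) return false;
    q->ev[q->n++] = e;
    return true;
}

/* Non-blocking completion check, in the spirit of MPI_Test. */
bool model_test(const model_notify_req *r)
{
    return r->started && r->matched >= r->expected_count;
}
```

The bounded queue makes the cost of unexpected events visible: an implementation has to either buffer them, as here, or define what happens when a notification has no matching request.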

Memory Synchronization: Put-and-Notify (OpenSHMEM Style)

Put and notify operations have been supported for a while in Cray SHMEM. Recently they have been proposed for OpenSHMEM 1.5 (https://github.com/openshmem-org/specification/issues/206, https://github.com/openshmem-org/specification/pull/218, https://github.com/openshmem-org/specification/pull/244). The API signature is as follows:

void shmem_put_signal[_nb](void *target, const void *source, size_t len, uint64_t *sig_target, uint64_t sig_val, int pe)

The sig_target location is updated after the update to target is visible. The sig_target location is checked locally using a shmem_wait_until operation or remotely using a shmem_atomic_fetch operation.

Positives: This is the only proposal that directly supports third-party producer-consumer relationships. Negatives: Significantly expands the scope of the memory model and requires test/wait routines to be introduced.
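The ordering guarantee behind put-with-signal (the signal becomes visible only after the data) can be sketched in shared memory with C11 atomics. This is a model, not OpenSHMEM: `model_put_signal` and `model_try_consume` are hypothetical names, and a release store paired with an acquire load stands in for whatever mechanism a real implementation would use across processing elements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical producer side: write the payload, then publish the signal.
 * The release store orders the memcpy before the signal update. */
void model_put_signal(void *target, const void *source, size_t len,
                      _Atomic uint64_t *sig_target, uint64_t sig_val)
{
    assert(target != NULL && source != NULL && sig_target != NULL);
    memcpy(target, source, len);
    atomic_store_explicit(sig_target, sig_val, memory_order_release);
}

/* Hypothetical consumer side: a non-blocking check of the signal word.
 * Returns true and copies the payload only once the signal equals sig_val;
 * the acquire load guarantees the payload is visible at that point. */
bool model_try_consume(void *out, const void *target, size_t len,
                       _Atomic uint64_t *sig_target, uint64_t sig_val)
{
    assert(out != NULL && target != NULL && sig_target != NULL);
    if (atomic_load_explicit(sig_target, memory_order_acquire) != sig_val)
        return false;
    memcpy(out, target, len);
    return true;
}
```

Because the signal is an ordinary memory location, any party that can read it can observe the notification, which is what makes the third-party producer-consumer case possible in this style.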

References

jdinan commented 8 years ago

Comments copied from Trac:

jdinan commented 6 years ago

Status update: Roughly the same as it was at the June 2015 meeting -- need a strong driver to introduce this new feature and a performance comparison to show that notified RMA performs better than other approaches (e.g. send/recv, active target, etc.).