mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

RMA accumulate operations don't distinguish between throughput and latency sensitive applications #640

Closed by devreal 1 year ago

devreal commented 2 years ago

Problem

The standard requires that updates from single-element RMA accumulate functions (MPI_Fetch_and_op) and bulk-accumulate functions (MPI_Accumulate) are atomic with respect to each other. Since the number of elements passed to MPI_Accumulate is not known a priori, implementations typically fall back to a scheme that provides high throughput for large numbers of elements. This comes at the cost of higher latency for small (single-element) accumulate operations and, in some cases, a dependency on progress at the target. As a result, RMA accumulate operations are less than ideal for applications that want to use low-latency network atomic operations.
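To make the atomicity constraint concrete, here is a toy, single-process model in C. It is only an analogy: threads stand in for origin processes, and one mutex stands in for an implementation's software accumulate path. `toy_window`, `toy_fetch_and_add`, `toy_accumulate`, and `run_mixed_updates` are hypothetical names, not MPI API. Because a single lock protects the whole buffer, a single-element fetch-and-add and a bulk accumulate never interleave. The price is that every single-element update also pays for the lock. An implementation generally cannot route only MPI_Fetch_and_op to NIC atomics while MPI_Accumulate keeps a software path, because the two mechanisms would typically not be atomic with respect to each other.

```c
#include <pthread.h>
#include <stddef.h>

#define TOY_WIN_LEN 8

/* Hypothetical stand-in for a target window served by a software path. */
typedef struct {
    pthread_mutex_t lock;     /* one lock serializes every update */
    long data[TOY_WIN_LEN];   /* stand-in for the target's window memory */
} toy_window;

static void toy_window_init(toy_window *w) {
    pthread_mutex_init(&w->lock, NULL);
    for (size_t i = 0; i < TOY_WIN_LEN; ++i)
        w->data[i] = 0;
}

/* Analogue of MPI_Fetch_and_op with MPI_SUM on a single element. */
static long toy_fetch_and_add(toy_window *w, size_t idx, long val) {
    pthread_mutex_lock(&w->lock);
    long old = w->data[idx];
    w->data[idx] = old + val;
    pthread_mutex_unlock(&w->lock);
    return old;
}

/* Analogue of MPI_Accumulate with MPI_SUM on a contiguous range. */
static void toy_accumulate(toy_window *w, size_t first,
                           const long *vals, size_t n) {
    pthread_mutex_lock(&w->lock);
    for (size_t i = 0; i < n; ++i)
        w->data[first + i] += vals[i];
    pthread_mutex_unlock(&w->lock);
}

typedef struct { toy_window *w; int iters; } toy_job;

/* "Origin" issuing only single-element updates to element 0. */
static void *single_element_origin(void *arg) {
    toy_job *job = arg;
    for (int i = 0; i < job->iters; ++i)
        (void)toy_fetch_and_add(job->w, 0, 1);
    return NULL;
}

/* "Origin" issuing bulk updates that overlap element 0. */
static void *bulk_origin(void *arg) {
    toy_job *job = arg;
    const long ones[TOY_WIN_LEN] = {1, 1, 1, 1, 1, 1, 1, 1};
    for (int i = 0; i < job->iters; ++i)
        toy_accumulate(job->w, 0, ones, TOY_WIN_LEN);
    return NULL;
}

/* Run both kinds of updates concurrently against the same window. */
static void run_mixed_updates(toy_window *w, int iters) {
    toy_job job = { w, iters };
    pthread_t a, b;
    pthread_create(&a, NULL, single_element_origin, &job);
    pthread_create(&b, NULL, bulk_origin, &job);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
}
```

After `toy_window_init(&w)` and `run_mixed_updates(&w, 10000)`, element 0 holds 20000 (10000 from each origin) and elements 1 to 7 hold 10000 each. No update is lost, because both kinds of operation go through the same mechanism.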

Proposal

Add an info key that allows the application to specify a preference for either latency or throughput of accumulate operations.

Changes to the Text

Add a new info key in the RMA chapter.

Impact on Implementations

In general, none, since it is only an info key. Implementations that want to play nice have to add support for the info key and provide two pathways for implementing RMA accumulate operations.
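MPI allows implementations to ignore info hints they do not support, which is why the cost for implementations can be zero. An implementation that does honor the key mainly has to pick one of its two pathways per window, for example when the user passes the key through MPI_Info_set on the info object given to window creation. The sketch below is a hypothetical illustration of that choice. `acc_path`, `choose_acc_path`, and the hint values `"latency"` and `"throughput"` are assumed names, not text from the standard. Because the hint is a preference rather than a guarantee, the sketch falls back to the software path when the network cannot perform the operation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pathways an implementation might offer. */
typedef enum {
    ACC_PATH_SOFTWARE,    /* throughput-oriented; may rely on target progress */
    ACC_PATH_NIC_ATOMICS  /* latency-oriented; uses network atomic operations */
} acc_path;

/* hint_value: value of the (hypothetical) info key, or NULL if unset.
   nic_supports_op: nonzero if the NIC can perform the op/datatype. */
static acc_path choose_acc_path(const char *hint_value, int nic_supports_op) {
    if (hint_value != NULL && strcmp(hint_value, "latency") == 0
        && nic_supports_op)
        return ACC_PATH_NIC_ATOMICS;
    /* "throughput", unknown values, no hint, or no NIC support:
       keep the default, throughput-oriented path. */
    return ACC_PATH_SOFTWARE;
}
```

Defaulting to the throughput path keeps the behavior of existing applications unchanged when the key is absent or unrecognized.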

Impact on Users

Making use of atomic memory operations in the NIC is useful for some applications. With this info key, users no longer have to rely on the implementation to make the right choice for them, which implementations currently do not.

References and Pull Requests

https://github.com/mpi-forum/mpi-standard/pull/749

wesbland commented 1 year ago

This had a reading on 2022-12-08, but may need a re-reading at the next meeting.

wesbland commented 1 year ago

This had a no-no reading on 2023-05-02.

wesbland commented 1 year ago

This passed a no-no vote.

Yes: 28, No: 0, Abstain: 4

wesbland commented 1 year ago

This passed a 1st vote.

Yes: 26, No: 0, Abstain: 6

wesbland commented 1 year ago

This passed a 2nd vote.

Yes: 33, No: 0, Abstain: 1