mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

Allow overlapping regions in MPI_Scatter[v] #840

Open devreal opened 8 months ago

devreal commented 8 months ago

Problem

Section 6.6 contains the following constraint on MPI_Scatter and MPI_Scatterv:

The specification of counts, types, and displacements should not cause any location on the root to be read more than once.

The accompanying rationale says:

Rationale. Though not needed, the last restriction is imposed so as to achieve symmetry with MPI_GATHER, where the corresponding restriction (a multiple-write restriction) is necessary. (End of rationale.)

Someone working on collective components in Open MPI reported that at least one widely used application provides overlapping segments to MPI_Scatterv and gets away with it.
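To see why the restriction buys nothing on the send side, it helps to model only the data movement of MPI_Scatterv at the root. The sketch below is a hypothetical plain-C model, not an MPI API: `scatterv_sim`, the row-per-rank receive layout, and the `recv_stride` parameter are all assumptions made for illustration. Each rank's block is a copy taken from the root buffer, and the root buffer is only ever read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of MPI_Scatterv's data movement (not an MPI call):
   rank r receives sendcounts[r] ints copied from sendbuf + displs[r].
   recvbufs is an nprocs x recv_stride array, one row per rank.
   sendbuf is only read, so overlapping segments are simply copied
   more than once and the root buffer is never modified. */
static void scatterv_sim(const int *sendbuf, const int *sendcounts,
                         const int *displs, int nprocs,
                         int *recvbufs, int recv_stride)
{
    for (int r = 0; r < nprocs; r++) {
        /* Preconditions of this model: valid counts that fit in a row. */
        assert(sendcounts[r] >= 0 && sendcounts[r] <= recv_stride);
        assert(displs[r] >= 0);
        memcpy(recvbufs + (size_t)r * (size_t)recv_stride,
               sendbuf + displs[r],
               (size_t)sendcounts[r] * sizeof(int));
    }
}
```

With `sendbuf = {0, 1, ..., 7}`, `sendcounts = {4, 4, 4}` and `displs = {0, 2, 4}`, rank 0 receives 0 to 3, rank 1 receives 2 to 5 and rank 2 receives 4 to 7. Elements 2 through 5 are read twice, which the current text forbids even though the outcome is fully determined. Sliding windows like this (e.g., blocks with shared boundary elements) are one plausible reason an application would pass overlapping segments; this is an illustration, not the application from the report.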

Proposal

Remove the sentence and its rationale. By the rationale's own admission the restriction is not needed, and symmetry with MPI_Gather serves no purpose.

Changes to the Text

TBD

Impact on Implementations

Given that applications have been violating this constraint and have gotten away with it, I don't expect any implementation to exploit it. So there should be no impact on implementations.

Impact on Users

Freedom to scatter overlapping regions!

References and Pull Requests

TBD

jeffhammond commented 8 months ago

Is there a restriction on the input buffer of alltoallv?

devreal commented 8 months ago

I don't see such a restriction on alltoall or alltoallv. Should there perhaps be a restriction on the output buffers of these operations?

bosilca commented 8 months ago

Any such restrictions will be non-enforceable by MPI because detecting overlap (in the general case) is P-complete.
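Whatever the cost of the general case, one narrow case is cheap to check. The sketch below assumes that the send type is contiguous (e.g., `MPI_INT`), so each rank's segment is the half-open interval `[displs[i], displs[i] + sendcounts[i])` in units of the type's extent; overlap then reduces to sorting intervals, O(n log n). `segments_overlap` is a hypothetical helper, not part of any MPI library. Derived datatypes with holes or self-overlapping typemaps are exactly what this shortcut ignores, and it may report false positives or miss overlaps for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Half-open interval [start, end) in units of the send type's extent. */
struct segment { long start; long end; };

static int cmp_start(const void *a, const void *b)
{
    const struct segment *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Hypothetical helper: returns true if any two non-empty segments
   [displs[i], displs[i] + counts[i]) intersect. Assumes a contiguous
   send type; derived datatypes are not handled. */
static bool segments_overlap(const int *counts, const int *displs, int n)
{
    if (n <= 0)
        return false;
    struct segment *segs = malloc((size_t)n * sizeof *segs);
    if (segs == NULL)
        abort();

    int m = 0;
    for (int i = 0; i < n; i++) {
        assert(counts[i] >= 0);
        if (counts[i] == 0)
            continue; /* an empty segment reads nothing */
        segs[m].start = displs[i];
        segs[m].end = (long)displs[i] + counts[i];
        m++;
    }

    qsort(segs, (size_t)m, sizeof *segs, cmp_start);

    /* After sorting by start, an overlap exists iff some segment starts
       before the furthest end seen so far. */
    bool overlap = false;
    long max_end = m > 0 ? segs[0].end : 0;
    for (int i = 1; i < m && !overlap; i++) {
        if (segs[i].start < max_end)
            overlap = true;
        if (segs[i].end > max_end)
            max_end = segs[i].end;
    }
    free(segs);
    return overlap;
}
```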

devreal commented 8 months ago

I think the restriction on the output buffers should be along the lines of "the result of overlapping regions in the result buffer is undefined", i.e., MPI can deposit data in any order.

bosilca commented 8 months ago

Here is what we are saying about this in the datatype chapter (5.1.11):

A datatype may specify overlapping entries. The use of such a datatype in any communication in association with a buffer updated by the operation is erroneous. (This is erroneous even if the actual message received is short enough not to write any entry more than once.)

And then in 6.9.1:

Overlapping datatypes are permitted in “send” buffers. Overlapping datatypes in “receive” buffers are erroneous and may give unpredictable results.

The right approach is not to overspecify, but to let the same logic apply.
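The asymmetry in the quoted text can be shown with a toy model in which a datatype is just a list of element indices, one of them repeated. This is a hypothetical sketch, not how any MPI implementation represents typemaps: `pack`, `unpack_ordered` and the explicit `order` array are assumptions. Packing (the send side) only reads the repeated element twice and is deterministic. Unpacking (the receive side) writes the repeated element twice, so the final value depends on the order in which entries are deposited, which matches the "MPI can deposit data in any order" reading of "undefined".

```c
#include <assert.h>
#include <stddef.h>

/* Send side (hypothetical model): gather buf[idx[k]] into a contiguous
   packed buffer. Repeated indices just read the same element twice,
   so the result is well defined. */
static void pack(const int *buf, size_t buf_len, const size_t *idx,
                 size_t n, int *packed)
{
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < buf_len);
        packed[k] = buf[idx[k]];
    }
}

/* Receive side (hypothetical model): scatter packed[k] into buf[idx[k]],
   visiting entries in the order given by order[]. With repeated indices
   the last write wins, so the outcome depends on that order, which an
   implementation is free to choose. */
static void unpack_ordered(const int *packed, const size_t *idx,
                           const size_t *order, size_t n,
                           int *buf, size_t buf_len)
{
    for (size_t j = 0; j < n; j++) {
        size_t k = order[j];
        assert(k < n && idx[k] < buf_len);
        buf[idx[k]] = packed[k];
    }
}
```

With indices `{0, 1, 1}` and an incoming message `{1, 2, 3}`, visiting entries front to back leaves `3` in element 1, while visiting them back to front leaves `2` there; packing `{10, 20, 30}` with the same indices always yields `{10, 20, 20}`.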