mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

Errata on MPI_INTERCOMM_CREATE - the tag argument lost its description #750

Open RolfRabenseifner opened 10 months ago

RolfRabenseifner commented 10 months ago

Problem

From MPI-1.1 to MPI-2.2, the tag argument had an understandable description:

MPI_INTERCOMM_CREATE(local_comm, local_leader, peer_comm, remote_leader, tag, newintercomm)

IN / local_comm / local intra-communicator (handle)
IN / local_leader / rank of local group leader in local_comm (integer)
IN / peer_comm / “peer” communicator; significant only at the local_leader (handle)
IN / remote_leader / rank of remote group leader in peer_comm; significant only at the local_leader (integer)
IN / tag / “safe” tag (integer)
OUT / newintercomm / new inter-communicator (handle)

...

This call creates an inter-communicator. It is collective over the union of the local and remote groups. Processes should provide identical local_comm and local_leader arguments within each group. Wildcards are not permitted for remote_leader, local_leader, and tag.

This call uses point-to-point communication with communicator peer_comm, and with tag tag between the leaders. Thus, care must be taken that there be no pending communication on peer_comm that could interfere with this communication.

Advice to users. We recommend using a dedicated peer communicator, such as a duplicate of MPI_COMM_WORLD, to avoid trouble with peer communicators. (End of advice to users.)

For MPI-3.0, the MPI Forum decided to remove the restriction:

Thus, care must be taken that there be no pending communication on peer_comm that could interfere with this communication.

The goal was that the tags for point-to-point communication and for intercomm creation should no longer interfere with each other.

In MPI-3.0, this decision was implemented by removing the entire last paragraph together with the Advice to users. As a result, from MPI-3.0 through MPI-4.1, the whole procedure description consists only of:

This call creates an inter-communicator. It is collective over the union of the local and remote groups. Processes should provide identical local_comm and local_leader arguments within each group. Wildcards are not permitted for remote_leader, local_leader, and tag.

This means that the use of the tag is no longer described at all.

Additionally, I cannot understand why the tag description

IN / tag / “safe” tag (integer)

does not say

IN / tag / “safe” tag; significant only at the local_leader (integer)

The examples listed at hotexamples were not really useful for me: https://cpp.hotexamples.com/examples/-/-/MPI_Intercomm_create/cpp-mpi_intercomm_create-function-examples.html

Proposal

A) Because the calls are collective over the union of the local and remote groups, the tags are not needed for matching between the processes within the local or remote group. Therefore, the use of the tag can be restricted to the leaders of the groups:

IN / tag / “safe” tag; significant only at the local_leader (integer)

The rule can be:

The tags provided by the local and remote leaders must be identical. In the case of two concurrent invocations (e.g., on several threads) with the same peer_comm and the same leaders within the peer_comm, different tags must be used in the two invocations.
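The second sentence of this rule can be made concrete with a small model. The following sketch is not MPI code: it describes each concurrent invocation by a hypothetical record (an integer standing in for the peer_comm handle, the two leader ranks in peer_comm, and the tag) and checks that no two invocations with the same peer_comm and the same leader pair use the same tag. The names `struct invocation` and `tags_disambiguate` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one MPI_INTERCOMM_CREATE invocation as seen
 * by its two leaders. */
struct invocation {
    int peer_comm_id; /* stand-in for the peer_comm handle */
    int leader_a;     /* rank of one leader in peer_comm */
    int leader_b;     /* rank of the other leader in peer_comm */
    int tag;
};

/* The leader pair is unordered: (a, b) and (b, a) denote the same pair. */
static bool same_leader_pair(const struct invocation *x,
                             const struct invocation *y)
{
    return (x->leader_a == y->leader_a && x->leader_b == y->leader_b) ||
           (x->leader_a == y->leader_b && x->leader_b == y->leader_a);
}

/* Returns true if the n concurrent invocations satisfy the rule, i.e.
 * no two of them share peer_comm, leader pair, and tag. */
bool tags_disambiguate(const struct invocation *calls, size_t n)
{
    assert(calls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (calls[i].peer_comm_id == calls[j].peer_comm_id &&
                same_leader_pair(&calls[i], &calls[j]) &&
                calls[i].tag == calls[j].tag) {
                return false;
            }
        }
    }
    return true;
}
```

In this model, two invocations between leaders 0 and 2 on the same peer_comm are accepted with tags 10 and 11 and rejected if both use tag 10; using different peer communicators also makes them distinguishable.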

B) I expect that A) was the original intention, because from MPI-1.1 to MPI-2.2 the tag was only used on the peer_comm. Now, however, its main use is to distinguish concurrent calls, and therefore it should be handled as in all the related APIs: MPI_COMM_CREATE_GROUP, MPI_COMM_CREATE_FROM_GROUP, and MPI_INTERCOMM_CREATE_FROM_GROUPS.

Therefore, I propose to only add:

All MPI processes of the union of the local and remote groups must provide an identical \mpiarg{tag} value; it differentiates concurrent calls in a multithreaded environment.

Changes to the Text

See PR 878.

Impact on Implementations

None, because this describes what was intended by the change in MPI-3.0.

Impact on Users

None.

References and Pull Requests

https://github.com/mpi-forum/mpi-standard/pull/878

GuillaumeMercier commented 10 months ago

This seems more than an errata to me.

RolfRabenseifner commented 10 months ago

@wesbland I expect that this is definitely too late for MPI-4.1 because it requires an errata vote. Against which repo should I open the PR?

wesbland commented 10 months ago

This can still be an errata for MPI 4.1. It will just get added after the document is published.

So make the PR against the mpi-4.x branch and we can schedule a reading at the next meeting for it.

RolfRabenseifner commented 10 months ago

@GuillaumeMercier I would like to hand it over to you. Is this okay for you? Open questions etc. are in the PR.

GuillaumeMercier commented 10 months ago

Yes, it's ok.

RolfRabenseifner commented 10 months ago

Thank you - hereby handed over to you @GuillaumeMercier