mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

is MPI_COMM_FREE synchronizing #675

Open jeffhammond opened 1 year ago

jeffhammond commented 1 year ago

Problem

The current semantics of MPI_COMM_FREE, "probably local, unless using a debug build, in which case it might be a synchronizing collective", are terrible. The standard currently says:

This collective operation marks the communication object for deallocation. The handle is set to MPI_COMM_NULL. Any pending operations that use this communicator will complete normally; the object is actually deallocated only if there are no other active references to it. This call applies to intra- and inter-communicators. The delete callback functions for all cached attributes (see Section 7.7) are called in arbitrary order.

Advice to implementors. Though collective, it is anticipated that this operation will normally be implemented to be local, though a debugging version of an MPI library might choose to synchronize. (End of advice to implementors.)

Proposal

We should remove the nonsense about a debug build and say that MPI_COMM_FREE always has local semantics and that calls to MPI_COMM_FREE on different communicators need not be ordered the same across all processes.

Changes to the Text

Delete this:

Advice to implementors. Though collective, it is anticipated that this operation will normally be implemented to be local, though a debugging version of an MPI library might choose to synchronize. (End of advice to implementors.)

Impact on Implementations

If implementations are synchronizing in MPI_COMM_FREE, they need to stop and implement the expected "normal" behavior.

Impact on Users

This is critical in contexts where garbage collectors free MPI objects, as happens in both Python and Julia. This issue is the direct result of complaints by Python and Julia users who are forced to write very complicated workarounds because straightforward use of MPI_COMM_FREE is allowed to deadlock: garbage collectors don't guarantee the same ordering across processes.
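To make the ordering problem concrete, here is a minimal sketch in plain Python that models the two possible behaviors without calling MPI at all. The function name `frees_complete`, the communicator labels, and the assumption that every communicator spans all ranks are all hypothetical and chosen only for illustration; a synchronizing free is modeled as a barrier over the communicator being freed.

```python
def frees_complete(free_order, synchronizing):
    """Return True if every rank can finish its sequence of frees.

    free_order maps a rank to the list of communicator labels it frees,
    in order. Assumption for this model: every communicator spans all
    ranks, and each rank frees each communicator exactly once.
    """
    if not synchronizing:
        # Local semantics: each call returns without waiting on any peer.
        return True

    position = {rank: 0 for rank in free_order}
    while True:
        # Ranks with frees left are blocked inside their next free call.
        waiting = {
            rank: seq[position[rank]]
            for rank, seq in free_order.items()
            if position[rank] < len(seq)
        }
        if not waiting:
            return True  # every rank has finished

        # A synchronizing free returns only once every rank has entered
        # the call on the same communicator.
        all_present = len(waiting) == len(free_order)
        same_comm = len(set(waiting.values())) == 1
        if all_present and same_comm:
            for rank in waiting:
                position[rank] += 1
        else:
            return False  # ranks are stuck in frees of different communicators


# Both ranks free in the same order, as a careful hand-written program would.
same_order = {0: ["comm_a", "comm_b"], 1: ["comm_a", "comm_b"]}
# A garbage collector may finalize the objects in a different order per rank.
gc_order = {0: ["comm_a", "comm_b"], 1: ["comm_b", "comm_a"]}

print(frees_complete(same_order, synchronizing=True))
print(frees_complete(gc_order, synchronizing=True))
print(frees_complete(gc_order, synchronizing=False))
```

In this model the mismatched order only completes when the free is local, which is the behavior the proposal asks the standard to guarantee.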

References and Pull Requests