mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

Outcome of MPI_TEST_CANCELLED on nonblocking collective requests statuses? #821

Open devreal opened 11 months ago

devreal commented 11 months ago

Problem

Section 6.12 says:

Upon returning from a completion call in which a nonblocking collective operation completes, the values of the MPI_SOURCE and MPI_TAG fields in the associated status object, if any, are undefined. The value of MPI_ERROR may be defined, if appropriate, according to the specification in Section 3.2.5. It is valid to mix different request types (i.e., any combination of collective requests, I/O requests, generalized requests, or point-to-point requests) in functions that enable multiple completions (e.g., MPI_WAITALL). It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request associated with a nonblocking collective operation.

Does that mean that MPI_TEST_CANCELLED always returns false, or that the return value is undefined?

In contrast, the RMA chapter explicitly says in 12.3.5:

Upon returning from a completion call in which an RMA operation completes, all fields of the status object, if any, and the results of status query functions (e.g., MPI_GET_COUNT) are undefined with the exception of MPI_ERROR if appropriate (see Section 3.2.5).

Proposal

Clarify whether a call to MPI_TEST_CANCELLED on the status of a nonblocking collective request is well-defined and, if so, what it is supposed to return.

Changes to the Text

TBD

Impact on Implementations

TBD

Impact on Users

Minor (clarity on what to expect from status objects)

References and Pull Requests

#814 sparked this inquiry

bosilca commented 11 months ago

According to the definition of MPI_TEST_CANCELLED, it sets the flag to true if the communication associated with the status was cancelled, and to false otherwise. Taking into account that MPI_Cancel cannot be called on nonblocking collective communications, the status associated with such a request shall never allow the flag to be set to true by MPI_TEST_CANCELLED, which clearly means that the flag will always be FALSE. There is no room for undefined here.

devreal commented 11 months ago

So we're inconsistent here with the RMA chapter (or the RMA chapter is inconsistent, I guess). The issue with that interpretation is that it breaks the "we don't have to touch the status if we don't return MPI_ERR_IN_STATUS" argument made in #814, since we would have to mark the status as "never cancelled" somehow...

RolfRabenseifner commented 11 months ago

The first sentences of MPI-4.1 Section 9.3 Error Handling clearly say that the MPI library is free to detect or not detect user errors.

This is also stated in the other section on error handling, MPI-4.1 Section 2.8 Error Handling, page 27, lines 7-8: "This document does not specify the state of a computation after an erroneous MPI call has occurred."

In my opinion, this sentence says that in the case of an erroneous use of an MPI routine, the outcome is undefined.

Does this answer the question of the current title "Outcome of MPI_TEST_CANCELLED on nonblocking collective requests statuses" of this issue?

devreal commented 11 months ago

Does this answer the question of the current title "Outcome of MPI_TEST_CANCELLED on nonblocking collective requests statuses" of this issue?

Absolutely not. We never say that a call to MPI_TEST_CANCELLED on a status of nonblocking collective operations is erroneous.

bosilca commented 11 months ago

We never claimed the MPI standard is consistent across the different chapters!

The MPI_Status contains user-facing and implementation-specific fields. In no case should the MPI standard define what the MPI implementation is or is not allowed to do with its internal fields. I read the above statement as related to MPI_ERROR, MPI_TAG and MPI_SOURCE, but not to the rest of the MPI_Status structure.

jprotze commented 11 months ago

I think @devreal's argument is not about prescribing what implementations should do with their internal fields. The point is that for the following call sequence (and I think we agreed that this is valid code and must return false), the implementation must either touch the status object or rely on UB:

MPI_Ibcast(..., &req);
MPI_Wait(&req, &status);
MPI_Test_cancelled(&status, &flag);

Using the RMA wording for nonblocking collectives would make this code pattern erroneous and would spare the implementation from having to touch the status object.
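
To make the concern concrete, here is a minimal model in plain C, not MPI itself. It assumes a hypothetical status layout with one hidden cancellation flag; `mock_status`, `hidden_cancelled`, and the two `complete_*` functions are invented for illustration and do not reflect any real implementation. It shows that a completion path which leaves the status untouched lets the query read whatever was already in memory, while reliably returning false requires a write to the status.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical status layout: three public fields plus one hidden field.
 * Real MPI implementations are free to lay out MPI_Status differently. */
typedef struct {
    int source;
    int tag;
    int error;
    int hidden_cancelled; /* implementation-internal cancellation flag */
} mock_status;

/* Completion path A: the status is not written at all
 * (the "don't touch the status" idea discussed in #814). */
static void complete_collective_untouched(mock_status *st)
{
    assert(st != NULL);
    /* intentionally leaves every field as it was */
}

/* Completion path B: only the hidden flag is written, so that a later
 * cancellation query is guaranteed to report "not cancelled". */
static void complete_collective_mark_not_cancelled(mock_status *st)
{
    assert(st != NULL);
    st->hidden_cancelled = 0;
}

/* Stand-in for MPI_TEST_CANCELLED: reports whatever the hidden flag holds. */
static bool mock_test_cancelled(const mock_status *st)
{
    assert(st != NULL);
    return st->hidden_cancelled != 0;
}
```

If a `mock_status` is filled with 0xFF bytes via `memset` to stand in for uninitialized memory, `mock_test_cancelled` reports true after path A and false after path B. In real C, reading a truly uninitialized status would be undefined behavior, which is the "rely on UB" branch of the argument.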

bosilca commented 11 months ago

I don't think that is prescribed, with the exception maybe of the RMA chapter where the wording is more stringent. The rest of the standard specifically names the fields remaining undefined.

... the values of the MPI_SOURCE and MPI_TAG fields in the associated status object, if any, are undefined.

devreal commented 11 months ago

I don't think that is prescribed, with the exception maybe of the RMA chapter where the wording is more stringent. The rest of the standard specifically names the fields remaining undefined.

The RMA chapter is the least stringent. It basically says that nothing is defined on a status for an RMA request. The collectives chapter says that the public fields are undefined but is silent on the hidden fields (whatever the implementation needs for MPI_TEST_CANCELLED for example). So the reasonable interpretation is to assume that MPI_TEST_CANCELLED is well-defined on a status for a collective request.

Either way, it would be nice to specify the behavior of MPI_TEST_CANCELLED on collective request statuses.

jeffhammond commented 11 months ago

Either way, it would be nice to specify the behavior of MPI_TEST_CANCELLED on collective request statuses.

Do we have non-fatal errors now, where we can return MPI_USER_ERROR and never crash the program?

devreal commented 8 months ago

Note from the discussion at the MPI Forum meeting on March 20: I will create a PR to add language saying that the results of MPI_TEST_CANCELLED and MPI_GET_ELEMENTS are undefined on statuses of nonblocking collective requests.
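
If that wording is adopted, portable user code would have to avoid querying cancellation on statuses of nonblocking collective requests. One possible guard is sketched below in plain C, under the assumption that the application records what kind of request it started. `request_kind`, `cancel_query_fn`, `was_cancelled`, and `demo_query_reports_cancelled` are hypothetical names; the callback stands in for a call such as MPI_Test_cancelled so that the sketch compiles without an MPI installation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag the application records when it starts a request. */
typedef enum {
    REQ_POINT_TO_POINT,
    REQ_NONBLOCKING_COLLECTIVE
} request_kind;

/* Stand-in for a status query such as MPI_Test_cancelled (assumed shape). */
typedef bool (*cancel_query_fn)(const void *status);

/* Consults the status only for point-to-point requests. For nonblocking
 * collectives MPI_CANCEL is erroneous, so the request cannot have been
 * cancelled and the status is never inspected. */
static bool was_cancelled(request_kind kind, const void *status,
                          cancel_query_fn query)
{
    assert(query != NULL);
    if (kind != REQ_POINT_TO_POINT) {
        return false;
    }
    return query(status);
}

/* Demo query that always reports "cancelled", used to show that the guard
 * bypasses the query for collective requests. */
static bool demo_query_reports_cancelled(const void *status)
{
    (void)status; /* the demo ignores the status contents */
    return true;
}
```

With the demo query, `was_cancelled` returns false for `REQ_NONBLOCKING_COLLECTIVE` and forwards the query result for `REQ_POINT_TO_POINT`, so the answer for collective requests does not depend on what the status happens to contain.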