Open devreal opened 11 months ago
According to the definition of MPI_TEST_CANCELLED, it sets the flag to true if the communication associated with the status was cancelled, and to false otherwise. Taking into account that MPI_Cancel cannot be called on nonblocking collective communications, the status associated with such a request can never cause MPI_TEST_CANCELLED to set the flag to true, which clearly implies that the flag will always be false. There is no room for undefined here.
So we're inconsistent here with the RMA chapter (or the RMA chapter is inconsistent, I guess). The issue with that interpretation is that it breaks the assumption "we don't have to touch the status if we don't return MPI_ERR_IN_STATUS" made in #814, since we have to mark the status as "never cancelled" somehow...
The first sentences of MPI-4.1, Section 9.3 Error Handling, clearly say that the MPI library is free to detect or not detect user errors.
This is also stated in the other section on error handling: MPI-4.1, Section 2.8 Error Handling, page 27, lines 7-8: "This document does not specify the state of a computation after an erroneous MPI call has occurred."
In my opinion, this sentence says that in the case of an erroneous use of an MPI routine, the outcome is undefined.
Does this answer the question of the current title "Outcome of MPI_TEST_CANCELLED on nonblocking collective requests statuses" of this issue?
Absolutely not. We never say that a call to MPI_TEST_CANCELLED on a status of a nonblocking collective operation is erroneous.
We never claimed the MPI standard is consistent across the different chapters!
The MPI_Status contains user-facing and implementation-specific fields. In no case should the MPI standard document define what the MPI implementation is or is not allowed to do with its internal fields. I read the above statement as related to MPI_ERROR, MPI_TAG and MPI_SOURCE, but not to the rest of the MPI_Status structure.
I think @devreal's argument is not to prescribe what implementations should do with their internal fields. The point is that for the following call sequence (and I think we agreed that this is valid code and must return false), the implementation must touch the status object or rely on UB:
MPI_Ibcast(..., &req);
MPI_Wait(&req, &status);
MPI_Test_cancelled(&status, &flag);
Using the RMA wording for nonblocking collectives would make this code pattern erroneous and would spare the implementation from having to touch the status object.
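To make the "touch the status or rely on UB" point concrete, here is a minimal sketch that does not use MPI at all. Everything in it is hypothetical: `mock_status`, its hidden `cancelled_` field, and the `mock_*` functions are stand-ins invented for illustration and are not taken from any real implementation. The status is pre-filled with stale bytes to imitate an uninitialized object without actually invoking undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for MPI_Status: three public fields plus one
 * implementation-private field (an assumption for illustration only). */
typedef struct {
    int source;     /* models MPI_SOURCE */
    int tag;        /* models MPI_TAG */
    int error;      /* models MPI_ERROR */
    int cancelled_; /* hidden field read by mock_test_cancelled */
} mock_status;

/* Models completing a nonblocking collective request. When touch_status
 * is false, the status is left exactly as the caller passed it in. */
void mock_wait_collective(mock_status *status, bool touch_status) {
    assert(status != NULL);
    if (touch_status) {
        status->cancelled_ = 0; /* mark as "never cancelled" */
    }
}

/* Models MPI_TEST_CANCELLED: reports whatever the hidden field holds. */
void mock_test_cancelled(const mock_status *status, int *flag) {
    assert(status != NULL && flag != NULL);
    *flag = (status->cancelled_ != 0);
}

/* Runs the wait / test-cancelled sequence on a status that holds stale
 * bytes and returns the flag the caller would observe. */
int mock_flag_after_wait(bool touch_status) {
    mock_status status;
    int flag;
    memset(&status, 0xFF, sizeof status); /* stale contents, all bits set */
    mock_wait_collective(&status, touch_status);
    mock_test_cancelled(&status, &flag);
    return flag;
}
```

In this model, `mock_flag_after_wait(true)` yields 0, while `mock_flag_after_wait(false)` yields 1 because the stale hidden field is read back unchanged. That is the trade-off under discussion: guaranteeing false requires writing to the status on every completion.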
I don't think that is prescribed, with the exception maybe of the RMA chapter where the wording is more stringent. The rest of the standard specifically names the fields remaining undefined.
... the values of the MPI_SOURCE and MPI_TAG fields in the associated status object, if any, are undefined.
The RMA chapter is the least stringent. It basically says that nothing is defined on a status for an RMA request. The collectives chapter says that the public fields are undefined but is silent on the hidden fields (whatever the implementation needs for MPI_TEST_CANCELLED, for example). So the reasonable interpretation is to assume that MPI_TEST_CANCELLED is well-defined on a status for a collective request.
Either way, it would be nice to specify the behavior of MPI_TEST_CANCELLED on collective request statuses.
Do we have non-fatal errors now, where we can return MPI_USER_ERROR and never crash the program?
Note from the discussion at the MPI Forum meeting on March 20: I will create a PR to add language stating that the results of MPI_TEST_CANCELLED and MPI_GET_ELEMENTS are undefined on nonblocking collective requests.
Problem
Section 6.12 says:
Does that mean that MPI_TEST_CANCELLED always returns false, or that the return value is undefined?
In contrast, the RMA chapter explicitly says in 12.3.5:
Proposal
Clarify whether a call to MPI_TEST_CANCELLED on a nonblocking collective request is well-defined and what it is supposed to return.
Changes to the Text
TBD
Impact on Implementations
TBD
Impact on Users
Minor (clarity on what to expect from status objects)
References and Pull Requests
#814 sparked this inquiry