Open mpiforumbot opened 8 years ago
Originally by davesolt on 2012-02-27 12:50:49 -0600
Reviewed by Dave Solt
Originally by rhc on 2012-02-27 13:04:44 -0600
Only one comment, and it is something Josh and I have discussed before. These changes to the scope of abort procedures create a high risk of race conditions. Indeed, the initial implementation of the revised MPI_ABORT did just that in OMPI and had to be removed, though I realize it continues under development in Josh's branch.
So the community should be made aware that, while the change in scope may be of interest, it does create a high risk of destabilizing MPI implementations in general. Implementation should therefore be done on an "electable" basis - i.e., the behavior to restrict scope should not be the default behavior, but one that the user can select. This allows those who don't need the revised scope to retain the greater stability, while allowing those who want/need the change to experiment with it and accept the higher risk.
Originally by jjhursey on 2012-02-27 14:11:29 -0600
Replying to rhc:
Only one comment, and it is something Josh and I have discussed before. These changes to the scope of abort procedures create a high risk of race conditions. Indeed, the initial implementation of the revised MPI_ABORT did just that in OMPI and had to be removed, though I realize it continues under development in Josh's branch.
So the community should be made aware that, while the change in scope may be of interest, it does create a high risk of destabilizing MPI implementations in general. Implementation should therefore be done on an "electable" basis - i.e., the behavior to restrict scope should not be the default behavior, but one that the user can select. This allows those who don't need the revised scope to retain the greater stability, while allowing those who want/need the change to experiment with it and accept the higher risk.
Can you elaborate on the exact situations in which you think these semantics might be problematic?
All that this ticket is doing is connecting MPI_ERRORS_ARE_FATAL with MPI_ABORT on the same communicator. If the MPI implementation is not able to abort just that sub-communicator when an error triggers the MPI_ERRORS_ARE_FATAL error handler, then that is fine - as specified for MPI_ABORT. However, if a high-quality implementation is able to abort just the subset of processes, then it should be allowed to do so. This ticket gives an implementation the explicit opportunity to do so (but does not require it).
One way to think about MPI_ERRORS_ARE_FATAL is as an MPI-internal callback that just calls MPI_ABORT on the associated communicator. This is pretty much what we do in Open MPI at the OMPI layer. The runtime layer in Open MPI interprets (by default) the failure/abort of one process as fatal for all processes, which might need to be reconsidered for this ticket. It is still correct for Open MPI to abort all processes, but if we are to be high quality we should at least attempt to terminate just those processes in the subgroup. This does not mean that the runtime needs to be aware of grouping, but that it should be able to self-stabilize and notify processes of failures/aborts in the system.
Originally by jjhursey on 2012-02-29 08:44:53 -0600
Attachment added: mpi-ticket-324.pdf
(2344.8 KiB)
Add a version of the MPI standard PDF with these changes for the reading.
Originally by jjhursey on 2012-03-30 13:09:41 -0500
During the March 2012 MPI Forum meeting, the forum body decided (with only 2 participants objecting) that this should be a ticket 0 change. Some wording changes were suggested. The ticket will be updated shortly.
Originally by jjhursey on 2012-03-30 13:40:46 -0500
Attachment added: mpi-ticket-324-v2.pdf
(2344.8 KiB)
New version with text modified per the comments from the March 2012 MPI Forum meeting
Originally by jjhursey on 2012-06-20 09:18:02 -0500
Reservations about some of the wording were raised off-ticket. As such, this was not voted on at the May 2012 meeting. Once the ticket has been fixed, it will be brought back for further discussion.
Originally by davesolt on 2012-06-21 14:45:46 -0500
The issue that came up in Japan is that the behavior of errors after MPI_Request_free is not unique to MPI_Request_free. Errors can occur after the successful return of MPI_Bsend and even MPI_Send, and the user has no mechanism to be informed of these errors. If we rule that errors on a request freed with MPI_Request_free force an MPI_Abort call, it would seem that MPI_Bsend should also result in MPI_Abort. However, if an MPI_Send call that fails during transmission results in MPI_Abort being called, then it would be nearly impossible to write any fault-tolerant application. It seems unreasonable to expect that fault-tolerant applications can only use synchronous sends to avoid MPI_Abort calls.
One solution is to instead raise an exception on the associated communicator, which would give the application the opportunity to decide how to handle the failure. Some applications would function correctly as long as the next send to the same rank returned failure while other applications may need to take more immediate action.
An MPI implementation is generally forced to reference count MPI communication objects (comms, files, windows) and should be able to tell whether outstanding references are held by the user or are internal only. This is certainly true for any implementation that can detect basic incorrect use cases. Therefore, if the application is known to hold a reference to the underlying communication object, then the exception should be raised on the associated communication handle. If the application does not have a handle to the underlying communication object, then the exception should be raised on MPI_COMM_SELF.
Note that the implementation must correctly handle the case where one thread frees the underlying communication object while another thread is attempting to call an error handler. The MPI implementation must serialize these events and raise the exception on the right object (either the communication object or MPI_COMM_SELF) depending on the ordering of events.
The text for a modified proposal based on these ideas is given here:
2.9 (page 23 (roughly))
Another subtle issue arises because of the nature of asynchronous communications: MPI calls may initiate operations that continue asynchronously after the call returned. Thus, the operation may return with a code indicating successful completion, yet later cause an error exception to be raised. If there is a subsequent call that relates to the same operation (e.g., a call that verifies that an asynchronous operation has completed) then the error argument associated with this call will be used to indicate the nature of the error.
In a few cases, the error may occur after all calls that relate to the operation have completed, so that no error value can be used to indicate the nature of the error (e.g., an error on a buffered send operation).
In such cases, the exception is raised on the associated communication object. If this object has been marked for invalidation, the exception is raised on MPI_COMM_SELF instead.
OLD TEXT:
In a few cases, the error may occur after all calls that relate to the operation have completed, so that no error value can be used to indicate the nature of the error (e.g., an error on the receiver in a send with the ready modes).
Such an error must be treated as fatal, since information cannot be returned for the user to recover from it. When an error is treated as fatal then it has the same effect as calling MPI_ABORT on MPI_COMM_SELF.
3.7.3 (roughly page 53)
Advice to users. Once a request is freed by a call to MPI_REQUEST_FREE, it is not possible to check for the successful completion of the associated communication with calls to MPI_WAIT or MPI_TEST. Also, if an error occurs subsequently during the communication, an error code cannot be returned to the user -- In such cases, the exception is raised on the associated communication object. If this object has been marked for invalidation, the exception is raised on MPI_COMM_SELF instead, as described in Section 2.9. An active receive request should never be freed as the receiver will have no way to verify that the receive has completed and the receive buffer can be reused. (End of advice to users.)
OLD TEXT:
Advice to users. Once a request is freed by a call to MPI_REQUEST_FREE, it is not possible to check for the successful completion of the associated communication with calls to MPI_WAIT or MPI_TEST. Also, if an error occurs subsequently during the communication, an error code cannot be returned to the user -- such an error must be treated as fatal, which has the same effect as calling MPI_ABORT on MPI_COMM_SELF. An active receive request should never be freed as the receiver will have no way to verify that the receive has completed and the receive buffer can be reused. (End of advice to users.)
Originally by @wbland on 2015-04-03 13:15:51 -0500
We're tentatively scheduling this for a June reading. Could get pushed back if MPI 3.1 business takes extra time.
Originally by @wbland on 2015-05-27 15:47:39 -0500
Attachment added: mpi-ticket-324-v3.pdf
(2776.9 KiB)
Update to the document to add advice block and minor changes. To be read for June 2015 meeting.
Originally by @wbland on 2015-06-01 22:03:39 -0500
Notes from the reading are available on the wiki page: https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ftwg2015-06-01
Originally by @wbland on 2015-06-03 12:08:25 -0500
After lots more discussion, there's been quite a bit more added to this ticket to better clarify error handlers overall. This includes propagation/inheritance, defaults, scope, and adding a new error handler (which is in ticket #477). A new PDF has been added here. The plan is to do a plenary/reading in September.
Originally by bouteill on 2015-06-03 18:06:02 -0500
page 342 line 40: when the handler aborts processes, the implementation should try to provide an error code which is meaningful for the user, as an example, an error code that can be passed to MPI_ERROR_STRING.
Rationale for the change: easier to read, all error classes are their own error codes, and returning error codes can be more precise. There is no limit on the number of error classes either (MPI_ADD_ERROR_CLASS!)
page 377 line 34: is ---the same as--- the default error handler (remove "the same as")
page 454: Add before the new text: "By default, the predefined error handler for window handles is MPI_ERRORS_ARE_FATAL" (adapted from the FILE chapter wording). Otherwise the default handler on MPI_WIN_NULL is not defined.
Originally by @wbland on 2015-06-04 07:39:09 -0500
The last two comments are fine, but I have a problem with the first one. Error codes are only useful if you're still in an application to decode them. If you're aborting, the value provided to abort is surfaced to the user as the process's return code. That's not useful, since it can't be turned into a string, class, etc. anymore.
Replying to bouteill:
page 342 line 40: when the handler aborts processes, the implementation should try to provide an error code which is meaningful for the user, as an example, an error code that can be passed to MPI_ERROR_STRING.
Rationale for the change: easier to read, all error classes are their own error codes, and returning error codes can be more precise. There is no limit on the number of error classes either (MPI_ADD_ERROR_CLASS!)
page 377 line 34: is ---the same as--- the default error handler (remove "the same as")
page 454: Add before the new text: "By default, the predefined error handler for window handles is MPI_ERRORS_ARE_FATAL" (adapted from the FILE chapter wording). Otherwise the default handler on MPI_WIN_NULL is not defined.
Originally by bosilca on 2015-06-04 12:28:29 -0500
It has been mentioned that in the case of an error on a freed communicator, the last errhandler set will be called. I don't think this is safe behavior, as the errhandler might have been part of a shared library that was unloaded following the communicator's destruction. Instead, we should rely on an errhandler that is known to exist.
Originally by @wbland on 2015-06-04 16:30:06 -0500
At some point, there was discussion that this could be an option for error reporting, but we never went down that route for the exact reasons you pointed out here.
Replying to bosilca:
It has been mentioned that in the case of an error on a freed communicator, the last errhandler set will be called. I don't think this is safe behavior, as the errhandler might have been part of a shared library that was unloaded following the communicator's destruction. Instead, we should rely on an errhandler that is known to exist.
Originally by ftillier on 2015-06-07 21:03:22 -0500
In the latest PDF, you added language to MPI_COMM_CREATE and MPI_COMM_CREATE_GROUP indicating the inheritance of error handlers. The existing text for MPI_COMM_DUP describes the behavior for info hints as well, and it would probably make sense to fix this for these two functions too.
The change of MPI_ABORT to PMPI_ABORT page 342, both with ticket #324 and #477, seems awkward. There are many other places where functionality is described "as-if" other MPI calls had been made, and these still use the MPI call, not the PMPI one. I'm not sure that the precision of the PMPI reference is worth the inconsistency and potential confusion.
Page 363, line 16: Suggest the following wording: "When aborting a subset of processes, a high quality implementation should be able to provide correct error handling for communicators containing both aborted and non-aborted processes."
Originally by @wbland on 2015-06-08 09:02:46 -0500
Replying to ftillier:
In the latest PDF, you added language to MPI_COMM_CREATE and MPI_COMM_CREATE_GROUP indicating the inheritance of error handlers. The existing text for MPI_COMM_DUP describes the behavior for info hints too, and it would probably make sense to fix this for these two functions too.
I'm not sure what you mean. Are you saying that we should specify that MPI_COMM_DUP carries over the error handler? That's already in the ticket. Or are you saying we should fix something with info keys on dup? That's a separate ticket (#476).
The change of MPI_ABORT to PMPI_ABORT page 342, both with ticket #324 and #477, seems awkward. There are many other places where functionality is described "as-if" other MPI calls had been made, and these still use the MPI call, not the PMPI one. I'm not sure that the precision of the PMPI reference is worth the inconsistency and potential confusion.
I agree. The reason we put this in there was because the tools people (Martin) wanted to specify whether the tools interface would be called when the error handler calls MPI_ABORT. We agreed in the room that it should not call the tools interface because the communicator may or may not be well defined by the time the error handler is called (or may not have ever existed in some cases) so we didn't want to specify how the tools interface should deal with that.
Page 363, line 16: Suggest the following wording: "When aborting a subset of processes, a high quality implementation should be able to provide correct error handling for communicators containing both aborted and non-aborted processes."
Fair enough. That's a cleaner way of saying it. I'll post a new PDF soon.
Originally by bouteill on 2015-06-23 13:05:12 -0500
The following text needs to be updated
MPI_COMM_CALL_ERRHANDLER(COMM, ERRORCODE, IERROR)
    INTEGER COMM, ERRORCODE, IERROR
This function invokes the error handler assigned to the communicator with the error code supplied. This function returns MPI_SUCCESS in C and the same value in IERROR if the error handler was successfully called (assuming the process is not aborted and the error handler returns). Users should note that the default error handler is MPI_ERRORS_ARE_FATAL. Thus, calling MPI_COMM_CALL_ERRHANDLER will abort the comm processes if the default error handler has not been changed for this communicator or on the parent before the communicator was created. (End of advice to users.)
Originally by bouteill on 2015-06-23 13:11:57 -0500
For the wording "as if MPI_ABORT was called", I found the following wording elsewhere: "MPI_ERRHANDLER_FREE should be called with the error handler returned from MPI_{COMM,WIN,FILE}_GET_ERRHANDLER to mark the error handler for deallocation. This provides behavior similar to that of MPI_COMM_GROUP and MPI_GROUP_FREE."
Wording like "provides behavior similar to MPI_ABORT" should ease the PMPI ambiguity.
Originally by @wbland on 2015-06-29 08:48:55 -0500
I updated the ticket to address the issues raised by Aurelien in the last two comments.
Originally by bouteill on 2015-06-29 12:23:07 -0500
The new text still has "call to MPI_Abort", which I believe will still be seen as confusing.
Originally by @wbland on 2015-09-10 19:38:57 -0500
Attachment added: mpi-report-tickets324-477.pdf
(2796.9 KiB)
Specification of tickets 324 and 477
Originally by jjhursey on 2012-02-27 06:20:57 -0600
Original

## Problem

Section 8.3 is imprecise about the set of processes over which MPI_ERRORS_ARE_FATAL is applied. Since the error handler is associated with a communication object, it is implied that the error handler applies only to the group of processes associated with that communication object. Therefore it should be clarified that the processes in that group are aborted.

Similarly, MPI_ABORT (p. 294, Section 8.7) takes as an argument a communicator that defines the scope of the abort. The defining text specifies that the operation is "... to abort all tasks in the group of comm." Since MPI_ERRORS_ARE_FATAL is specified as an "abort" operation, it should fall under these same restrictions when applied to a communication object.

## Proposed Solution

Clarify the text in the MPI standard per the PDF document attached to this ticket. Search for ticket324.
Note that text changes are in the following chapters:

## Implementation

Finished in Open MPI branch.

## Updated

After discussion in Chicago (06/2015), the forum decided that it would rather have MPI_ERRORS_ARE_FATAL maintain its current expected behavior, which is to abort all connected processes. This allows applications that expect that behavior (to avoid things like being overcharged for their allocation) to keep their current behavior. The new solution is to better clarify error handlers in general, including MPI_ERRORS_ARE_FATAL, and to open a new ticket (#477) which specifies a new error handler that captures the original goals of this ticket (aborting a subset of the application's processes when the new error handler is called).

This text now includes text which:

- clarifies that MPI_ERRORS_ARE_FATAL causes all connected processes to abort;
- clarifies the behavior of MPI_ABORT when error handlers call it;
- clarifies when errors are raised on MPI_COMM_SELF;
- defines the default error handler for MPI_WIN_NULL.