Closed mpiforumbot closed 8 years ago
Originally by htor on 2009-01-21 12:56:57 -0600
Replying to Rolf Rabenseifner from Ticket #83
Review: p2:30 of your proposal (pdf, Jan 16, 2009) states already "The collective operations do not have a message tag argument." There isn't a rationale. On p50:20, you repeat this sentence and you add a rationale on p50:26-28 that is valid only for nonblocking collectives.
Therefore, the wording looks a little bit inconsistent.
My proposal 1: Remove p50:20 and p50:26-28.
My proposal 2: Change p50:20 to: "As with blocking collective operations, also nonblocking collective operations do not have a message tag argument." and do not put the sentence in a new paragraph.[[BR]] Change p50:26-28 to:
"The use of tags for nonblocking collectives would prevent certain hardware-optimizations. I chose proposal 2 and rephrased it a tiny bit. Please check the new phrasing!Further problems with the current text:
Matching rule on p50:34-45:
Up to this point, it is not clear, that IBARRIER and BARRIER are different collective operations (they are different routines, that is clear). Therefore, we need early wording telling this. I propose to add on p50:35 before "In particular":[[BR]] "Blocking and nonblocking collective routines are regarded as different operations." [[BR]] I added: "In this sense, blocking and nonblocking collective operations are different collective operations." the paragraph has a many "collective operations" now, but it's clear.
and I would remove "(blocking and nonblocking)" on line p50:34. done
Originally by htor on 2009-01-21 13:30:46 -0600
Attachment added: nbc-proposal-rev-1.pdf
(498.1 KiB)
Originally by bronis on 2009-01-21 17:04:00 -0600
I reviewed the proposal extensively and have since been monitoring any changes. It looks ready to go to me.
Originally by RolfRabenseifner on 2009-01-22 03:31:38 -0600
nbc-proposal-rev-1.pdf, p71:41:43 reads
The sequence of the two calls is irrelevant because the two nonblocking operations are performed on different communicators in each process.
but should read ("In each process" moved to the beginning)
In each process, the sequence of the two calls is irrelevant because the two nonblocking operations are performed on different communicators.
Reason: Otherwise, the wording "the two calls" at the beginning of the sentence is strange, because the reader can see only "the six calls".
Originally by RolfRabenseifner on 2009-01-22 04:07:40 -0600
Independent of the discussion about associative routines and flag in MPI_OP_CREATE and the not existing associativity of built-in floating-point operations, and based on your decision, not to put ticket #96 on the list in this ticket, I propose a small rationale and advice to users, added after the advice to implementors on nbc-proposal-rev-1.pdf, p62:35-38. (This proposal does not want to change the current definition, it only wants to clarify it. The solution of Ticket #96 wants to make a real change.)
Please add after nbc-proposal-rev-1.pdf, p62:38
Rationale. The same recommendation is given for MPI_REDUCE. Because implementations are allowed not (or only partially) to implement this hint blocking and nonblocking reduction, this recommendation is not implied by the requirement, that MPI_IREDUCE returns the same result as MPI_REDUCE. (End of rationale.)
Advice to users. Because two calls to MPI_REDUCE with same input arguments may return different results with operations that are not bit-wise associative (e.g., floating point sum), implementations that do not fully implement the same recommendation for MPI_REDUCE may return different results from the blocking and non-blocking reduction operation. (End of advice to users.)
Originally by bronis on 2009-01-22 08:24:09 -0600
With regard to Rolf's suggestion for the text on page 71, starting with the prepositional phrase is a weak construction. It would be much better to make it clear that Rolf's reading is that the phrase modifies the "two calls":
The sequence of the two calls in each process is irrelevant because the two nonblocking operations are performed on different communicators.
Originally by htor on 2009-01-27 20:36:37 -0600
closed several sub-tickets (decided in today's telecon) and opened a new one to keep track of important changes. Revision 2 has been finished after the 1/27 teleconference and is attached to this ticket.
Originally by htor on 2009-01-27 20:43:01 -0600
Attachment added: nbc-proposal-rev-2.pdf
(499.9 KiB)
Originally by htor on 2009-02-02 22:56:25 -0600
Attachment added: nbc-proposal-rev-3.pdf
(500.3 KiB)
Originally by htor on 2009-02-02 22:57:59 -0600
Added revision 3 of the official proposal which will be discussed at the next Forum meeting.
Originally by rsthakur on 2009-02-08 20:13:08 -0600
I just read the nonblocking collectives document for the first time. Sorry for the late round of comments.
pg 49, ln 35 and elsewhere: I think we should just say "nonblocking call" instead of nonblocking start call". Saying "start" can lead to confusion with MPI_Start and persistent communication.
pg 49, ln 39-42: I don't understand what could lead to a deadlock as described here.
pg 50, ln 5: "Similarly to the blocking case" should be "Similar to the blocking case"
pg 50, ln 9: operation.) should be operation).
pg 50, ln 33: instead of "it will fail and generate an MPI exception", which is C++ specific, you could say "it will return an error".
Section 5.12.1 onwards: All these functions have some trivial sentences in them, but they are all different. Why not be consistent all through and just say "MPI_Ixxx is the nonblocking version of MPI_xxx" everywhere. The rest of the definition is implied. (See the split-collective I/O functions. They don't say anything in fact.)
Particularly misleading is the part that says what happens when the operation completes. e.g., pg 51, ln 36. The reader who looks just at the function definition will think Ibarrier doesn't return until all others have called Ibarrier. (Completion is defined in the two preceding pages, but people won't read those first unless you force them.) Either remove that part or say when the corresponding test or wait returns. I am ok with not saying anything separately for each function, and instead defining the common behavior for all functions in Sec 5.11 (because it is the same for all functions when compared with their blocking counterparts).
Originally by bronis on 2009-02-08 21:12:22 -0600
Replying to rsthakur:
pg 50, ln 5: "Similarly to the blocking case" should be "Similar to the blocking case"
I disagree. they "are considered" how? Similarly.
[snip]
pg 50, ln 33: instead of "it will fail and generate an MPI exception", which is C++ specific, you could say "it will return an error".
I understand your position but this is a controversial topic. My opinion is that none of the possible wordings are perfect. Others strongly felt that "return an error" was incorrect. We will have to discuss this in San Jose.
Originally by rsthakur on 2009-02-08 21:58:25 -0600
Replying to bronis:
pg 50, ln 33: instead of "it will fail and generate an MPI exception", which is C++ specific, you could say "it will return an error".
I understand your position but this is a controversial topic. My opinion is that none of the possible wordings are perfect. Others strongly felt that "return an error" was incorrect. We will have to discuss this in San Jose.
There is precedence elsewhere in the document for saying return an error. For example:
30:24-25 "The receive operation will return an error code."
48:16-17 "If the call causes some system resource to be exhausted, then it will fail and return an error code."
263:1-2 "The function MPI_ALLOC_MEM may return an error code of class MPI_ERR_NO_MEM to indicate it failed because memory is exhausted."
263:16-17 "The function MPI_FREE_MEM may return an error code of class MPI_ERR_BASE to indicate an invalid base argument."
378:32-33 "If the file does not exist, MPI_FILE_DELETE raises an error in the class MPI_ERR_NO_SUCH_FILE."
Originally by bronis on 2009-02-09 01:12:45 -0600
Replying to rsthakur:
There is precedence elsewhere in the document for saying return an error. For example:
30:24-25 "The receive operation will return an error code."
48:16-17 "If the call causes some system resource to be exhausted, then it will fail and return an error code."
Yes. The concern arises with having nonblocking calls return an error since the error may not be detected until after the call has returned. I know that the example from page 48 is about the nonblocking point-to-points. The reasonable conclusion is that it requires flow control. Many do not like that conclusion...
Originally by jsquyres on 2009-02-09 10:14:10 -0600
Replying to bronis:
pg 50, ln 33: instead of "it will fail and generate an MPI exception", which is C++ specific, you could say "it will return an error".
I understand your position but this is a controversial topic. My opinion is that none of the possible wordings are perfect. Others strongly felt that "return an error" was incorrect. We will have to discuss this in San Jose.
MPI-2.1 8.3 p264:19-22 is the argument for using the phrase "MPI exception":
An MPI implementation cannot or may choose not to handle some errors that occur during MPI calls. These can inlucde errors that generate exceptions or traps, such as floating point errors or access violations. The set of errors that are handled by MPI is implementation dependent. Each such error generates an MPI exception.
(note that the boldface is in the spec; I didn't add that here on the ticket)
Originally by htor on 2009-02-09 13:19:59 -0600
Replying to rsthakur:
pg 49, ln 35 and elsewhere: I think we should just say "nonblocking call" instead of nonblocking start call". Saying "start" can lead to confusion with MPI_Start and persistent communication. ok, that can be fixed easily
pg 49, ln 39-42: I don't understand what could lead to a deadlock as described here. consider comm-layout from Fig. 5.13 and implement example 5.34 with blocking collectives.
pg 50, ln 5: "Similarly to the blocking case" should be "Similar to the blocking case" see Bronis' comment
pg 50, ln 9: operation.) should be operation). can be fixed easily
pg 50, ln 33: instead of "it will fail and generate an MPI exception", which is C++ specific, you could say "it will return an error". see Jeff's comment. We had a long discussion during our last telecon about this :).
Section 5.12.1 onwards: All these functions have some trivial sentences in them, but they are all different. Why not be consistent all through and just say "MPI_Ixxx is the nonblocking version of MPI_xxx" everywhere. The rest of the definition is implied. (See the split-collective I/O functions. They don't say anything in fact.) this can also be fixed easily - we just have to decide on a standard sentence.
Particularly misleading is the part that says what happens when the operation completes. e.g., pg 51, ln 36. The reader who looks just at the function definition will think Ibarrier doesn't return until all others have called Ibarrier. (Completion is defined in the two preceding pages, but people won't read those first unless you force them.) hmm, but those people should wonder what the difference between barrier and ibarrier then is. I think it's pretty obvious, and clearly defined. All a standard has to be?
Either remove that part or say when the corresponding test or wait returns. I am ok with not saying anything separately for each function, and instead defining the common behavior for all functions in Sec 5.11 (because it is the same for all functions when compared with their blocking counterparts). yes, that brings us back to the standard sentence. The other reviewers agree to the current wording, so we can certainly pull one of the sentences out (as was already proposed in the group) and make it the default for all.
Best, Torsten
Originally by RolfRabenseifner on 2009-02-09 18:38:09 -0600
All okay, except Version 3, p50:41 reads
other collective operation in between. ...
but should read
other collective operation on the same communicator in between. ...
Originally by htor on 2009-02-11 01:52:43 -0600
Attachment added: nbc-proposal-rev-4.pdf
(501.3 KiB)
Originally by moody20 on 2009-02-11 13:12:48 -0600
The following page and line numbers are relative to the rev-4 text:
page 2, line 20 "completion operation" --> "Completion call"
page 50, line 6 (since this is a "that is", it's important to again stress the localness here) "i.e., the semantics" --> "i.e., for the caller, the semantics"
page 51, line 37 (keep verb tense, and also matches description for barrier better) "have called" --> "calls"
page 51, line 37 split intra and inter sentences into two paragraphs (all other collectives discuss these in different paragraphs)
page 51, line 37 "call MPI_IBARRIER." --> "call MPI_IBARRIER, add vice versa."
Originally by moody20 on 2009-02-11 13:16:22 -0600
whoops, got a 'C' instead of a 'c' in above comment...
page 2, line 20
"a separate completion operation" --> "a separate completion call"
Originally by moody20 on 2009-02-11 13:21:50 -0600
Also, there was discussion about potential confusion for readers over "completion" of the IBARRIER in the description. Rajeev suggested looking at UPC.
Or here is a suggestion (add the word 'can'):
"On intracommunicators, the operation can complete only after every process in the communicator calls MPI_IBARRIER."
Maybe this helps?
Originally by moody20 on 2009-02-13 11:48:40 -0600
Also, there was agreement to add an "Advice to Users" to do a gather and local sum if they need precise results in a collective with data types that are not associative.
Originally by moody20 on 2009-02-13 11:49:34 -0600
Whoops, I guess there is a new "Implementation Status" that defaults to "Unnecessary". I'll change this to "Waiting" for now.
Originally by htor on 2009-02-20 20:24:29 -0600
Attachment added: patch_libnbc_1.0_rev276_intercomm_bcast.diff
(4.1 KiB)
Originally by htor on 2009-02-20 20:34:48 -0600
Regarding the inter-communicator implementation of nonblocking collectives. The implementation is trivial based on the techniques described in Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI. The schedule-based design can easily be applied to inter-communicators as well. I added a patch for LibNBC 1.0 which extends it with a nonblocking broadcast that works on inter-communicators. Calling this function or the intra-comm version can simply be decided based on MPI_Comm_test_inter().
Please let me know if there are any further concerns regarding inter-communicators.
Originally by htor on 2009-02-20 21:14:00 -0600
Attachment added: nbc-proposal-rev-5.pdf
(504.4 KiB)
Originally by htor on 2009-02-20 21:16:14 -0600
added revision 5 that addresses all comments from the reading at the last Forum meeting. The associativity issue will be pushed towards MPI-2.2, see #136, Please review and comment!
Originally by bronis on 2009-02-20 22:23:56 -0600
Torsten:
Can you post a diff between version 5 and version 4? Is it just what is in Adam's notes from the meeting? Or was there more?
Bronis
Originally by htor on 2009-02-20 22:30:51 -0600
Bronis, yes, there was also the unification of the operation descriptions. I just sent an email to the mailinglist with the diff. It's available at http://www.unixer.de/sec/nbc-proposal-rev-5.diff .
Originally by bronis on 2009-02-21 11:00:20 -0600
On page 51, line 37, please change "notifies" to "indicates":
"a process notifies that it has reached the barrier" => "a process indicates that it has reached the barrier"
On page 62, line 41, please change "which" to "that" (the clause is required):
"For operations which are not truly associative" => "For operations that are not truly associative"
I will suggest the same change in the MPI 2.2 ticket...
Other than these two small issues, looks fine.
Originally by bronis on 2009-02-21 11:02:33 -0600
Skip the statement about MPI 2.2 -- now that I review ticket 136, I see that the text is different.
Originally by rsthakur on 2009-03-25 22:05:00 -0500
I still think that a sentence like "The data movement after MPI_IBCAST completes is identical to the data movement after MPI_BCAST completes" in the definition of MPI_IBCAST can be misleading to the average user who picks up the book and looks at the definition of MPI_IBCAST. Just the first sentence "This call starts a nonblocking variant of MPI_BCAST." is sufficient here and will not be misleading.
(This problem exists in many of the remaining function definitions.)
Originally by htor on 2009-04-06 19:06:28 -0500
Attachment added: nbc-proposal-rev-6.pdf
(499.1 KiB)
Originally by htor on 2009-04-06 19:10:50 -0500
Attachment added: nbc-proposal-rev-6.diff
(8.5 KiB)
diff between rev5 and rev6
Originally by htor on 2009-04-06 19:17:54 -0500
Attachment added: nbc-proposal-rev-6.2.pdf
(499.0 KiB)
Originally by rsthakur on 2009-04-06 21:18:32 -0500
The text looks good now. One point to note is if ticket #27 goes through in 2.2 (MPI_Reduce_scatter_block), its nonblocking version would need to be added here.
Originally by rsthakur on 2009-04-06 22:10:52 -0500
Just to serve as a reminder, this ticket is also affected by the outcomes of ticket #117 (MPI_Count) and ticket #140 (const), in addition to ticket #27 mentioned above (reduce_scatter_block).
Originally by htor on 2009-06-09 15:50:40 -0500
We had the second vote unanimously, thus setting the ticket to passed.
Originally by jsquyres on 2010-09-20 07:46:46 -0500
Voting was reset in order to integrate this text with the MPI-3 text.
First reading was in the June 2010 meeting. 1st vote (passed unanimously) in the Sept 2010 meeting.
Originally by RolfRabenseifner on 2010-10-21 12:16:19 -0500
First part of the a PDF-review: I compared coll.tex svn -r495 with -r604 and detected some errors:
You removed all the % at the end of \exindex{....}%[[BR]] These % are not needed in your case, but introduced everywhere, because with \exindex starting on the first column and ending with %, no space or lineend is added to the Latex source toprevent these ugly double-spacings that we could see a several locations.
You may redo these % to mimimize the changes between review basis -r 495 and current version.
I'll ask also Bill to add this advice in instr.tex
In Example 5.24, you removed the compilation-check header. In rev. 495, it was on lines 3916-3922. In current rev. 604 it should be re-added after line 4733.
Those headers are useful and I do not understand why you removed it here?
I set three author marks because this ticket has parts in Chapters Collective, Change-Log, and Bibliography.
Please add a label to your new Section 5.12. I need this for a correct Change-Log entry.
Originally by htor on 2010-10-24 19:12:04 -0500
Replying to RolfRabenseifner:
First part of the a PDF-review: I compared coll.tex svn -r495 with -r604 and detected some errors:
- According to the rules, all changes to an existing chapter should be markuped:
- Your new section 5.12 should be markuped.
- Your new examples in Section 5.13 should be markuped. Thanks, revision 613 fixes this problem.
- You have done a lot of formatting changes removing senseless spaces. I checked all and did not find any editing erro on these changes. Markup is senseless for this. Yes, they looked ugly and were in my editor freedom -- thanks for checking!
- In three examples, you removed the '!' in the examples index entry. This is wrong. The '!' is needed to allow sub-entries below an main-entry, here "Deadlock". Please correct "Deadlock " to "Deadlock!" in \exindex of Examples 5.24, 5.25 and 5.26. Wow, thanks! I wasn't aware of this syntax. Fixed in revision 614.
You removed all the % at the end of \exindex{....}%[[BR]] These % are not needed in your case, but introduced everywhere, because with \exindex starting on the first column and ending with %, no space or lineend is added to the Latex source toprevent these ugly double-spacings that we could see a several locations.
You may redo these % to mimimize the changes between review basis -r 495 and current version. I'd rather not apply changes that are not visible in the built document (unless you really insist) since the document has been reviewed already. Please let me know.
I'll ask also Bill to add this advice in instr.tex
In Example 5.24, you removed the compilation-check header. In rev. 495, it was on lines 3916-3922. In current rev. 604 it should be re-added after line 4733.
Those headers are useful and I do not understand why you removed it here? That was clearly an editing error (I blame my cleanup script). Thanks for catching this! It's fixed in revision 615.
I set three author marks because this ticket has parts in Chapters Collective, Change-Log, and Bibliography.
Please add a label to your new Section 5.12. I need this for a correct Change-Log entry. Sure, see revision 616 and \label{sec:nbcoll}.
Thanks & Best, Torsten
Originally by htor on 2010-10-24 19:41:22 -0500
Yutaka Ishikawa and Hideyuki Jitsumoto realized that the nonblocking collective chapter includes C++ bindings (marked as deprecated). The question is now if we want to keep them or not. I would strongly suggest to keep those bindings since there are discussions about reviving the old C++ bindings. However, the Forum should have the last word.
Originally by jsquyres on 2010-10-25 08:18:58 -0500
This was discussed in a Forum meeting.
The decision was to leave the C++ bindings in the non-blocking collectives because they have been voted in (to be clear: the C++ bindings were not deprecated when NBC was voted in). If someone wants to raise a ticket to remove the C++ bindings from the NBC, they are free to do so. But that should be a separate issue from this ticket, IMNSHO. :-)
Whether or not all C++ bindings are un-deprecated is on the agenda for the December 2010 meeting, I believe. That is quite definitely a separate issue and should not affect this ticket.
Originally by RolfRabenseifner on 2010-10-26 08:33:39 -0500
Review of svn -r 628:
Torsten,
Section 5.1:
In the lists on the 1st, 5th, and 6th page, items about MPI_REDUCE_SCATTER:[[BR]] MPI-3.0 always puts MPI_REDUCE_SCATTER_BLOCK first and the "V version", i.e., MPI_REDUCE_SCATTER, on last position.
Please can you therefore change the sequence in your lists to[[BR]]
MPI_REDUCE_SCATTER_BLOCK, MPI_IREDUCE_SCATTER_BLOCK, MPI_REDUCE_SCATTER,
MPI_IREDUCE_SCATTER (must be done three times).
Section 5.1, first page, list item about MPI_REDUCE_SCATTER_BLOCK:
Your text reads
Section 5.10 and Section 5.12.10
but should read
Section 5.10, Section 5.12.9, and Section 5.12.10
Section 5.1, 2nd page:
Your text reads
The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer.
but was originally voted to read
The completion of a collective operation indicates that the caller is free to access locations in the communication buffer.
As far as I know, this sensful modification was approved by the Forum. Please can you verify/comment on this.
Section 5.1, 2nd page:
Your text reads
Thus, a collective communication function may, or may not, have the effect of synchronizing all calling processes.
but should read (as voted)
Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes.
Bill,
\def\MPIIIIDOTO/{\textsf{MPI-3.0}} % MPI-3.0 macro - NOTE it has a O (Oh)
% not 0 (zero)
\def\mpiiiidoto/{\textsf{MPI-3.0}} % MPI-3.0 macro - NOTE it has a o (oh)
% not 0 (zero)
On page ii, text reads
This version, MPI-3,
but should read
This version, MPI-3.0,
Originally by htor on 2010-10-26 10:27:16 -0500
Replying to RolfRabenseifner:
Review of svn -r 628:
Torsten,
Section 5.1:
In the lists on the 1st, 5th, and 6th page, items about MPI_REDUCE_SCATTER:[[BR]] MPI-3.0 always puts MPI_REDUCE_SCATTER_BLOCK first and the "V version", i.e., MPI_REDUCE_SCATTER, on last position. Good catch! Done.
Please can you therefore change the sequence in your lists to[[BR]] MPI_REDUCE_SCATTER_BLOCK, MPI_IREDUCE_SCATTER_BLOCK, MPI_REDUCE_SCATTER,
MPI_IREDUCE_SCATTER (must be done three times).Section 5.1, first page, list item about MPI_REDUCE_SCATTER_BLOCK:
Your text reads
Section 5.10 and Section 5.12.10
but should read
Section 5.10, Section 5.12.9, and Section 5.12.10 Good catch -- fixed!
All fixes above are on revision 630.
Section 5.1, 2nd page:
Your text reads
The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer.
but was originally voted to read
The completion of a collective operation indicates that the caller is free to access locations in the communication buffer.
As far as I know, this sensful modification was approved by the Forum. Please can you verify/comment on this. This is strange, it's "access" in MPI-2.1, however, it's "modify" in MPI-2.2. This explains the difference between my chapter (based on MPI-2.1) that we voted on and the current version. However, "modify" seems correct and implies "access" in my opinion. In any case, such a change is not permitted withing my editorial freedom :-). We may want to discuss this at the next Forum. However, I don't see a reason to stop the process because of this, do you?
Section 5.1, 2nd page:
Your text reads
Thus, a collective communication function may, or may not, have the effect of synchronizing all calling processes.
but should read (as voted)
Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This is corrected in revision 631.
- Your Revision r617 | htor | 2010-10-25 02:28:57 +0200 (Mo, 25. Okt 2010)[[BR]] "changed multiple C/C++ definitions from 'type var' to 'type var' after review from Yutaka Ishikawa and Hidejuki Jitsumoto"[[BR]] should be revised again. I cannot see the reviews from Yutaka Ishikawa and Hidejuki Jitsumoto. Looking at MPI-2.2, the rules are simple:[[BR]][[BR]] void* xxx (is used for all buffers)[[BR]] tttt yyy (is used for the rest)[[BR]][[BR]] You made several positiv changes for wrong formatted buffers, but you also introduced several formatting changes that are clearly inconsistent with the main parts of the standard.[[BR]] [[BR]] Therefore please revert to (but please keep all your correction on "void buf")
- In MPI_Op_create: MPI_Op *op
- In C MPI_User_function: int len, MPI_Datatype datatype
- In C MPI_User_function: MPI_Datatype *datatype
- In all nonblocking collective routines: MPI_Request *request Ok, I don't remember that we agreed on specific rules for this but it's good to make it consistent with the remaining standard. I fixed them in revision 632.
After these changes, you may set the Priority from "Passed" to
"Waiting for PDF reviews". Done.Please check my new section in the Change-Log (rev.629). I will do that as soon as the macros is added (doesn't build right now).
It would be great if you could change the diffs of the edits I just did.
Thanks for the reviews, Torsten
Originally by moody20 on 2010-10-29 01:11:41 -0500
About half way through the review, but here are some items for now.
Nit picking, but ... on page 131, change Section 5.10 and Section 5.12.9 and Section 5.12.10 to Section 5.10, Section 5.12.9, and Section 5.12.10
FYI: The "access" to "modify" change came in with ticket 45, so it's correct to leave it as "modify".
On page 184, line 18-24, we should add something to specify what get_count / get_elements / test_canceled return on a status for a non-blocking collective call, as done with the status for a non-blocking send (in MPI-2.2, see page 53, lines 22-27). New ticket needed for this?
Recommend change (page 184, line 31) If the nonblocking call causes some system resource to be exhausted, then it will fail ... to If the nonblocking call causes some system resource to be exhausted, then it may fail ...
Originally by RolfRabenseifner on 2010-10-29 04:33:39 -0500
I agree with all of Adam Moody's comments.
Especially the recommended change (i.e., from "will fail" to "may fail") must be done because the "will fail" is a clear inconsistency to the error handling described in MPI-2.2 Sect. 2.8 Error Handling, page 23, lines 7-9:
Several factors limit the ability of MPI calls to return with meaningful error codes when an error occurs. MPI may not be able to detect some errors; other errors may be too expensive to detect in normal execution mode;
About MPI status on return from Test or Wait on nonblocking collectives:
And therefore I propose:
In the new Section 5.12 before the paragraph "Multiple nonblocking collective...", the following paragraph should be added:
The fields in a status object returned by a call to MPI_{TEST|WAIT}{ALL|SOME}, where the request corresponds to a nonblocking collective call, are undefined, with one exceptions: The error status field will contain valid information if the wait or test call returned with MPI_ERR_IN_STATUS.
This text differs from the text used for nonblocking point-to-point because MPI_CANCEL is not allowed (and therefore MPI_TEST_CANCELLED cannot be applied), and because return value MPI_ERR_IN_STATUS is valid in routines that return an array of statuses, see MPI-2.2, page 32, lines 24-33:
In general, message-passing calls do not modify the value of the error code field of status variables. This field may be updated only by the functions in Section 3.7.5 which return multiple statuses. The field is updated if and only if such function returns with an error code of MPI_ERR_IN_STATUS.
Rationale. The error field in status is not needed for calls that return only one status, such as MPI_WAIT, since that would only duplicate the information returned by the function itself. The current design avoids the additional overhead of setting it, in such cases. The field is needed for calls that return multiple statuses, since each request may have had a different failure. (End of rationale.)
There is no need to mention MPI_TEST_CANCELLED, because already in the text about nonblocking point-to-point, the routine MPI_GET_COUNT is not mentioned also it cannot be called with a status returned from a nonblocking send operation. This is already covered by the text that clearly states, which information is only available.
Because this text is a reduced copy of the text used for point-to-point, i.e., it is nothing new or modified, I would vote for that this can be done by the chapter committee and a new ticket is not needed.
Torsten, when you implement this in Latex, please look at the text in pt2pt.tex. Please add also the appropriate
\mpifuncindex{MPI_WAITALL}%[[BR]] \mpifuncindex{MPI_WAITSOME}%[[BR]] \mpifuncindex{MPI_TESTALL}%[[BR]] \mpifuncindex{MPI_TESTSOME}%
lines as shown in ei-2.tex. (In pt2pt.tex, they are missing, because there is already a real reference to MPI_Wait and MPI_TEST.)
Originally by htor on 2010-11-01 12:12:16 -0500
Replying to moody20:
About half way through the review, but here are some items for now.
Nit picking, but ... on page 131, change Section 5.10 and Section 5.12.9 and Section 5.12.10 to Section 5.10, Section 5.12.9, and Section 5.12.10 Fixed in rev. 662. Thanks.
FYI: The "access" to "modify" change came in with ticket 45, so it's correct to leave it as "modify". I see, the first confusion caused by removing the change macros.
On page 184, line 18-24, we should add something to specify what get_count / get_elements / test_canceled return on a status for a non-blocking collective call, as done with the status for a non-blocking send (in MPI-2.2, see page 53, lines 22-27). New ticket needed for this? True -- See my answer to Rolf's comment below.
Recommend change (page 184, line 31) If the nonblocking call causes some system resource to be exhausted, then it will fail ... to If the nonblocking call causes some system resource to be exhausted, then it may fail ... Oh yes -- fixed in rev. 663.
Thanks, Torsten
Originally by htor on 2010-11-01 12:23:04 -0500
Replying to RolfRabenseifner:
I agree with all of Adam Moody's comments.
Especially the recommended change (i.e., from "will fail" to "may fail") must be done because the "will fail" is a clear inconsistency to the error handling described in MPI-2.2 Sect. 2.8 Error Handling, page 23, lines 7-9: Yes, that's fixed.
Several factors limit the ability of MPI calls to return with meaningful error codes when an error occurs. MPI may not be able to detect some errors; other errors may be too expensive to detect in normal execution mode;
About MPI status on return from Test or Wait on nonblocking collectives:
- This must be well defined before the new functionality is implemented.
- Therefore it should definitely be part of the draft for MPI-3.0.
And therefore I propose:
In the new Section 5.12 before the paragraph "Multiple nonblocking collective...", the following paragraph should be added:
The fields in a status object returned by a call to MPI_{TEST|WAIT}{ALL|SOME}, where the request corresponds to a nonblocking collective call, are undefined, with one exceptions: The error status field will contain valid information if the wait or test call returned with MPI_ERR_IN_STATUS. Can you elaborate what this adds to the current wording "Upon returning from a completion call in which a nonblocking collective operation completes, the MPI_ERROR field in the associated status object is set appropriately. The values of the MPI_SOURCE and MPI_TAG fields are undefined."
Yes, we don't describe what happens if one calls the query functions but I agree to Rolf on this.
This text differs from the text used for nonblocking point-to-point because MPI_CANCEL is not allowed (and therefore MPI_TEST_CANCELLED cannot be applied), and because return value MPI_ERR_IN_STATUS is valid in routines that return an array of statuses, see MPI-2.2, page 32, lines 24-33:
In general, message-passing calls do not modify the value of the error code field of status variables. This field may be updated only by the functions in Section 3.7.5 which return multiple statuses. The field is updated if and only if such function returns with an error code of MPI_ERR_IN_STATUS.
Rationale. The error field in status is not needed for calls that return only one status, such as MPI_WAIT, since that would only duplicate the information returned by the function itself. The current design avoids the additional overhead of setting it, in such cases. The field is needed for calls that return multiple statuses, since each request may have had a different failure. (End of rationale.)
There is no need to mention MPI_TEST_CANCELLED, because already in the text about nonblocking point-to-point, the routine MPI_GET_COUNT is not mentioned also it cannot be called with a status returned from a nonblocking send operation. This is already covered by the text that clearly states, which information is only available.
Because this text is a reduced copy of the text used for point-to-point, i.e., it is nothing new or modified, I would vote for that this can be done by the chapter committee and a new ticket is not needed.
Torsten, when you implement this in Latex, please look at the text in pt2pt.tex. Please add also the appropriate
\mpifuncindex{MPI_WAITALL}%[[BR]] \mpifuncindex{MPI_WAITSOME}%[[BR]] \mpifuncindex{MPI_TESTALL}%[[BR]] \mpifuncindex{MPI_TESTSOME}%
lines as shown in ei-2.tex. (In pt2pt.tex, they are missing, because there is already a real reference to MPI_Wait and MPI_TEST.)
Thanks, Torsten
Originally by RolfRabenseifner on 2010-11-01 18:09:31 -0500
Replying to htor:
Replying to RolfRabenseifner: ...
In the new Section 5.12 before the paragraph "Multiple nonblocking collective...", the following paragraph should be added:
The fields in a status object returned by a call to MPI_{TEST|WAIT}{ALL|SOME}, where the request corresponds to a nonblocking collective call, are undefined, with one exceptions: The error status field will contain valid information if the wait or test call returned with MPI_ERR_IN_STATUS. Can you elaborate what this adds to the current wording "Upon returning from a completion call in which a nonblocking collective operation completes, the MPI_ERROR field in the associated status object is set appropriately. The values of the MPI_SOURCE and MPI_TAG fields are undefined."
Yes, we don't describe what happens if one calls the query functions but I agree to Rolf on this.
I have overseen the existing text
Upon returning from a completion call in which a nonblocking collective operation completes, the MPI_ERROR field in the associated status object is set appropriately. The values of the MPI_SOURCE and MPI_TAG fields are undefined.
which is incorrect, because MPI_Wait/Test must not set the MPI_ERROR field.
Additionally, my first proposal does not say that in case of MPI_Wait/Test, the status is fully undefined.
Therefore, and to solve all these problems about inconsistencies and incompleteness, I propose:
Remove
Upon returning from a completion call in which a nonblocking collective operation completes, the MPI_ERROR field in the associated status object is set appropriately. The values of the MPI_SOURCE and MPI_TAG fields are undefined.
Before the paragraph "Multiple nonblocking collective...", the following paragraph should be added:
The fields in a status object returned by a call to MPI_WAIT, MPITEST, or MPI{TEST|WAIT}{ALL|SOME|ANY}, where the request corresponds to a nonblocking collective call, are undefined, with one exceptions: The error status field will contain valid information if the wait or test call returned with MPI_ERR_IN_STATUS. MPI_ERR_INSTATUS can be returned only by MPI{TEST|WAIT}{ALL|SOME}.
The text should not be earlier, because after the sentence on MPI_CANCEL, it is clear, that the second exception (compare MPI-2.2, page 53, lines 25-27) is not needed.
And please don't forget the lines
\mpifuncindex{MPI_WAITALL}%[[BR]] \mpifuncindex{MPI_WAITSOME}%[[BR]] \mpifuncindex{MPI_TESTALL}%[[BR]] \mpifuncindex{MPI_TESTSOME}%
Originally by htor on 2009-01-21 12:53:26 -0600
Author: Torsten Hoefler
Description
Nonblocking collective operations provide optimization potential and new semantic properties to the user.
History
This particular issue (and proposal) has been discussed several times at the MPI Forum. An official reading was performed at the December 2008 meeting.
Proposed Solution
MPI-2.1 Sections 5.1 (p129:19-42, p131:11-33) and Section 5.2.2 (p133:6-29, p133:41-p134:5) are modified according to the new text shown in the attached pdf. A new Section 5.12 "Nonblocking Collective Operations" is added. At the end of MPI-2.1 Sect. 5.12 (in the attached pdf, it is Sect. 5.13), new examples are added, starting with Example 5.28 in the attached pdf.
The following tickets discuss subitems of this ticket:
and are currently under review. They will be merged and closed (striked) as soon as the individual reviews are done.
Impact on Implementations
Minor, new operations have to be added. However, a full open-source implementation is available with LibNBC and possible implementation options are well documented. LibNBC is BSD licensed and could be included in any MPI library as a first step.
Impact on Applications / Users
None, new API functions become available for use.
Alternative Solutions
Split collectives (see JoD) have been discussed and are generally considered as too limited.
Entry for the Change Log
Added nonblocking interfaces to all collective operations