Closed mpichbot closed 7 years ago
Originally by blocksom on 2013-10-29 12:19:57 -0500
The commits have been pushed to the ticket-1957
branch in the mpich-ibm.git
repository.
http://git.mpich.org/mpich-ibm.git/shortlog/refs/heads/ticket-1957
Originally by blocksom on 2013-10-29 12:22:38 -0500
The following commits shoudl be reviewed before Thursday when Bob leaves the project.
% git log 208127f...4bb837e --format=short
commit 208127fa22b643f30bd4aa7ad5703197cbbbc23c
Author: Bob Cernohous <bobc@us.ibm.com>
Make comm destroy non-blocking and use pami callback to complete it
commit 70fa5e968390ba8799e48046313a89a03e576cd1
Author: Bob Cernohous <bobc@us.ibm.com>
Handle Win_lock/unlock on MPI_PROC_NULL
commit 49c395da94ffeba14413a0a4b698c8a36a4d2238
Author: Bob Cernohous <bobc@us.ibm.com>
Free the builtin communicators (world/self/iworld)
commit d70115e4fe275b78532f3fedb3f09bb3ebc2f262
Author: Bob Cernohous <bobc@us.ibm.com>
Test updates for ad_bglockless.
commit 8d84ea6ded94499be7b14348c2fc37f414ceb65e
Author: Bob Cernohous <bobc@us.ibm.com>
Better detect no shared file support (bglockless)
Originally by blocksom on 2013-11-22 10:07:06 -0600
I rebased the mpich-ibm/ticket-1957
branch to the latest mpich/master
.
There are 10 "bug fix" commits that need to be applied.
$ git log origin/master..mpich-ibm/ticket-1957 --oneline
e5c9622 Fixes for MPI_Alltoall when using non-contig data and collective selection
077d3b2 Fixes for comm/geometry destroy handling of context_id
01bcf78 Fixes for Collective Selection enablement and MPIDO_Gather
b8c2fcf Debug lib memory tracing signature
df77559 Task flooding causes poor MPI_Reduce performance
a2bd3e3 Add SMP aware support in MPIR_Barrier
4e9180b Make comm destroy non-blocking and use pami callback to complete it
f47d64e Handle Win_lock/unlock on MPI_PROC_NULL
48c942a Free the builtin communicators (world/self/iworld)
5318835 Test updates for ad_bglockless.
These commits are required for mpich 3.1.1 to be included in an upcoming IBM release.
Originally by blocksom on 2013-12-05 09:50:03 -0600
I split the mpich-ibm/ticket-1957
branch in two:
mpich-ibm/ticket-1957-only-pamid mpich-ibm/ticket-1957-need-review
Originally by blocksom on 2013-12-05 10:37:26 -0600
I pushed the "only pamid" code to the mpich/master
branch.
What remains is to code review the handful of commits on the ticket-1957-need-review branch.
Originally by blocksom on 2013-12-17 13:25:23 -0600
I rebased the "needs review" branch to mpich/master
and pushed it to mpich-ibm/ticket-1957-need-review-2
. There are 5 commits that need attention:
$ git log --oneline --stat -n5
1e83bb4 Fixes for comm/geometry destroy handling of context_id
src/mpid/pamid/src/comm/mpid_comm.c | 19 ++++++++++++++++++-
1 files changed, 18 insertions(+), 1 deletions(-)
51335c4 Debug lib memory tracing signature
src/include/mpimem.h | 2 +-
src/mpl/include/mpltrmem.h | 2 +-
src/mpl/src/mpltrmem.c | 22 +++++++++++++++++-----
src/util/mem/trmem.c | 2 +-
4 files changed, 20 insertions(+), 8 deletions(-)
e997fab Add SMP aware support in MPIR_Barrier
src/mpi/coll/barrier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 49 insertions(+), 4 deletions(-)
0e3ca48 Make comm destroy non-blocking and use pami callback to complete it
src/include/mpiimpl.h | 2 +-
src/mpi/comm/commutil.c | 7 ++-
src/mpid/ch3/include/mpidpre.h | 2 +-
src/mpid/pamid/include/mpidi_hooks.h | 2 +-
src/mpid/pamid/include/mpidi_prototypes.h | 6 +--
src/mpid/pamid/src/comm/mpid_comm.c | 77 ++++++++++++++++++++---------
6 files changed, 62 insertions(+), 34 deletions(-)
157b0ef Test updates for ad_bglockless.
test/mpi/cxx/util/mtest.cxx | 5 +++++
test/mpi/f77/io/shpositionf.f | 6 ++++++
test/mpi/io/rdwrord.c | 9 ++++++---
3 files changed, 17 insertions(+), 3 deletions(-)
I also deleted the mpich-ibm/ticket-1957-only-pamid
branch as it has already been pushed to mpich/master
.
The mpich-ibm/ticket-1957-need-review
branch could be deleted as well - but I left it alive for now.
Originally by robl on 2013-12-18 12:38:07 -0600
Replying to blocksom:
I rebased the "needs review" branch to
mpich/master
and pushed it tompich-ibm/ticket-1957-need-review-2
. There are 5 commits that need attention:$ git log --oneline --stat -n5 1e83bb4 Fixes for comm/geometry destroy handling of context_id src/mpid/pamid/src/comm/mpid_comm.c | 19 ++++++++++++++++++- 1 files changed, 18 insertions(+), 1 deletions(-)
I'm pretty sure this one is also related to #1947. It's pami-only, so doesn't this just go into master?
51335c4 Debug lib memory tracing signature src/include/mpimem.h | 2 +- src/mpl/include/mpltrmem.h | 2 +- src/mpl/src/mpltrmem.c | 22 +++++++++++++++++----- src/util/mem/trmem.c | 2 +- 4 files changed, 20 insertions(+), 8 deletions(-)
This is not just a signature change. Also has some changes based on MPICH_TRMEM_INIT environment variable checks.
e997fab Add SMP aware support in MPIR_Barrier src/mpi/coll/barrier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 49 insertions(+), 4 deletions(-)
one of the messaging folks should check this out
0e3ca48 Make comm destroy non-blocking and use pami callback to complete it src/include/mpiimpl.h | 2 +- src/mpi/comm/commutil.c | 7 ++- src/mpid/ch3/include/mpidpre.h | 2 +- src/mpid/pamid/include/mpidi_hooks.h | 2 +- src/mpid/pamid/include/mpidi_prototypes.h | 6 +-- src/mpid/pamid/src/comm/mpid_comm.c | 77 ++++++++++++++++++++--------- 6 files changed, 62 insertions(+), 34 deletions(-)
out of my depth here... did look at this as part of #1947
157b0ef Test updates for ad_bglockless. test/mpi/cxx/util/mtest.cxx | 5 +++++ test/mpi/f77/io/shpositionf.f | 6 ++++++ test/mpi/io/rdwrord.c | 9 ++++++--- 3 files changed, 17 insertions(+), 3 deletions(-)
anything that makes the tests report errors more aggressively gets my sign-off. I'll go do that and push at least this one right now.
Originally by blocksom on 2013-12-18 12:42:18 -0600
Replying to robl:
Replying to blocksom:
I rebased the "needs review" branch to
mpich/master
and pushed it tompich-ibm/ticket-1957-need-review-2
. There are 5 commits that need attention:$ git log --oneline --stat -n5 1e83bb4 Fixes for comm/geometry destroy handling of context_id src/mpid/pamid/src/comm/mpid_comm.c | 19 ++++++++++++++++++- 1 files changed, 18 insertions(+), 1 deletions(-)
I'm pretty sure this one is also related to #1947. It's pami-only, so doesn't this just go into master? At one point I think I was getting a conflict if I cherry-picked this guy individually on top of master. Maybe that's changed recently - I'll take another look.
Originally by robl on 2013-12-18 12:56:27 -0600
Regarding
157b0efe Test updates for ad_bglockless.
test/mpi/cxx/util/mtest.cxx | 5 +++++
test/mpi/f77/io/shpositionf.f | 6 ++++++
test/mpi/io/rdwrord.c | 9 ++++++---
3 files changed, 17 insertions(+), 3 deletions(-)
Why not just set the error handler to always throw exceptions, instead of guarding it with an environment variable?
envval = getenv( "MPITEST_FILE_ERRORS_THROW_EXCEPTIONS" );
if (envval && *envval) {
MPI::FILE_NULL.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
}
If there's even a made up hand wavy explanation for not throwing exceptions, i'll sign off and push the patch as-is, else, i'm going to strike that environment variable check and always call Set_errhandler
Originally by blocksom on 2013-12-18 13:04:28 -0600
Replying to robl:
Regarding
157b0efe Test updates for ad_bglockless. test/mpi/cxx/util/mtest.cxx | 5 +++++ test/mpi/f77/io/shpositionf.f | 6 ++++++ test/mpi/io/rdwrord.c | 9 ++++++--- 3 files changed, 17 insertions(+), 3 deletions(-)
Why not just set the error handler to always throw exceptions, instead of guarding it with an environment variable? No idea .. we probably did it this way so the default behavior of the tests didn't change. I'll defer to the aggregate wisdom of the mpich team on this one.
Originally by robl on 2013-12-18 14:47:16 -0600
not sure about the aggregate wisdom, but I pushed the "test updates" (and just that one) change to master.
Originally by balaji on 2013-12-29 14:07:12 -0600
Can someone give a bit of context on this patch?
commit efa8beef473cdc95a61725b9159e9ff4e89053ce
Author: Bob Cernohous <bobc@us.ibm.com>
Date: Thu Jul 25 14:23:39 2013 -0500
Make comm destroy non-blocking and use pami callback to complete it
(ibm) D192134: Collective offload hang
(ibm) 60fe1832205e8374d64cc1ea5634b085866d8042
(ibm) 49ff4ed153f9fd45e2649403d5aac874ce02a5ed
(ibm) Issue 9645
(ibm) 25693b4b116a21f1ab1c0f782d0f63205a2e4c4b
(ibm) 10c915b9389f5697fe3afac7a59411632e5c1bdf
Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
The patch itself seems reasonable, but I'm trying to understand why it's needed.
Originally by balaji on 2013-12-29 14:11:59 -0600
I'm looking for some context on the below patch as well:
commit 2cd0195c87d1fdfbb342a2fd2787edcf79c9cb04
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date: Thu May 16 11:45:23 2013 -0400
Add SMP aware support in MPIR_Barrier
(ibm) cc6833c08ab79638baabb87a0ab4e6c7681c5b9e
Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
MPI_Barrier already has SMP support (see the MPIR_Barrier_impl). Is there a reason why MPIR_Barrier needs to be SMP-aware as well? Maybe this is a left-over commit from before we added SMP collectives that never got purged?
Originally by blocksom on 2014-01-02 12:00:39 -0600
Bob Cernohous is now at Cray .. I can't seem to find his Cray email address, so I've CC'd his personal email instead.
Bob, feel free to un-CC yourself if you want. :)
Originally by sssharka on 2014-01-02 13:14:47 -0600
Regarding "Make comm destroy non-blocking and use pami callback to complete it", this commit was needed to make comm_destroy in pamid non-blocking. Previously, comm_destroy in pamid would ask pami to start geometry destroy and block waiting for geometry destroy to finish essencially making comm_destroy a blocking call. However, comm_destroy needed to be non-blocking so we made the change where comm_destory would tell pami to start the geometry destroy, probe pami for a bit and if geometry destroy is not done yet it would return. Since comm_destroy may return while pami geometry destroy is still in progress, the context_id associated with the previous comm may still be in use by geometry destroy. Hence, we only release the context_id when geometry destroy in pami is completely done. The changes were to make sure context_id is not released by comm_destroy but released only when geometry destroy callback is triggered.
As for the "Add SMP aware..." commit, MPIR_Barrier_impl is SMP aware; however, MPIR_Barrier is not. So any device call to barrier will not benefit from SMP aware implementation of barrier.
Thanks Sameh
Originally by balaji on 2014-01-02 13:27:16 -0600
Why does comm_destroy need to be nonblocking?
For the SMP-aware commit, yes, you are right. The SMP aware implementation should have been in MPIR_Barrier_intra, rather than in MPIR_Barrier_impl. I'll fix this a different way, so as to not duplicate the SMP-aware code.
Originally by blocksom on 2014-01-02 13:41:29 -0600
We actually ran into this nonblocking comm destroy problem simultaneously on the BGQ and PE platforms - weird enough. Both platforms use specialized hardware for collectives and need to free those local resources (different for each) only when all ranks have completed globally.
Originally by balaji on 2014-01-02 13:43:11 -0600
How is making the call nonblocking helping with that? You can block in Comm_free, isn't it?
Originally by Pavan Balaji balaji@mcs.anl.gov on 2014-01-07 10:09:11 -0600
In 4f97779716a487b66b301e16c2cb9b75dccdff8e: Make SMP-awareness in MPI_Barrier consistent.
The current implementation of SMP-awareness in MPI_Barrier was in MPIR_Barrier_impl. This makes the default implementation of barrier SMP-aware. However, if a device overrides barrier and then calls back the default implementation through MPIR_Barrier, it can no longer take advantage of SMP-awareness. This patch moves the SMP-aware implementation to MPIR_Barrier_intra, which is called by MPIR_Barrier_impl through MPIR_Barrier, in the default implementation. See #1957.
Signed-off-by: Michael Blocksome blocksom@us.ibm.com
Originally by balaji on 2014-01-07 12:14:15 -0600
Based on our offline discussion on this ticket, the above commit should fix the MPI_Barrier part.
Now this ticket only deals with the MPI_COMM_FREE piece, for which IBM is recommending to have an internal nonblocking implementation. The current status is that Mike is trying to dig up the test program that was failing without this.
Originally by blocksom on 2014-01-07 13:56:40 -0600
Should we pluck this lone remaining commit into another ticket, and branch, so we don't get confused with what has already been resolved?
Originally by balaji on 2014-01-07 14:00:36 -0600
The branch currently contains three commits. One of them is the old MPI_Barrier fix, which can be deleted. For the remaining two, I think we can either keep the same branch or create a new branch. I believe both of the remaining commits deal with the COMM_FREE issue.
Originally by blocksom on 2014-02-18 15:37:00 -0600
I rebased the code to the v3.1rc4
tag, dropped the old MPI_Barrier fix, and deleted the old branches.
Originally by blocksom on 2014-03-12 10:29:31 -0500
The remaining two commits deal with allowing the device to make the internal MPID_Dev_comm_destroy_hook
operation non-blocking.
$ git log -n2 mpich-ibm/ticket-1957-v3.1rc4
commit eb719d6982150a1bad3422fb8e21322d0c0569a0
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date: Thu Oct 10 14:04:20 2013 -0400
Fixes for comm/geometry destroy handling of context_id
Context_ids for node_comm and roots_node_comm share the same context mask bits
as the parent comm/geometry. Releasing one of them at geometry destroy releases
all of them causing erroneous reuse of context_ids
(ibm) D192398: Hang in shift2 testcase when FCA is enabled
(ibm) e0ace40df9e8571b990153d6ef19c6a7c0fe9c96
Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
commit 6db8183528d5eeacfffdd28628508b8d1458af16
Author: Bob Cernohous <bobc@us.ibm.com>
Date: Thu Jul 25 14:23:39 2013 -0500
Make comm destroy non-blocking and use pami callback to complete it
(ibm) D192134: Collective offload hang
(ibm) 60fe1832205e8374d64cc1ea5634b085866d8042
(ibm) 49ff4ed153f9fd45e2649403d5aac874ce02a5ed
(ibm) Issue 9645
(ibm) 25693b4b116a21f1ab1c0f782d0f63205a2e4c4b
(ibm) 10c915b9389f5697fe3afac7a59411632e5c1bdf
Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
This is necessary for PE build of mpich:pamid otherwise the test/mpi/comm/cmfree.c
test will hang. This hang was recently verified using the latest master branch, and then after these commits are applied the test will pass. It appears that the cmfree.c
test was written to assume that the MPI_Comm_free
operation is a local, non-synchronizing, operation - as suggested by the MPI standard.
In the advice to implementors section in the MPI_COMM_FREE documentation of the MPI standard (copied below) the MPI_Comm_free is expected to be local from the point of view of the application.
A reference-count mechanism may be used: the reference count is incremented by
each call to MPI_COMM_DUP or MPI_COMM_IDUP, and decremented by each call to
MPI_COMM_FREE. The object is ultimately deallocated when the count reaches zero.
Though collective, it is anticipated that this operation will normally be
implemented to be local, though a debugging version of an MPI library might
choose to synchronize.
When using PAMI device, a strictly local comm free (geometry destroy) would be an invalid/erroneous operation because global resource information and usage needs to be relayed to all ranks participating in the free operation. This is necessary to allocate and deallocate certain network resources that are shared by a group of nodes. These network resources are used to provide collectives that are optimized to the specific hardware.
In order to follow the standard and implement the comm free as a local operation, and without having to sacrifice correctness in PAMI device, we chose to implement comm_free as non-blocking operation. Thus, it appears to the application as a local operation yet is actually a collective operation to the PAMI device as required.
We have made every attempt to minimize the impact of this code change to other platforms and devices, such as the ch3 device. After applying these commits the ch3 code should continue to operate exactly as before.
Please review this code and consider pushing it to the mpich/master
branch to be included in the mpich 3.1.1 release.
Originally by balaji on 2014-06-28 20:57:56 -0500
Ah, OK now I understand the reason for the change. Thanks.
I looked through the cmfree.c
test and you are right -- it is incorrectly assuming that MPI_Comm_free
is noncollective. The test needs to be fixed to not assume that. Once we fix that, technically, you'll not need these changes anymore and the function can be made blocking internally.
The only catch, however, is the potential changes in MPI-4, particularly those related to FT. I believe the FT working group is adding something that requires the noncollective nature of MPI_Comm_free
. Wesley or Ken might need to correct me here if I misunderstood something. I don't want to throw away changes in pamid that might be useful when MPI-4 comes out.
Originally by blocksom on 2014-07-07 10:13:50 -0500
Replying to balaji:
I looked through the
cmfree.c
test and you are right -- it is incorrectly assuming thatMPI_Comm_free
is noncollective. The test needs to be fixed to not assume that. Once we fix that, technically, you'll not need these changes anymore and the function can be made blocking internally. This sounds good, although I'm not sure if there are other tests or applications that might make a similarly incorrect assumption. The only catch, however, is the potential changes in MPI-4, particularly those related to FT. I believe the FT working group is adding something that requires the noncollective nature ofMPI_Comm_free
. Wesley or Ken might need to correct me here if I misunderstood something. I don't want to throw away changes in pamid that might be useful when MPI-4 comes out. I don't quite understand the "throwing away" bit. Are you saying that once thecmfree.c
test is fixed then the code being discussed is not needed (and can be discarded)? .. yet, because of MPI-4 FT, we may still want this code?
The first commit,
commit 6db8183528d5eeacfffdd28628508b8d1458af16
Author: Bob Cernohous <bobc@us.ibm.com>
Date: Thu Jul 25 14:23:39 2013 -0500
Make comm destroy non-blocking and use pami callback to complete it
is not very intrusive to ch3. The code for ch3 will continue to have a blocking behavior. Why should this commit not be included? When MPI-4 FT comes around the ch3 code could quickly change to the non-blocking behavior (if needed).
The second commit,
commit eb719d6982150a1bad3422fb8e21322d0c0569a0
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date: Thu Oct 10 14:04:20 2013 -0400
Fixes for comm/geometry destroy handling of context_id
only changes pamid code. This commit fixes a hang in the shift2
test - which I believe is an internal test case but need to verify. I suppose it is possible that this test makes the same incorrect assumption as well.
Originally by blocksom on 2013-10-29 12:17:40 -0500
There are several bug fixes that were made on bgq and pe release branches. These commits need to be migrated into the master branch.