pmodels / mpich

Official MPICH Repository
http://www.mpich.org
Other
535 stars 280 forks source link

Migrate bgq and pe commits to master #1957

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by blocksom on 2013-10-29 12:17:40 -0500


There are several bug fixes that were made on bgq and pe release branches. These commits need to be migrated into the master branch.

mpichbot commented 7 years ago

Originally by blocksom on 2013-10-29 12:19:57 -0500


The commits have been pushed to the ticket-1957 branch in the mpich-ibm.git repository.

http://git.mpich.org/mpich-ibm.git/shortlog/refs/heads/ticket-1957

mpichbot commented 7 years ago

Originally by blocksom on 2013-10-29 12:22:38 -0500


The following commits shoudl be reviewed before Thursday when Bob leaves the project.

% git log 208127f...4bb837e --format=short
commit 208127fa22b643f30bd4aa7ad5703197cbbbc23c
Author: Bob Cernohous <bobc@us.ibm.com>

    Make comm destroy non-blocking and use pami callback to complete it

commit 70fa5e968390ba8799e48046313a89a03e576cd1
Author: Bob Cernohous <bobc@us.ibm.com>

    Handle Win_lock/unlock on MPI_PROC_NULL

commit 49c395da94ffeba14413a0a4b698c8a36a4d2238
Author: Bob Cernohous <bobc@us.ibm.com>

    Free the builtin communicators (world/self/iworld)

commit d70115e4fe275b78532f3fedb3f09bb3ebc2f262
Author: Bob Cernohous <bobc@us.ibm.com>

    Test updates for ad_bglockless.

commit 8d84ea6ded94499be7b14348c2fc37f414ceb65e
Author: Bob Cernohous <bobc@us.ibm.com>

    Better detect no shared file support (bglockless)
mpichbot commented 7 years ago

Originally by blocksom on 2013-11-22 10:07:06 -0600


I rebased the mpich-ibm/ticket-1957 branch to the latest mpich/master.

There are 10 "bug fix" commits that need to be applied.

$ git log origin/master..mpich-ibm/ticket-1957 --oneline
e5c9622 Fixes for MPI_Alltoall when using non-contig data and collective selection
077d3b2 Fixes for comm/geometry destroy handling of context_id
01bcf78 Fixes for Collective Selection enablement and MPIDO_Gather
b8c2fcf Debug lib memory tracing signature
df77559 Task flooding causes poor MPI_Reduce performance
a2bd3e3 Add SMP aware support in MPIR_Barrier
4e9180b Make comm destroy non-blocking and use pami callback to complete it
f47d64e Handle Win_lock/unlock on MPI_PROC_NULL
48c942a Free the builtin communicators (world/self/iworld)
5318835 Test updates for ad_bglockless.

These commits are required for mpich 3.1.1 to be included in an upcoming IBM release.

mpichbot commented 7 years ago

Originally by blocksom on 2013-12-05 09:50:03 -0600


I split the mpich-ibm/ticket-1957 branch in two:

mpich-ibm/ticket-1957-only-pamid mpich-ibm/ticket-1957-need-review

mpichbot commented 7 years ago

Originally by blocksom on 2013-12-05 10:37:26 -0600


I pushed the "only pamid" code to the mpich/master branch.

What remains is to code review the handful of commits on the ticket-1957-need-review branch.

mpichbot commented 7 years ago

Originally by blocksom on 2013-12-17 13:25:23 -0600


I rebased the "needs review" branch to mpich/master and pushed it to mpich-ibm/ticket-1957-need-review-2. There are 5 commits that need attention:

$ git log --oneline --stat -n5
1e83bb4 Fixes for comm/geometry destroy handling of context_id
 src/mpid/pamid/src/comm/mpid_comm.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)
51335c4 Debug lib memory tracing signature
 src/include/mpimem.h       |    2 +-
 src/mpl/include/mpltrmem.h |    2 +-
 src/mpl/src/mpltrmem.c     |   22 +++++++++++++++++-----
 src/util/mem/trmem.c       |    2 +-
 4 files changed, 20 insertions(+), 8 deletions(-)
e997fab Add SMP aware support in MPIR_Barrier
 src/mpi/coll/barrier.c |   53 ++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 49 insertions(+), 4 deletions(-)
0e3ca48 Make comm destroy non-blocking and use pami callback to complete it
 src/include/mpiimpl.h                     |    2 +-
 src/mpi/comm/commutil.c                   |    7 ++-
 src/mpid/ch3/include/mpidpre.h            |    2 +-
 src/mpid/pamid/include/mpidi_hooks.h      |    2 +-
 src/mpid/pamid/include/mpidi_prototypes.h |    6 +--
 src/mpid/pamid/src/comm/mpid_comm.c       |   77 ++++++++++++++++++++---------
 6 files changed, 62 insertions(+), 34 deletions(-)
157b0ef Test updates for ad_bglockless.
 test/mpi/cxx/util/mtest.cxx   |    5 +++++
 test/mpi/f77/io/shpositionf.f |    6 ++++++
 test/mpi/io/rdwrord.c         |    9 ++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

I also deleted the mpich-ibm/ticket-1957-only-pamid branch as it has already been pushed to mpich/master.

The mpich-ibm/ticket-1957-need-review branch could be deleted as well - but I left it alive for now.

mpichbot commented 7 years ago

Originally by robl on 2013-12-18 12:38:07 -0600


Replying to blocksom:

I rebased the "needs review" branch to mpich/master and pushed it to mpich-ibm/ticket-1957-need-review-2. There are 5 commits that need attention:

$ git log --oneline --stat -n5
1e83bb4 Fixes for comm/geometry destroy handling of context_id
 src/mpid/pamid/src/comm/mpid_comm.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

I'm pretty sure this one is also related to #1947. It's pami-only, so doesn't this just go into master?

 51335c4 Debug lib memory tracing signature
 src/include/mpimem.h       |    2 +-
 src/mpl/include/mpltrmem.h |    2 +-
 src/mpl/src/mpltrmem.c     |   22 +++++++++++++++++-----
 src/util/mem/trmem.c       |    2 +-
 4 files changed, 20 insertions(+), 8 deletions(-)

This is not just a signature change. Also has some changes based on MPICH_TRMEM_INIT environment variable checks.

e997fab Add SMP aware support in MPIR_Barrier
 src/mpi/coll/barrier.c |   53 ++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 49 insertions(+), 4 deletions(-)

one of the messaging folks should check this out

0e3ca48 Make comm destroy non-blocking and use pami callback to complete it
 src/include/mpiimpl.h                     |    2 +-
 src/mpi/comm/commutil.c                   |    7 ++-
 src/mpid/ch3/include/mpidpre.h            |    2 +-
 src/mpid/pamid/include/mpidi_hooks.h      |    2 +-
 src/mpid/pamid/include/mpidi_prototypes.h |    6 +--
 src/mpid/pamid/src/comm/mpid_comm.c       |   77 ++++++++++++++++++++---------
 6 files changed, 62 insertions(+), 34 deletions(-)

out of my depth here... did look at this as part of #1947

157b0ef Test updates for ad_bglockless.
 test/mpi/cxx/util/mtest.cxx   |    5 +++++
 test/mpi/f77/io/shpositionf.f |    6 ++++++
 test/mpi/io/rdwrord.c         |    9 ++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

anything that makes the tests report errors more aggressively gets my sign-off. I'll go do that and push at least this one right now.

mpichbot commented 7 years ago

Originally by blocksom on 2013-12-18 12:42:18 -0600


Replying to robl:

Replying to blocksom:

I rebased the "needs review" branch to mpich/master and pushed it to mpich-ibm/ticket-1957-need-review-2. There are 5 commits that need attention:

$ git log --oneline --stat -n5
1e83bb4 Fixes for comm/geometry destroy handling of context_id
 src/mpid/pamid/src/comm/mpid_comm.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

I'm pretty sure this one is also related to #1947. It's pami-only, so doesn't this just go into master? At one point I think I was getting a conflict if I cherry-picked this guy individually on top of master. Maybe that's changed recently - I'll take another look.

mpichbot commented 7 years ago

Originally by robl on 2013-12-18 12:56:27 -0600


Regarding

157b0efe Test updates for ad_bglockless.
 test/mpi/cxx/util/mtest.cxx   | 5 +++++
 test/mpi/f77/io/shpositionf.f | 6 ++++++
 test/mpi/io/rdwrord.c         | 9 ++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

Why not just set the error handler to always throw exceptions, instead of guarding it with an environment variable?

    envval = getenv( "MPITEST_FILE_ERRORS_THROW_EXCEPTIONS" );
    if (envval && *envval) {
        MPI::FILE_NULL.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
    }

If there's even a made up hand wavy explanation for not throwing exceptions, i'll sign off and push the patch as-is, else, i'm going to strike that environment variable check and always call Set_errhandler

mpichbot commented 7 years ago

Originally by blocksom on 2013-12-18 13:04:28 -0600


Replying to robl:

Regarding

157b0efe Test updates for ad_bglockless.
 test/mpi/cxx/util/mtest.cxx   | 5 +++++
 test/mpi/f77/io/shpositionf.f | 6 ++++++
 test/mpi/io/rdwrord.c         | 9 ++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

Why not just set the error handler to always throw exceptions, instead of guarding it with an environment variable? No idea .. we probably did it this way so the default behavior of the tests didn't change. I'll defer to the aggregate wisdom of the mpich team on this one.

mpichbot commented 7 years ago

Originally by robl on 2013-12-18 14:47:16 -0600


not sure about the aggregate wisdom, but I pushed the "test updates" (and just that one) change to master.

mpichbot commented 7 years ago

Originally by balaji on 2013-12-29 14:07:12 -0600


Can someone give a bit of context on this patch?

commit efa8beef473cdc95a61725b9159e9ff4e89053ce
Author: Bob Cernohous <bobc@us.ibm.com>
Date:   Thu Jul 25 14:23:39 2013 -0500

    Make comm destroy non-blocking and use pami callback to complete it

    (ibm) D192134: Collective offload hang
    (ibm) 60fe1832205e8374d64cc1ea5634b085866d8042
    (ibm) 49ff4ed153f9fd45e2649403d5aac874ce02a5ed

    (ibm) Issue 9645
    (ibm) 25693b4b116a21f1ab1c0f782d0f63205a2e4c4b
    (ibm) 10c915b9389f5697fe3afac7a59411632e5c1bdf

    Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

The patch itself seems reasonable, but I'm trying to understand why it's needed.

mpichbot commented 7 years ago

Originally by balaji on 2013-12-29 14:11:59 -0600


I'm looking for some context on the below patch as well:

commit 2cd0195c87d1fdfbb342a2fd2787edcf79c9cb04
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date:   Thu May 16 11:45:23 2013 -0400

    Add SMP aware support in MPIR_Barrier

    (ibm) cc6833c08ab79638baabb87a0ab4e6c7681c5b9e

    Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

MPI_Barrier already has SMP support (see the MPIR_Barrier_impl). Is there a reason why MPIR_Barrier needs to be SMP-aware as well? Maybe this is a left-over commit from before we added SMP collectives that never got purged?

mpichbot commented 7 years ago

Originally by blocksom on 2014-01-02 12:00:39 -0600


Bob Cernohous is now at Cray .. I can't seem to find his Cray email address, so I've CC'd his personal email instead.

Bob, feel free to un-CC yourself if you want. :)

mpichbot commented 7 years ago

Originally by sssharka on 2014-01-02 13:14:47 -0600


Regarding "Make comm destroy non-blocking and use pami callback to complete it", this commit was needed to make comm_destroy in pamid non-blocking. Previously, comm_destroy in pamid would ask pami to start geometry destroy and block waiting for geometry destroy to finish essencially making comm_destroy a blocking call. However, comm_destroy needed to be non-blocking so we made the change where comm_destory would tell pami to start the geometry destroy, probe pami for a bit and if geometry destroy is not done yet it would return. Since comm_destroy may return while pami geometry destroy is still in progress, the context_id associated with the previous comm may still be in use by geometry destroy. Hence, we only release the context_id when geometry destroy in pami is completely done. The changes were to make sure context_id is not released by comm_destroy but released only when geometry destroy callback is triggered.

As for the "Add SMP aware..." commit, MPIR_Barrier_impl is SMP aware; however, MPIR_Barrier is not. So any device call to barrier will not benefit from SMP aware implementation of barrier.

Thanks Sameh

mpichbot commented 7 years ago

Originally by balaji on 2014-01-02 13:27:16 -0600


Why does comm_destroy need to be nonblocking?

For the SMP-aware commit, yes, you are right. The SMP aware implementation should have been in MPIR_Barrier_intra, rather than in MPIR_Barrier_impl. I'll fix this a different way, so as to not duplicate the SMP-aware code.

mpichbot commented 7 years ago

Originally by blocksom on 2014-01-02 13:41:29 -0600


We actually ran into this nonblocking comm destroy problem simultaneously on the BGQ and PE platforms - weird enough. Both platforms use specialized hardware for collectives and need to free those local resources (different for each) only when all ranks have completed globally.

mpichbot commented 7 years ago

Originally by balaji on 2014-01-02 13:43:11 -0600


How is making the call nonblocking helping with that? You can block in Comm_free, isn't it?

mpichbot commented 7 years ago

Originally by Pavan Balaji balaji@mcs.anl.gov on 2014-01-07 10:09:11 -0600


In 4f97779716a487b66b301e16c2cb9b75dccdff8e: Make SMP-awareness in MPI_Barrier consistent.

The current implementation of SMP-awareness in MPI_Barrier was in MPIR_Barrier_impl. This makes the default implementation of barrier SMP-aware. However, if a device overrides barrier and then calls back the default implementation through MPIR_Barrier, it can no longer take advantage of SMP-awareness. This patch moves the SMP-aware implementation to MPIR_Barrier_intra, which is called by MPIR_Barrier_impl through MPIR_Barrier, in the default implementation. See #1957.

Signed-off-by: Michael Blocksome blocksom@us.ibm.com

mpichbot commented 7 years ago

Originally by balaji on 2014-01-07 12:14:15 -0600


Based on our offline discussion on this ticket, the above commit should fix the MPI_Barrier part.

Now this ticket only deals with the MPI_COMM_FREE piece, for which IBM is recommending to have an internal nonblocking implementation. The current status is that Mike is trying to dig up the test program that was failing without this.

mpichbot commented 7 years ago

Originally by blocksom on 2014-01-07 13:56:40 -0600


Should we pluck this lone remaining commit into another ticket, and branch, so we don't get confused with what has already been resolved?

mpichbot commented 7 years ago

Originally by balaji on 2014-01-07 14:00:36 -0600


The branch currently contains three commits. One of them is the old MPI_Barrier fix, which can be deleted. For the remaining two, I think we can either keep the same branch or create a new branch. I believe both of the remaining commits deal with the COMM_FREE issue.

mpichbot commented 7 years ago

Originally by blocksom on 2014-02-18 15:37:00 -0600


I rebased the code to the v3.1rc4 tag, dropped the old MPI_Barrier fix, and deleted the old branches.

See mpich-ibm/ticket-1957-v3.1rc4.

mpichbot commented 7 years ago

Originally by blocksom on 2014-03-12 10:29:31 -0500


The remaining two commits deal with allowing the device to make the internal MPID_Dev_comm_destroy_hook operation non-blocking.

$ git log -n2 mpich-ibm/ticket-1957-v3.1rc4
commit eb719d6982150a1bad3422fb8e21322d0c0569a0
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date:   Thu Oct 10 14:04:20 2013 -0400

    Fixes for comm/geometry destroy handling of context_id

    Context_ids for node_comm and roots_node_comm share the same context mask bits
    as the parent comm/geometry. Releasing one of them at geometry destroy releases
    all of them causing erroneous reuse of context_ids

    (ibm) D192398: Hang in shift2 testcase when FCA is enabled
    (ibm) e0ace40df9e8571b990153d6ef19c6a7c0fe9c96

    Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

commit 6db8183528d5eeacfffdd28628508b8d1458af16
Author: Bob Cernohous <bobc@us.ibm.com>
Date:   Thu Jul 25 14:23:39 2013 -0500

    Make comm destroy non-blocking and use pami callback to complete it

    (ibm) D192134: Collective offload hang
    (ibm) 60fe1832205e8374d64cc1ea5634b085866d8042
    (ibm) 49ff4ed153f9fd45e2649403d5aac874ce02a5ed

    (ibm) Issue 9645
    (ibm) 25693b4b116a21f1ab1c0f782d0f63205a2e4c4b
    (ibm) 10c915b9389f5697fe3afac7a59411632e5c1bdf

    Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

This is necessary for PE build of mpich:pamid otherwise the test/mpi/comm/cmfree.c test will hang. This hang was recently verified using the latest master branch, and then after these commits are applied the test will pass. It appears that the cmfree.c test was written to assume that the MPI_Comm_free operation is a local, non-synchronizing, operation - as suggested by the MPI standard.

In the advice to implementors section in the MPI_COMM_FREE documentation of the MPI standard (copied below) the MPI_Comm_free is expected to be local from the point of view of the application.

A reference-count mechanism may be used: the reference count is incremented by
each call to MPI_COMM_DUP or MPI_COMM_IDUP, and decremented by each call to
MPI_COMM_FREE. The object is ultimately deallocated when the count reaches zero.

Though collective, it is anticipated that this operation will normally be
implemented to be local, though a debugging version of an MPI library might
choose to synchronize.

When using PAMI device, a strictly local comm free (geometry destroy) would be an invalid/erroneous operation because global resource information and usage needs to be relayed to all ranks participating in the free operation. This is necessary to allocate and deallocate certain network resources that are shared by a group of nodes. These network resources are used to provide collectives that are optimized to the specific hardware.

In order to follow the standard and implement the comm free as a local operation, and without having to sacrifice correctness in PAMI device, we chose to implement comm_free as non-blocking operation. Thus, it appears to the application as a local operation yet is actually a collective operation to the PAMI device as required.

We have made every attempt to minimize the impact of this code change to other platforms and devices, such as the ch3 device. After applying these commits the ch3 code should continue to operate exactly as before.

Please review this code and consider pushing it to the mpich/master branch to be included in the mpich 3.1.1 release.

mpichbot commented 7 years ago

Originally by balaji on 2014-06-28 20:57:56 -0500


Ah, OK now I understand the reason for the change. Thanks.

I looked through the cmfree.c test and you are right -- it is incorrectly assuming that MPI_Comm_free is noncollective. The test needs to be fixed to not assume that. Once we fix that, technically, you'll not need these changes anymore and the function can be made blocking internally.

The only catch, however, is the potential changes in MPI-4, particularly those related to FT. I believe the FT working group is adding something that requires the noncollective nature of MPI_Comm_free. Wesley or Ken might need to correct me here if I misunderstood something. I don't want to throw away changes in pamid that might be useful when MPI-4 comes out.

mpichbot commented 7 years ago

Originally by blocksom on 2014-07-07 10:13:50 -0500


Replying to balaji:

I looked through the cmfree.c test and you are right -- it is incorrectly assuming that MPI_Comm_free is noncollective. The test needs to be fixed to not assume that. Once we fix that, technically, you'll not need these changes anymore and the function can be made blocking internally. This sounds good, although I'm not sure if there are other tests or applications that might make a similarly incorrect assumption. The only catch, however, is the potential changes in MPI-4, particularly those related to FT. I believe the FT working group is adding something that requires the noncollective nature of MPI_Comm_free. Wesley or Ken might need to correct me here if I misunderstood something. I don't want to throw away changes in pamid that might be useful when MPI-4 comes out. I don't quite understand the "throwing away" bit. Are you saying that once the cmfree.c test is fixed then the code being discussed is not needed (and can be discarded)? .. yet, because of MPI-4 FT, we may still want this code?

The first commit,

commit 6db8183528d5eeacfffdd28628508b8d1458af16
Author: Bob Cernohous <bobc@us.ibm.com>
Date:   Thu Jul 25 14:23:39 2013 -0500

    Make comm destroy non-blocking and use pami callback to complete it

is not very intrusive to ch3. The code for ch3 will continue to have a blocking behavior. Why should this commit not be included? When MPI-4 FT comes around the ch3 code could quickly change to the non-blocking behavior (if needed).

The second commit,

commit eb719d6982150a1bad3422fb8e21322d0c0569a0
Author: Sameh Sharkawi <sssharka@us.ibm.com>
Date:   Thu Oct 10 14:04:20 2013 -0400

    Fixes for comm/geometry destroy handling of context_id

only changes pamid code. This commit fixes a hang in the shift2 test - which I believe is an internal test case but need to verify. I suppose it is possible that this test makes the same incorrect assumption as well.