Interesting. No problem on v3.1.x?
Will take a look this evening.
I should add that the udreg rcache was actually only useful when running multi-node; it fixed the "bad deregister" returns from the uGNI library.
Can't reproduce the issue in a VM with the current xpmem HEAD. Does it occur if you run single-node with --mca btl self,vader? If not, then it will help narrow down where the issue is.
This is on Cori at NERSC. Yes, it does occur on a single node with --mca btl self,vader. Note that to reproduce it reliably I need on the order of 32 ranks.
This problem also reproduces in our aarch64 environment.
SEKI vader_get_registation reg=57ff20
SEKI vader_get_registation reg=59d710
SEKI vader_get_registation reg=59d7f0
SEKI vader_get_registation reg=59d8d0
SEKI vader_get_registation reg=59d9b0
SEKI vader_get_registation reg=59da90
SEKI vader_get_registation reg=59db70
SEKI vader_get_registation reg=5aab60
SEKI vader_get_registation reg=5aa1a0 *
SEKI vader_get_registation reg=5aa280
SEKI vader_get_registation reg=575620 #
SEKI vader_get_registation reg=5b5bc0
SEKI vader_get_registation reg=5b5ca0
SEKI vader_get_registation reg=575620 #
SEKI vader_get_registation reg=5aa1a0 *
SEKI vader_get_registation reg=5b64c0
SEKI vader_get_registation reg=5b65a0
SEKI vader_get_registation reg=5b6680
SEKI vader_get_registation reg=5b6760
SEKI vader_get_registation reg=5b6680
SEKI vader_get_registation reg=575620 #
SEKI vader_get_registation reg=5aa1a0 *
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=5aab60 alloc_base=4 peer_smp_rank=1
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=5aa280 alloc_base=2 peer_smp_rank=1
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=575620 alloc_base=7 peer_smp_rank=1 #
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=575620 alloc_base=7 peer_smp_rank=1 #
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=5aa1a0 alloc_base=1 peer_smp_rank=1 *
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=575620 alloc_base=7 peer_smp_rank=1 #
SEKI mca_btl_vader_endpoint_xpmem_rcache_cleanup ep=5930c8 reg=5aa1a0 alloc_base=1 peer_smp_rank=1 *
In the * case, the condition 'reg->alloc_base == ep->peer_smp_rank' in mca_btl_vader_endpoint_xpmem_rcache_cleanup is true, so OBJ_RELEASE runs twice on a reg with the same address.
In the # case, the condition 'reg->alloc_base == ep->peer_smp_rank' is false, so the reg is never freed by OBJ_RELEASE.
Is it correct that mca_btl_vader_endpoint_xpmem_rcache_cleanup is called more than once with the same reg address?
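To make the * failure mode concrete, here is a minimal stand-alone model of it (hypothetical code, not Open MPI source: fake_reg_t, fake_release, and fake_cleanup stand in for mca_rcache_base_registration_t, OBJ_RELEASE, and the cleanup callback):

#include <stdint.h>
#include <stdio.h>

typedef struct {
    int      ref_count;
    int      freed;       /* records that the object was already released */
    intptr_t alloc_base;
} fake_reg_t;

/* Stand-in for OBJ_RELEASE: frees once the count reaches zero. */
static void fake_release(fake_reg_t *reg)
{
    if (reg->freed) {
        /* In the real code this is free() on an already-freed pointer. */
        fprintf(stderr, "double release of reg %p\n", (void *) reg);
        return;
    }
    if (--reg->ref_count <= 0) {
        reg->freed = 1;   /* stands in for actually freeing reg */
    }
}

/* Stand-in for the cleanup callback, invoked once per iterate hit. */
static void fake_cleanup(fake_reg_t *reg, intptr_t peer_smp_rank)
{
    if (reg->alloc_base == peer_smp_rank) {
        reg->ref_count = 0;   /* "otherwise dereg will fail on assert" */
        fake_release(reg);
    }
}

int main(void)
{
    fake_reg_t reg = { .ref_count = 2, .freed = 0, .alloc_base = 1 };
    fake_cleanup(&reg, 1);   /* first invocation releases the reg */
    fake_cleanup(&reg, 1);   /* same reg again: the * case in the log */
    return 0;
}

Compiled and run, it reports the double release on the second invocation, mirroring the paired * cleanup lines in the log; in the real code that second release is a free() of already-freed memory, which is consistent with the glibc heap-corruption messages.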
diff --git a/opal/mca/btl/vader/btl_vader_xpmem.c b/opal/mca/btl/vader/btl_vader_xpmem.c
index 219c0bd5f7..fe6c0c5760 100644
--- a/opal/mca/btl/vader/btl_vader_xpmem.c
+++ b/opal/mca/btl/vader/btl_vader_xpmem.c
@@ -115,6 +115,7 @@ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo
if (NULL == reg) {
reg = OBJ_NEW(mca_rcache_base_registration_t);
+fprintf(stderr,"SEKI %s reg=%lx \n",__func__,reg);fflush(stderr);
if (OPAL_LIKELY(NULL != reg)) {
/* stick around for awhile */
reg->ref_count = 2;
@@ -154,6 +155,7 @@ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo
static int mca_btl_vader_endpoint_xpmem_rcache_cleanup (mca_rcache_base_registration_t *reg, void *ctx)
{
mca_btl_vader_endpoint_t *ep = (mca_btl_vader_endpoint_t *) ctx;
+fprintf(stderr,"SEKI %s ep=%lx reg=%lx alloc_base=%ld peer_smp_rank=%ld \n",__func__,ep,reg,(intptr_t)reg->alloc_base,ep->peer_smp_rank);fflush(stderr);
if ((intptr_t) reg->alloc_base == ep->peer_smp_rank) {
/* otherwise dereg will fail on assert */
reg->ref_count = 0;
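If the duplicate invocations turn out to be expected from the rcache iteration, one defensive direction would be to make the cleanup idempotent per registration. This is only a sketch: already_cleaned is a hypothetical field that does not exist in mca_rcache_base_registration_t, and it is not claimed to be the fix that actually landed.

static int mca_btl_vader_endpoint_xpmem_rcache_cleanup (mca_rcache_base_registration_t *reg, void *ctx)
{
    mca_btl_vader_endpoint_t *ep = (mca_btl_vader_endpoint_t *) ctx;
    if ((intptr_t) reg->alloc_base == ep->peer_smp_rank) {
        /* hypothetical guard so a second invocation with the same reg
         * does not force a second OBJ_RELEASE */
        if (0 != reg->already_cleaned) {
            return OPAL_SUCCESS;
        }
        reg->already_cleaned = 1;
        /* otherwise dereg will fail on assert */
        reg->ref_count = 0;
        /* ... release as before ... */
    }
    return OPAL_SUCCESS;
}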
I executed IMB-MPI1 with 4 processes on a single node with the options "-npmin 10000 -iter 1 alltoall -msglen ./len.txt".
cat len.txt
1024
2048
4096
8192
16384
32768
65536
fixed via #7283
I'm observing problems using the XPMEM vader single-copy mechanism on master and v4.0.x. Using XPMEM at modest numbers of processes leads to heap corruption and/or segfaults in the vader teardown code. I have reason to believe there are also problems with the grdma rcache which compound the problem. I observed that using the udreg rcache on the Cray systems reduced some of the problems, but the XPMEM memory corruption problem remains.
I am able to reproduce the problem fairly reliably with IMB at 32 processes using the allreduce test.
Here's what I see on the NERSC Cori system using Open MPI 4.0.1 and a library built with --enable-debug:
If I use the cma single-copy mechanism instead (--mca btl_vader_single_copy_mechanism cma), the application runs nominally.
If I don't use a debug build, the problem usually manifests as glibc messages about a corrupted heap, or as segfaults.