yosefe opened this issue 7 years ago
Does it hang forever in nanosleep() ?
Unlikely, this is a fixed 100 ns interval:
static inline void _opal_lifo_release_cpu (void)
{
/* NTH: there are many ways to cause the current thread to be suspended. This one
* should work well in most cases. Another approach would be to use poll (NULL, 0, ) but
* the interval will be forced to be in ms (instead of ns or us). Note that there
* is a performance improvement for the lifo test when this call is made on detection
 * of contention but it may not translate into actual MPI or application performance
* improvements. */
static struct timespec interval = { .tv_sec = 0, .tv_nsec = 100 };
nanosleep (&interval, NULL);
}
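For reference, the poll()-based alternative mentioned in that comment would look roughly like this (a hedged sketch with a made-up function name, not code from the tree); it illustrates why the comment rejects it: poll() cannot sleep for less than a millisecond.

#include <poll.h>

/* Hypothetical alternative to _opal_lifo_release_cpu() using poll(), as the
 * comment above suggests.  poll()'s timeout is expressed in milliseconds,
 * so the shortest possible pause is 1 ms -- far coarser than the 100 ns
 * nanosleep interval used above. */
static inline void release_cpu_with_poll (void)
{
    (void) poll (NULL, 0, 1);   /* no fds to watch; just block for ~1 ms */
}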
Can you provide more information about what exactly is hanging? I.e., this is clearly one stack trace of the hang, but it must be looping on something that never completes.
@yosefe can I reproduce this with a single node ? Do I really need IB for this ? thanks
You don't need IB for this, just force the load of the hugepage mpool. Also, this seems to happen in a very peculiar place, during a lifo_pop operation, which in this particular context uses the conditional load/store of the ARM64. As our tests pass on ARM64, I don't think the issue is in the atomic itself, but instead in the way the rb tree is initialized or how the freelist is grown in this particular instance.
is this a multi-thread run ? how many threads ?
@yosefe , can you please replace https://github.com/open-mpi/ompi/blob/d7ebcca93fc86796e0d43997935539b1e922a1f1/opal/class/opal_lifo.h#L226
with opal_atomic_mb()
@shamisp it's looping in https://github.com/open-mpi/ompi/blob/d7ebcca93fc86796e0d43997935539b1e922a1f1/opal/class/opal_lifo.h#L224, and changing to atomic_mb() does not help. It can be reproduced on 1 node. Disabling the hugepage mpool moves the hang to another place - so it's something more fundamental:
(gdb) bt
#0 0x000003ffb7ccfa78 in nanosleep () from /usr/lib64/libpthread.so.0
#1 0x000003ffb7911e0c in _opal_lifo_release_cpu () at ../opal/class/opal_lifo.h:195
#2 0x000003ffb7911e40 in opal_lifo_pop_atomic (lifo=0x3ffb7a3b778 <mca_mpool_base_tree+64>) at ../opal/class/opal_lifo.h:210
#3 0x000003ffb7911fd0 in opal_free_list_get_st (flist=0x3ffb7a3b778 <mca_mpool_base_tree+64>) at ../opal/class/opal_free_list.h:213
#4 0x000003ffb7911ff4 in opal_free_list_get (flist=0x3ffb7a3b778 <mca_mpool_base_tree+64>) at ../opal/class/opal_free_list.h:225
#5 0x000003ffb7912264 in opal_rb_tree_init (tree=0x3ffb7a3b738 <mca_mpool_base_tree>, comp=0x3ffb79bdba0 <mca_mpool_base_tree_node_compare>) at class/opal_rb_tree.c:86
#6 0x000003ffb79bde28 in mca_mpool_base_tree_init () at base/mpool_base_tree.c:87
#7 0x000003ffb79bd378 in mca_mpool_base_open (flags=MCA_BASE_OPEN_DEFAULT) at base/mpool_base_frame.c:102
#8 0x000003ffb7957d1c in mca_base_framework_open (framework=0x3ffb7a34788 <opal_mpool_base_framework>, flags=MCA_BASE_OPEN_DEFAULT) at mca_base_framework.c:174
#9 0x000003ffb7d59bc0 in ompi_mpi_init (argc=4, argv=0x3ffffffe5c8, requested=0, provided=0x3ffffffe3bc) at runtime/ompi_mpi_init.c:589
#10 0x000003ffb7d990e8 in PMPI_Init (argc=0x3ffffffe43c, argv=0x3ffffffe430) at pinit.c:66
#11 0x0000000000400c5c in main (argc=4, argv=0x3ffffffe5c8) at ctxalloc.c:20
command line on single node:
mpirun -np 1 -mca btl self --bind-to core --display-map -mca pml ob1 ./ctxalloc 2 1500 100
I can reproduce the hang on my ARM64 machine.
Open MPI: openmpi-v2.x-201706150321-b562082 (nightly snapshot tarball)
CPU: Cavium ThunderX (ARMv8, 64-bit)
OS: CentOS Linux release 7.2.1603 (AltArch)
Compiler: GCC 4.8.5 (in CentOS)
configure option: --enable-debug or --disable-debug
compiler option: -O0
I cannot reproduce the hang with -O1 or -O2 (with either --enable-debug or --disable-debug).
Test programs in the Open MPI source tree also hang.
test/class/opal_lifo
(gdb) bt
#0 0x0000ffffb7d009a8 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1 0x0000000000401384 in _opal_lifo_release_cpu ()
at ../../../../opal/class/opal_lifo.h:195
#2 0x00000000004013b8 in opal_lifo_pop_atomic (lifo=0xffffffffed80)
at ../../../../opal/class/opal_lifo.h:210
#3 0x00000000004014f4 in thread_test (arg=0xffffffffed80)
at ../../../../test/class/opal_lifo.c:50
#4 0x0000000000401ab8 in main (argc=1, argv=0xfffffffff018)
at ../../../../test/class/opal_lifo.c:147
test/class/opal_fifo
(gdb) bt
#0 opal_fifo_pop_atomic (fifo=0xffffffffed68)
at ../../../../opal/class/opal_fifo.h:238
#1 0x00000000004015d4 in thread_test (arg=0xffffffffed68)
at ../../../../test/class/opal_fifo.c:51
#2 0x0000000000401e3c in main (argc=1, argv=0xfffffffff018)
at ../../../../test/class/opal_fifo.c:184
test/class/ompi_rb_tree
(gdb) bt
#0 0x0000ffffb7a589a8 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1 0x0000ffffb7bfb338 in _opal_lifo_release_cpu ()
at ../../../opal/class/opal_lifo.h:195
#2 0x0000ffffb7bfb36c in opal_lifo_pop_atomic (lifo=0xffffffffec90)
at ../../../opal/class/opal_lifo.h:210
#3 0x0000ffffb7bfb4fc in opal_free_list_get_st (flist=0xffffffffec90)
at ../../../opal/class/opal_free_list.h:213
#4 0x0000ffffb7bfb520 in opal_free_list_get (flist=0xffffffffec90)
at ../../../opal/class/opal_free_list.h:225
#5 0x0000ffffb7bfb790 in opal_rb_tree_init (tree=0xffffffffec50, comp=0x4016c4 <comp_fn>)
at ../../../opal/class/opal_rb_tree.c:86
#6 0x0000000000401c04 in test1 ()
at ../../../../test/class/ompi_rb_tree.c:145
#7 0x0000000000402850 in main (argc=1, argv=0xfffffffff018)
at ../../../../test/class/ompi_rb_tree.c:408
In the opal_lifo case, it is looping in the do-while loop of the opal_lifo_pop_atomic function (the OPAL_HAVE_ATOMIC_LLSC_PTR == 1 case).
In the opal_fifo case, it is looping in the do-while loop of the opal_fifo_pop_atomic function (the OPAL_HAVE_ATOMIC_LLSC_PTR == 1 case), not the continue case.
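For readers not familiar with the code, the loop being described has roughly this shape (a simplified, paraphrased sketch with illustrative names, not the actual opal_lifo.h source):

#include <stddef.h>

/* Paraphrased sketch of an LL/SC-style lifo pop.  The thread spins in the
 * do-while until the store-conditional succeeds; if the exclusive
 * reservation taken by the load-linked keeps getting cancelled, the loop
 * never exits and the process appears hung in the release-cpu/nanosleep
 * path shown in the backtraces above. */

typedef struct item { struct item *next; } item_t;

typedef struct lifo {
    item_t *head;
    item_t  ghost;      /* sentinel marking the empty list */
} lifo_t;

/* Stand-ins for the architecture's load-linked / store-conditional pair
 * (ldxr/stxr on AArch64) and the back-off helper.  These are assumptions
 * for the sketch, not real Open MPI functions. */
extern item_t *load_linked (item_t **addr);
extern int     store_conditional (item_t **addr, item_t *newval);
extern void    release_cpu (void);

item_t *lifo_pop_llsc_sketch (lifo_t *lifo)
{
    item_t *item;
    int attempt = 0, success;

    do {
        if (++attempt == 5) {
            release_cpu ();                    /* back off under contention */
            attempt = 0;
        }

        item = load_linked (&lifo->head);      /* take exclusive reservation */
        if (item == &lifo->ghost) {
            return NULL;                       /* lifo is empty */
        }

        success = store_conditional (&lifo->head, item->next);
    } while (!success);                        /* reservation lost: retry */

    return item;
}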
Does it run with --disable-builtin-atomics?
@hjelmn Builtin atomics are disabled by default in the v2.x branch. I enabled BUILTIN_GCC (__atomic_*) with --enable-builtin-atomics. Then make check (which includes the three test programs above) passed without hanging. Both -O0 and -O2 were tested.
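For context, --enable-builtin-atomics selects the GCC __atomic builtins, which give a compare-and-swap retry loop roughly of this shape (an illustrative sketch, not the actual Open MPI implementation). Unlike LL/SC, a failed CAS simply reloads the current head and retries, so interleaved loads by other threads cannot wedge the loop.

#include <stdbool.h>

typedef struct item { struct item *next; } item_t;

/* Illustrative CAS-based pop using the GCC __atomic builtins.  Note: a
 * production lifo also has to deal with the ABA problem (Open MPI's CAS
 * path pairs the pointer with a counter); that is omitted here for
 * brevity. */
item_t *lifo_pop_cas_sketch (item_t **head)
{
    item_t *old_head = __atomic_load_n (head, __ATOMIC_ACQUIRE);

    while (old_head != NULL) {
        /* on failure, old_head is refreshed with the current value of *head */
        if (__atomic_compare_exchange_n (head, &old_head, old_head->next,
                                         false, __ATOMIC_ACQ_REL,
                                         __ATOMIC_ACQUIRE)) {
            return old_head;
        }
    }

    return NULL;    /* list was empty */
}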
@yosefe - what compiler version is used ?
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
@PHHargrove reported similar issues on the devel list.
Regarding make check, the reported version/platform/compiler/test patterns of fail/hang are:
A common point of ARM64 and PPC64(LE) is OPAL_HAVE_ATOMIC_LLSC_PTR == 1 (grep -r OPAL_HAVE_ATOMIC_LLSC_ opal/include/opal/sys shows it). But I'm wondering why the following patterns succeed. ARM64 and PPC64 show opposite results regarding --enable-builtin-atomics.
--enable-builtin-atomics (I tested)
--disable-builtin-atomics is the default in the Open MPI 2.1 series.
--enable-builtin-atomics is the default in the Open MPI 3.0 series.
Added a v3.0.0 milestone since @PHHargrove saw this on 3.0.0rc1, per the above comment.
I will try to reproduce it on one of my systems.
I can confirm that the problem only shows up in -O0 mode.
I know what is happening. With -O0 the opal atomic functions are not inlined. That makes the LL/SC atomics a function call. This will likely cause livelock with the LL/SC fifo/lifo implementations, as it increases the chance that a read will cancel the LL reservation. The correct fix is to force those atomics to always be inlined. I will make the change and see if it fixes the issue.
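To illustrate the shape of that fix (a sketch of the general idea only, not the actual Open MPI patch): marking the LL/SC helpers so the compiler must inline them even at -O0 keeps the ldaxr/stlxr pair in one instruction stream, with no call/return or stack traffic in between that could clear the exclusive monitor.

/* Sketch only: force-inlined AArch64 LL/SC helpers.  When these are
 * out-of-line functions (as happens at -O0 without always_inline), the
 * call/return and stack accesses land between the load-exclusive and the
 * store-exclusive and can clear the exclusive monitor, so the
 * store-conditional keeps failing (livelock). */
#if defined(__GNUC__)
#define ALWAYS_INLINE __attribute__((__always_inline__)) inline
#else
#define ALWAYS_INLINE inline
#endif

static ALWAYS_INLINE void *ll_ptr_sketch (void * volatile *addr)
{
    void *ret;
    /* load-exclusive with acquire semantics */
    __asm__ __volatile__ ("ldaxr %0, [%1]"
                          : "=&r" (ret)
                          : "r" (addr)
                          : "memory");
    return ret;
}

static ALWAYS_INLINE int sc_ptr_sketch (void * volatile *addr, void *newval)
{
    int fail;
    /* store-exclusive with release semantics; status is 0 on success */
    __asm__ __volatile__ ("stlxr %w0, %2, [%1]"
                          : "=&r" (fail)
                          : "r" (addr), "r" (newval)
                          : "memory");
    return fail == 0;   /* 1 = stored, 0 = reservation was lost */
}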
@hjelmn I confirmed the current status on AArch64, with --enable-builtin-atomics / --disable-builtin-atomics and -O0 / -O3.
make check result:
version | enable + -O0 | enable + -O3 | disable + -O0 | disable + -O3 |
---|---|---|---|---|
2.0.3 | OK | OK | OK | OK |
2.1.1 | OK | OK | hang-up | OK |
3.0.0rc1 | OK | OK | hang-up | OK |
2.0.x nightly | OK | OK | OK | OK |
2.1.x nightly | OK | OK | hang-up | OK |
3.0.x nightly | OK | OK | hang-up | OK |
All hang-ups occur in the test/class directory (opal_fifo etc.).
As you said, the bad combination is: Open MPI 2.1 or higher + --disable-builtin-atomics + -O0.
Open MPI 2.0.x does not have this issue because it does not have the opal/include/opal/sys/arm64 code, so the LL/SC fifo/lifo implementations are not used.
I'll confirm the performance difference between --enable-builtin-atomics and --disable-builtin-atomics.
@hjelmn I ran osu_latency and osu_latency_mt on my ARM machine and found --disable-builtin-atomics is slightly faster than --enable-builtin-atomics on osu_latency_mt. Both Open MPI and the OSU Micro-Benchmarks were compiled with -O3.
osu_latency latency (us):
size (byte) | --enable-builtin-atomics | --disable-builtin-atomics |
---|---|---|
0 | 0.88 | 0.87 |
1 | 1.07 | 1.08 |
2 | 1.08 | 1.08 |
4 | 1.09 | 1.08 |
8 | 1.11 | 1.11 |
16 | 1.13 | 1.13 |
32 | 1.13 | 1.13 |
64 | 1.16 | 1.15 |
osu_latency_mt latency (us):
size (byte) | --enable-builtin-atomics | --disable-builtin-atomics |
---|---|---|
0 | 4.45 | 4.44 |
1 | 4.73 | 4.70 |
2 | 4.73 | 4.70 |
4 | 4.74 | 4.71 |
8 | 4.76 | 4.74 |
16 | 4.78 | 4.76 |
32 | 4.79 | 4.78 |
64 | 4.83 | 4.80 |
Each value is the median of 10 runs.
If you need more data, let me know.
@hjelmn You asked me at the f2f meeting which commits I backported into the v2.0.2-based Fujitsu MPI. I backported the following commits. Open MPI 2.0 can run on AArch64 without these commits, but I backported them for better AArch64 support.
I have tried gcc/6.1.0 and gcc/7.1.0 and I still observe the same issue.
@kawashima-fj #3988 should fix the hang.
It's no surprise that the built-in atomics version is slower. The LL/SC lifo is significantly faster than the compare-and-swap version.
This is the correct one. Think we have a fix.
@hjelmn / @jjhursey, where are we on fixing this issue? What branches are still impacted?
Per 2018-03 Dallas face-to-face meeting, this is still happening to Fujitsu on ARMv8. It was discussed in the Dallas meeting; @hjelmn is looking into this.
Can we have a bit more detail on this? What AMO is broken? Does it happen only with built-in AMOs?
Background information
Details of the problem
A hang in the ctxalloc test during MPI_Init. A similar hang is observed in many other tests. ctxalloc can be found here
stack trace: