sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Floating Point Exception #235

Open ashishdamania opened 9 years ago

ashishdamania commented 9 years ago

Hello, I am having trouble running Ray. I am not sure whether this is a simple error on my side or something related to Open MPI. [Screenshot: 2014-11-19, 8:41:39 pm]

root@snap-2:/home/userone/Ray-2.3.1# mpiexec --allow-run-as-root -np 2 -bynode ./Ray
[snap-2:24679] *** Process received signal ***
[snap-2:24679] Signal: Floating point exception (8)
[snap-2:24679] Signal code: Integer divide-by-zero (1)
[snap-2:24679] Failing at address: 0x7f6a7b582604
[snap-2:24679] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f6a7aa5c0a0]
[snap-2:24679] [ 1] /home/root/.openmpi/lib/libopen-pal.so.6(+0x88604)[0x7f6a7b582604]
[snap-2:24679] [ 2] /home/root/.openmpi/lib/libopen-pal.so.6(+0x89317)[0x7f6a7b583317]
[snap-2:24679] [ 3] /home/root/.openmpi/lib/libopen-pal.so.6(+0x893b1)[0x7f6a7b5833b1]
[snap-2:24679] [ 4] /home/root/.openmpi/lib/libopen-pal.so.6(opal_hwloc172_hwloc_topology_load+0x1a3)[0x7f6a7b5904f3]
[snap-2:24679] [ 5] /home/root/.openmpi/lib/libopen-pal.so.6(opal_hwloc_base_get_topology+0xbe)[0x7f6a7b56ebee]
[snap-2:24679] [ 6] /home/root/.openmpi/lib/openmpi/mca_ess_hnp.so(+0x4086)[0x7f6a79eb7086]
[snap-2:24679] [ 7] /home/root/.openmpi/lib/libopen-rte.so.7(orte_init+0x174)[0x7f6a7b7e45e4]
[snap-2:24679] [ 8] mpiexec(orterun+0x7df)[0x404998]
[snap-2:24679] [ 9] mpiexec(main+0x20)[0x403dec]
[snap-2:24679] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f6a7a6dfead]
[snap-2:24679] [11] mpiexec[0x403ce9]
[snap-2:24679] *** End of error message ***
Floating point exception
uname -a
Linux snap-2 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u1 x86_64 GNU/Linux
ldd /bin/ls

    linux-vdso.so.1 =>  (0x00007fffc24b9000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f1a8f2ca000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1a8f0c2000)
    libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f1a8eeb8000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a8eb2c000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1a8e928000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1a8f4f0000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1a8e70b000)
    libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007f1a8e506000)
sebhtml commented 9 years ago

None of the frames in the call stack are in Ray.

It looks like an Open MPI problem.

From the stack alone, it is not clear what the problem is.

ashishdamania commented 9 years ago

Here is my Open MPI info. I installed Open MPI from source:

$mpiexec --allow-run-as-root --version
mpiexec (OpenRTE) 1.8.3
ashishdamania commented 9 years ago

Agree, Open MPI problem. It works fine if installed from Ubuntu 14.04 packages.

sebhtml commented 9 years ago

Was this caused by hwloc (it appears in the call stack)?

ashishdamania commented 9 years ago

I could not dig into any specifics. I would love to find out, but I am constrained by time right now. Thanks for the immediate response.

ashishdamania commented 9 years ago

I tested an installation of Ray on Debian in VirtualBox and was not able to reproduce this issue. The issue above occurred on a Google Compute Engine machine, so it may be related to that environment, but I am not sure. The output of uname -a on the VirtualBox machine is:

uname -a
Linux debian 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u1 x86_64 GNU/Linux

which is identical to the one above, apart from the hostname.

sebhtml commented 9 years ago

Hi,

It is possible to pinpoint the place where the problem occurs. Your stack contains this entry:

[snap-2:24679] Failing at address: 0x7f6a7b582604

Each Ray process has its own virtual memory address space. The address 0x7f6a7b582604 is too high to be inside the Ray executable. The addresses of the Ray executable in the virtual memory are typically in the range 0x00400000 to 0x004dd000 (at least on Linux, using dynamic linking).

The division by 0 (SIGFPE) is presumably in one of the shared libraries (one of the .so files of Open MPI).

On Linux, you can inspect the memory mapping of any running process by looking at the file /proc/PID/maps where PID is the process identifier.

Below is the map of a bash (shell) process:

$ cat /proc/2842/maps
00400000-004dd000 r-xp 00000000 fd:01 804953                             /usr/bin/bash
006dc000-006dd000 r--p 000dc000 fd:01 804953                             /usr/bin/bash
006dd000-006e6000 rw-p 000dd000 fd:01 804953                             /usr/bin/bash
006e6000-006ec000 rw-p 00000000 00:00 0 
00b5d000-00cec000 rw-p 00000000 00:00 0                                  [heap]
7f8be57fd000-7f8be5808000 r-xp 00000000 fd:01 805145                     /usr/lib64/libnss_files-2.18.so
7f8be5808000-7f8be5a07000 ---p 0000b000 fd:01 805145                     /usr/lib64/libnss_files-2.18.so
7f8be5a07000-7f8be5a08000 r--p 0000a000 fd:01 805145                     /usr/lib64/libnss_files-2.18.so
7f8be5a08000-7f8be5a09000 rw-p 0000b000 fd:01 805145                     /usr/lib64/libnss_files-2.18.so
7f8be5a09000-7f8bebf31000 r--p 00000000 fd:01 805609                     /usr/lib/locale/locale-archive
7f8bebf31000-7f8bec0e5000 r-xp 00000000 fd:01 794730                     /usr/lib64/libc-2.18.so
7f8bec0e5000-7f8bec2e4000 ---p 001b4000 fd:01 794730                     /usr/lib64/libc-2.18.so
7f8bec2e4000-7f8bec2e8000 r--p 001b3000 fd:01 794730                     /usr/lib64/libc-2.18.so
7f8bec2e8000-7f8bec2ea000 rw-p 001b7000 fd:01 794730                     /usr/lib64/libc-2.18.so
7f8bec2ea000-7f8bec2ef000 rw-p 00000000 00:00 0 
7f8bec2ef000-7f8bec2f2000 r-xp 00000000 fd:01 805141                     /usr/lib64/libdl-2.18.so
7f8bec2f2000-7f8bec4f1000 ---p 00003000 fd:01 805141                     /usr/lib64/libdl-2.18.so
7f8bec4f1000-7f8bec4f2000 r--p 00002000 fd:01 805141                     /usr/lib64/libdl-2.18.so
7f8bec4f2000-7f8bec4f3000 rw-p 00003000 fd:01 805141                     /usr/lib64/libdl-2.18.so
7f8bec4f3000-7f8bec518000 r-xp 00000000 fd:01 795502                     /usr/lib64/libtinfo.so.5.9
7f8bec518000-7f8bec718000 ---p 00025000 fd:01 795502                     /usr/lib64/libtinfo.so.5.9
7f8bec718000-7f8bec71c000 r--p 00025000 fd:01 795502                     /usr/lib64/libtinfo.so.5.9
7f8bec71c000-7f8bec71d000 rw-p 00029000 fd:01 795502                     /usr/lib64/libtinfo.so.5.9
7f8bec71d000-7f8bec73d000 r-xp 00000000 fd:01 805595                     /usr/lib64/ld-2.18.so
7f8bec918000-7f8bec91b000 rw-p 00000000 00:00 0 
7f8bec932000-7f8bec934000 rw-p 00000000 00:00 0 
7f8bec934000-7f8bec93b000 r--s 00000000 fd:01 1050101                    /usr/lib64/gconv/gconv-modules.cache
7f8bec93b000-7f8bec93c000 rw-p 00000000 00:00 0 
7f8bec93c000-7f8bec93d000 r--p 0001f000 fd:01 805595                     /usr/lib64/ld-2.18.so
7f8bec93d000-7f8bec93e000 rw-p 00020000 fd:01 805595                     /usr/lib64/ld-2.18.so
7f8bec93e000-7f8bec93f000 rw-p 00000000 00:00 0 
7fff059ba000-7fff059db000 rw-p 00000000 00:00 0                          [stack]
7fff059fc000-7fff059fe000 r--p 00000000 00:00 0                          [vvar]
7fff059fe000-7fff05a00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
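
To make the lookup concrete, here is a minimal Python sketch that parses text in the /proc/PID/maps format and reports which mapping, if any, contains a given address. The names Mapping, parse_maps, and find_mapping are made up for this illustration, and SAMPLE is just three lines copied from the bash map above, so it is not the map of the crashing process.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int   # first address of the region (inclusive)
    end: int     # end address of the region (exclusive)
    perms: str   # permission string, e.g. "r-xp"
    path: str    # backing file or pseudo-name such as "[heap]"; "" if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of a /proc/PID/maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: range, perms, offset, device, inode, optional path.
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return mappings


def find_mapping(mappings: List[Mapping], address: int) -> Optional[Mapping]:
    """Return the mapping whose [start, end) range contains address, or None."""
    for m in mappings:
        if m.start <= address < m.end:
            return m
    return None


# Three lines taken from the bash map shown above (illustration only).
SAMPLE = """\
00400000-004dd000 r-xp 00000000 fd:01 804953                             /usr/bin/bash
7f8bebf31000-7f8bec0e5000 r-xp 00000000 fd:01 794730                     /usr/lib64/libc-2.18.so
7fff059ba000-7fff059db000 rw-p 00000000 00:00 0                          [stack]
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE)
    for addr in (0x00400100, 0x7f8bebf40000, 0x7f6a7b582604):
        hit = find_mapping(maps, addr)
        print(hex(addr), "->", hit.path if hit else "not in this map")
```

For a live process you would read open("/proc/%d/maps" % pid).read() instead of SAMPLE. The failing address from your trace is not found in this sample because the sample comes from an unrelated bash process.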
ashishdamania commented 9 years ago

Hi Sébastien,

I tried to reproduce this error on Google Compute Engine again.

1) Added sid (unstable) sources to /etc/apt/sources.list

deb http://gce_debian_mirror.storage.googleapis.com/ wheezy main contrib non-free
deb-src http://gce_debian_mirror.storage.googleapis.com/ wheezy main contrib non-free
deb http://security.debian.org/ wheezy/updates main contrib non-free
deb-src http://security.debian.org/ wheezy/updates main contrib non-free
deb http://gce_debian_mirror.storage.googleapis.com/ wheezy-updates main contrib non-free
deb-src http://gce_debian_mirror.storage.googleapis.com/ wheezy-updates main contrib non-free
deb http://http.debian.net/debian wheezy main
deb-src http://http.debian.net/debian wheezy main

#SID sources for testing OPEN-MPI
deb http://ftp.us.debian.org/debian unstable main contrib non-free
deb http://ftp.debian.org/debian/ Sid-updates main contrib non-free
deb http://security.debian.org/ Sid/updates main contrib non-free

2) apt-get update

3) Installed Ray via apt-get

4) Running Ray gives me this error, which is most likely due to Open MPI, as you suggested above.

root@testmpi:/etc/apt# Ray --version
[testmpi:05075] *** Process received signal ***
[testmpi:05075] Signal: Floating point exception (8)
[testmpi:05075] Signal code: Integer divide-by-zero (1)
[testmpi:05075] Failing at address: 0x7fd0b297826e
[testmpi:05075] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7fd0b252b8d0]
[testmpi:05075] [ 1] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3526e) [0x7fd0b297826e]
[testmpi:05075] [ 2] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x36c00) [0x7fd0b2979c00]
[testmpi:05075] [ 3] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3717d) [0x7fd0b297a17d]
[testmpi:05075] [ 4] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x372e6) [0x7fd0b297a2e6]
[testmpi:05075] [ 5] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0xc71a) [0x7fd0b294f71a]
[testmpi:05075] [ 6] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1ab) [0x7fd0b2950757]
[testmpi:05075] [ 7] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7a7) [0x7fd0b34fb5f7]
[testmpi:05075] [ 8] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2f60) [0x7fd0b114df60]
[testmpi:05075] [ 9] /usr/lib/libopen-rte.so.4(orte_init+0x193) [0x7fd0b34d29f3]
[testmpi:05075] [10] /usr/lib/libopen-rte.so.4(orte_daemon+0x20c) [0x7fd0b34ed62c]
[testmpi:05075] [11] orted() [0x400848]
[testmpi:05075] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fd0b2194b45]
[testmpi:05075] [13] orted() [0x400899]
[testmpi:05075] *** End of error message ***
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
--------------------------------------------------------------------------
[testmpi:5074] *** An error occurred in MPI_Init
[testmpi:5074] *** on a NULL communicator
[testmpi:5074] *** Unknown error
[testmpi:5074] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: testmpi
  PID:        5074
--------------------------------------------------------------------------

I see that the failing address 0x7fd0b297826e matches the frame /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3526e) [0x7fd0b297826e]

However, this time the failing frame is in a different .so file, libhwloc.so.5, whereas last time it was in /home/root/.openmpi/lib/libopen-pal.so.6
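
The matching can also be done mechanically. Below is a small Python sketch that takes one backtrace line of the form module(+0xOFFSET) [0xADDRESS] and subtracts the offset from the address. It assumes, as I understand the glibc backtrace format, that an offset printed without a symbol name is relative to the load address of the module, so the difference is where the library was mapped. FRAME_RE and module_base are names invented for this example, and frames that carry a symbol name (such as hwloc_topology_load+0x1ab) are deliberately skipped because their offset is relative to the symbol instead.

```python
import re
from typing import Optional, Tuple

# Matches frames such as:
#   [host:pid] [ 1] /path/lib.so.5(+0x3526e) [0x7fd0b297826e]
# Only the symbol-less "(+0x...)" form is accepted.
FRAME_RE = re.compile(
    r"\]\s+(?P<module>/[^\s(]+)"
    r"\(\+0x(?P<offset>[0-9a-fA-F]+)\)\s*"
    r"\[0x(?P<address>[0-9a-fA-F]+)\]"
)


def module_base(frame: str) -> Optional[Tuple[str, int]]:
    """Return (module path, assumed load address) for a symbol-less frame."""
    m = FRAME_RE.search(frame)
    if m is None:
        return None
    address = int(m.group("address"), 16)
    offset = int(m.group("offset"), 16)
    return m.group("module"), address - offset


if __name__ == "__main__":
    frames = [
        "[testmpi:05075] [ 1] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3526e) [0x7fd0b297826e]",
        "[snap-2:24679] [ 1] /home/root/.openmpi/lib/libopen-pal.so.6(+0x88604)[0x7f6a7b582604]",
    ]
    for frame in frames:
        result = module_base(frame)
        if result is not None:
            module, base = result
            print(module, "loaded at", hex(base))
```

Both results come out page-aligned, which is consistent with the assumption. With debug symbols installed, the offset itself (0x3526e here) could then be looked up with a tool such as addr2line.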

Also, I am not sure how to check the PID when the process terminates so quickly.

Thanks for all your help.

ashishdamania commented 9 years ago

Here is the strace output:

execve("/usr/bin/Ray", ["Ray", "--version"], [/* 18 vars */]) = 0
brk(0)                                  = 0x1ee8000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=12540, ...}) = 0
mmap(NULL, 12540, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f25cb2e4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340#\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=92752, ...}) = 0
mmap(NULL, 2187792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25caeb3000
mprotect(0x7f25caec9000, 2093056, PROT_NONE) = 0
mmap(0x7f25cb0c8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f25cb0c8000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libbz2.so.1.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\30\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=66824, ...}) = 0
mmap(NULL, 2162024, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25caca3000
mprotect(0x7f25cacb2000, 2093056, PROT_NONE) = 0
mmap(0x7f25caeb1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7f25caeb1000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/libmpi_cxx.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\361\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=110176, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e3000
mmap(NULL, 2206480, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25caa88000
mprotect(0x7f25caaa0000, 2097152, PROT_NONE) = 0
mmap(0x7f25caca0000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f25caca0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/libmpi.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\211\4\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1550048, ...}) = 0
mmap(NULL, 3743184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25ca6f6000
mprotect(0x7f25ca85a000, 2093056, PROT_NONE) = 0
mmap(0x7f25caa59000, 94208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x163000) = 0x7f25caa59000
mmap(0x7f25caa70000, 97744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25caa70000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14664, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25ca4f2000
mprotect(0x7f25ca4f5000, 2093056, PROT_NONE) = 0
mmap(0x7f25ca6f4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f25ca6f4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libhwloc.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=281040, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e2000
mmap(NULL, 2376536, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25ca2ad000
mprotect(0x7f25ca2f1000, 2093056, PROT_NONE) = 0
mmap(0x7f25ca4f0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x43000) = 0x7f25ca4f0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\270\5\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1008120, ...}) = 0
mmap(NULL, 3188384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c9fa2000
mprotect(0x7f25ca08e000, 2097152, PROT_NONE) = 0
mmap(0x7f25ca28e000, 40960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xec000) = 0x7f25ca28e000
mmap(0x7f25ca298000, 83616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25ca298000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200U\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1051056, ...}) = 0
mmap(NULL, 3146072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c9ca1000
mprotect(0x7f25c9da1000, 2093056, PROT_NONE) = 0
mmap(0x7f25c9fa0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xff000) = 0x7f25c9fa0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260*\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=90096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e1000
mmap(NULL, 2185952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c9a8b000
mprotect(0x7f25c9aa1000, 2093056, PROT_NONE) = 0
mmap(0x7f25c9ca0000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f25c9ca0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20o\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=137440, ...}) = 0
mmap(NULL, 2213008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c986e000
mprotect(0x7f25c9886000, 2093056, PROT_NONE) = 0
mmap(0x7f25c9a85000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f25c9a85000
mmap(0x7f25c9a87000, 13456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25c9a87000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1729984, ...}) = 0
mmap(NULL, 3836448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c94c5000
mprotect(0x7f25c9664000, 2097152, PROT_NONE) = 0
mmap(0x7f25c9864000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19f000) = 0x7f25c9864000
mmap(0x7f25c986a000, 14880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25c986a000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P#\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31784, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e0000
mmap(NULL, 2128920, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c92bd000
mprotect(0x7f25c92c4000, 2093056, PROT_NONE) = 0
mmap(0x7f25c94c3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f25c94c3000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libnsl.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`A\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=89104, ...}) = 0
mmap(NULL, 2194072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c90a5000
mprotect(0x7f25c90ba000, 2093056, PROT_NONE) = 0
mmap(0x7f25c92b9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f25c92b9000
mmap(0x7f25c92bb000, 6808, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25c92bb000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\17\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10680, ...}) = 0
mmap(NULL, 2105624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c8ea2000
mprotect(0x7f25c8ea4000, 2093056, PROT_NONE) = 0
mmap(0x7f25c90a3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f25c90a3000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libltdl.so.7", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260$\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=39392, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2df000
mmap(NULL, 2134736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c8c98000
mprotect(0x7f25c8ca1000, 2093056, PROT_NONE) = 0
mmap(0x7f25c8ea0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7f25c8ea0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libnuma.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`3\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=44152, ...}) = 0
mmap(NULL, 2140424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c8a8d000
mprotect(0x7f25c8a97000, 2093056, PROT_NONE) = 0
mmap(0x7f25c8c96000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9000) = 0x7f25c8c96000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2de000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2dd000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2db000
arch_prctl(ARCH_SET_FS, 0x7f25cb2db740) = 0
mprotect(0x7f25c9864000, 16384, PROT_READ) = 0
mprotect(0x7f25c8c96000, 4096, PROT_READ) = 0
mprotect(0x7f25ca6f4000, 4096, PROT_READ) = 0
mprotect(0x7f25c8ea0000, 4096, PROT_READ) = 0
mprotect(0x7f25c90a3000, 4096, PROT_READ) = 0
mprotect(0x7f25c92b9000, 4096, PROT_READ) = 0
mprotect(0x7f25c9a85000, 4096, PROT_READ) = 0
mprotect(0x7f25c94c3000, 4096, PROT_READ) = 0
mprotect(0x7f25c9fa0000, 4096, PROT_READ) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2da000
mprotect(0x7f25ca28e000, 32768, PROT_READ) = 0
mprotect(0x7f25ca4f0000, 4096, PROT_READ) = 0
mprotect(0x7f25caa59000, 20480, PROT_READ) = 0
mprotect(0x7f25caca0000, 8192, PROT_READ) = 0
mprotect(0x7f25caeb1000, 4096, PROT_READ) = 0
mprotect(0x7f25cb0c8000, 4096, PROT_READ) = 0
mprotect(0x7c9000, 4096, PROT_READ)     = 0
mprotect(0x7f25cb2ea000, 4096, PROT_READ) = 0
munmap(0x7f25cb2e4000, 12540)           = 0
set_tid_address(0x7f25cb2dba10)         = 7374
set_robust_list(0x7f25cb2dba20, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7f25c98749f0, [], SA_RESTORER|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f25c9874a80, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
access("/dev/ummunotify", F_OK)         = -1 ENOENT (No such file or directory)
access("/sys/class/infiniband", F_OK)   = -1 ENOENT (No such file or directory)
access("/dev/open-mx", F_OK)            = -1 ENOENT (No such file or directory)
access("/dev/myri0", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri1", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri2", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri3", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri4", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri5", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri6", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri7", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri8", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/myri9", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/ipath", F_OK)              = -1 ENOENT (No such file or directory)
access("/dev/kgni0", F_OK)              = -1 ENOENT (No such file or directory)
brk(0)                                  = 0x1ee8000
brk(0x1f09000)                          = 0x1f09000
open("/proc/self/status", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(3, "Name:\tRay\nState:\tR (running)\nTgi"..., 1024) = 756
close(3)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 7 entries */, 32768)     = 216
open("/sys/devices/system/node/node0/meminfo", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(4, "Node 0 MemTotal:        3931700 "..., 4096) = 1000
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
sched_getaffinity(0, 512, {1, 0, 0, 0, 0, 0, 0, 0}) = 64
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 16 entries */, 32768)    = 504
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
open("/proc/self/status", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(3, "Name:\tRay\nState:\tR (running)\nTgi"..., 1024) = 756
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
futex(0x7f25ca2aab0c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f25ca2aab18, FUTEX_WAKE_PRIVATE, 2147483647) = 0
uname({sys="Linux", node="testmpi", ...}) = 0
getcwd("/root", 4096)                   = 6
open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640e50) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0644, st_size=2812, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2812
read(3, "", 4096)                       = 0
read(3, "", 8192)                       = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640dc0) = -1 ENOTTY (Inappropriate ioctl for device)
close(3)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
open("/root/.openmpi/mca-params.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
uname({sys="Linux", node="testmpi", ...}) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, {0x7f25ca818d50, [], SA_RESTORER|SA_RESETHAND|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGBUS, {0x7f25ca818d50, [], SA_RESTORER|SA_RESETHAND|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
rt_sigaction(SIGFPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGFPE, {0x7f25ca818d50, [], SA_RESTORER|SA_RESETHAND|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSEGV, {0x7f25ca818d50, [], SA_RESTORER|SA_RESETHAND|SA_SIGINFO, 0x7f25c987d8d0}, NULL, 8) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=4*1024}) = 0
getrlimit(RLIMIT_NPROC, {rlim_cur=29622, rlim_max=29622}) = 0
getrlimit(RLIMIT_FSIZE, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
uname({sys="Linux", node="testmpi", ...}) = 0
openat(AT_FDCWD, "/usr/lib/openmpi/lib/openmpi", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 89 entries */, 32768)    = 3656
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
openat(AT_FDCWD, "/root/.openmpi/components", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc.la", O_RDONLY) = -1 ENOENT (No such file or directory)
futex(0x7f25ca6f50c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18896, ...}) = 0
mmap(NULL, 2114048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c8888000
mprotect(0x7f25c888b000, 2097152, PROT_NONE) = 0
mmap(0x7f25c8a8b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f25c8a8b000
close(3)                                = 0
mprotect(0x7f25c8a8b000, 4096, PROT_READ) = 0
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25c8487000
munmap(0x7f25c8487000, 4198400)         = 0
brk(0x2027000)                          = 0x2027000
mmap(NULL, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
munmap(0x7f25cb2e7000, 4096)            = 0
open("/proc/cpuinfo", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
lseek(3, 0, SEEK_SET)                   = 0
read(3, "processor\t: 0\nvendor_id\t: Genuin"..., 1024) = 744
read(3, "", 1024)                       = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "processor\t: 0\nvendor_id\t: Genuin"..., 1024) = 744
close(3)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
open("/usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\n\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10296, ...}) = 0
mmap(NULL, 2105448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c8685000
mprotect(0x7f25c8686000, 2097152, PROT_NONE) = 0
mmap(0x7f25c8886000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f25c8886000
close(3)                                = 0
mprotect(0x7f25c8886000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_carto_file.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_carto_file.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_carto_file.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\36\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=27000, ...}) = 0
mmap(NULL, 2122280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f25c847e000
mprotect(0x7f25c8484000, 2093056, PROT_NONE) = 0
mmap(0x7f25c8683000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x7f25c8683000
close(3)                                = 0
mprotect(0x7f25c8683000, 4096, PROT_READ) = 0
munmap(0x7f25c847e000, 2122280)         = 0
gettimeofday({1417537452, 464075}, NULL) = 0
epoll_create(32000)                     = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
socketpair(PF_LOCAL, SOCK_STREAM, 0, [4, 5]) = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_mmap.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_mmap.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_mmap.so", O_RDONLY|O_CLOEXEC) = 6
read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\20\0\0\0\0\0\0"..., 832) = 832
fstat(6, {st_mode=S_IFREG|0644, st_size=14872, ...}) = 0
mmap(NULL, 2110128, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f25c8481000
mprotect(0x7f25c8484000, 2093056, PROT_NONE) = 0
mmap(0x7f25c8683000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x2000) = 0x7f25c8683000
close(6)                                = 0
mprotect(0x7f25c8683000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_posix.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_posix.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_posix.so", O_RDONLY|O_CLOEXEC) = 6
read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\r\0\0\0\0\0\0"..., 832) = 832
fstat(6, {st_mode=S_IFREG|0644, st_size=14808, ...}) = 0
mmap(NULL, 2110048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f25c827d000
mprotect(0x7f25c827f000, 2097152, PROT_NONE) = 0
mmap(0x7f25c847f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x2000) = 0x7f25c847f000
close(6)                                = 0
mprotect(0x7f25c847f000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.so", O_RDONLY|O_CLOEXEC) = 6
read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\v\0\0\0\0\0\0"..., 832) = 832
fstat(6, {st_mode=S_IFREG|0644, st_size=10712, ...}) = 0
mmap(NULL, 2105952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f25c807a000
mprotect(0x7f25c807c000, 2093056, PROT_NONE) = 0
mmap(0x7f25c827b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x1000) = 0x7f25c827b000
close(6)                                = 0
mprotect(0x7f25c827b000, 4096, PROT_READ) = 0
statfs("/dev/shm/", {f_type=0x1021994, f_bsize=4096, f_blocks=190180, f_bfree=190180, f_bavail=190180, f_files=475454, f_ffree=475452, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
futex(0x7f25c94c4330, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/dev/shm/open_mpi.0000", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 6
unlink("/dev/shm/open_mpi.0000")        = 0
shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0700) = 425984
shmat(425984, 0, 0)                     = 0x7f25cb2e7000
shmctl(425984, IPC_RMID, 0)             = 0
shmctl(425984, IPC_STAT, 0x7fffba641070) = 0
shmdt(0x7f25cb2e7000)                   = 0
munmap(0x7f25c827d000, 2110048)         = 0
munmap(0x7f25c807a000, 2105952)         = 0
open("/usr/lib/openmpi/lib/openmpi/mca_crs_none.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_crs_none.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_crs_none.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\20\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=10688, ...}) = 0
mmap(NULL, 2105928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c827e000
mprotect(0x7f25c8280000, 2093056, PROT_NONE) = 0
mmap(0x7f25c847f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x1000) = 0x7f25c847f000
close(7)                                = 0
mprotect(0x7f25c847f000, 4096, PROT_READ) = 0
uname({sys="Linux", node="testmpi", ...}) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_env.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_env.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_env.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\23\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=14936, ...}) = 0
mmap(NULL, 2110176, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c807a000
mprotect(0x7f25c807d000, 2093056, PROT_NONE) = 0
mmap(0x7f25c827c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x2000) = 0x7f25c827c000
close(7)                                = 0
mprotect(0x7f25c827c000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_hnp.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_hnp.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0(\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=23448, ...}) = 0
mmap(NULL, 2118688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7e74000
mprotect(0x7f25c7e79000, 2093056, PROT_NONE) = 0
mmap(0x7f25c8078000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x4000) = 0x7f25c8078000
close(7)                                = 0
mprotect(0x7f25c8078000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_singleton.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_singleton.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_singleton.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\25\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=14944, ...}) = 0
mmap(NULL, 2110176, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7c70000
mprotect(0x7f25c7c73000, 2093056, PROT_NONE) = 0
mmap(0x7f25c7e72000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x2000) = 0x7f25c7e72000
close(7)                                = 0
mprotect(0x7f25c7e72000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slave.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slave.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slave.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\21\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=14872, ...}) = 0
mmap(NULL, 2110112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7a6c000
mprotect(0x7f25c7a6e000, 2097152, PROT_NONE) = 0
mmap(0x7f25c7c6e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x2000) = 0x7f25c7c6e000
close(7)                                = 0
mprotect(0x7f25c7c6e000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurm.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurm.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurm.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\22\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=14936, ...}) = 0
mmap(NULL, 2110176, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7868000
mprotect(0x7f25c786a000, 2097152, PROT_NONE) = 0
mmap(0x7f25c7a6a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x2000) = 0x7f25c7a6a000
close(7)                                = 0
mprotect(0x7f25c7a6a000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurmd.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurmd.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_slurmd.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\26\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=14936, ...}) = 0
mmap(NULL, 2110176, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7664000
mprotect(0x7f25c7667000, 2093056, PROT_NONE) = 0
mmap(0x7f25c7866000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x2000) = 0x7f25c7866000
close(7)                                = 0
mprotect(0x7f25c7866000, 4096, PROT_READ) = 0
open("/usr/lib/openmpi/lib/openmpi/mca_ess_tool.ompi_info", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_tool.la", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/openmpi/lib/openmpi/mca_ess_tool.so", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\f\0\0\0\0\0\0"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0644, st_size=10648, ...}) = 0
mmap(NULL, 2105888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7f25c7461000
mprotect(0x7f25c7463000, 2093056, PROT_NONE) = 0
mmap(0x7f25c7662000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x1000) = 0x7f25c7662000
close(7)                                = 0
mprotect(0x7f25c7662000, 4096, PROT_READ) = 0
munmap(0x7f25c807a000, 2110176)         = 0
munmap(0x7f25c7e74000, 2118688)         = 0
munmap(0x7f25c7a6c000, 2110112)         = 0
munmap(0x7f25c7868000, 2110176)         = 0
munmap(0x7f25c7664000, 2110176)         = 0
munmap(0x7f25c7461000, 2105888)         = 0
rt_sigaction(SIGCHLD, {0x7f25ca818700, [CHLD], SA_RESTORER|SA_RESTART, 0x7f25c94fa180}, {SIG_DFL, [], 0}, 8) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0
pipe([7, 8])                            = 0
pipe([9, 10])                           = 0
stat("/usr/bin/orted", {st_mode=S_IFREG|0755, st_size=6264, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f25cb2dba10) = 7375
close(8)                                = 0
close(9)                                = 0
read(7, [testmpi:07375] *** Process received signal ***
[testmpi:07375] Signal: Floating point exception (8)
[testmpi:07375] Signal code: Integer divide-by-zero (1)
[testmpi:07375] Failing at address: 0x7f27e7c8d26e
[testmpi:07375] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7f27e78408d0]
[testmpi:07375] [ 1] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3526e) [0x7f27e7c8d26e]
[testmpi:07375] [ 2] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x36c00) [0x7f27e7c8ec00]
[testmpi:07375] [ 3] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3717d) [0x7f27e7c8f17d]
[testmpi:07375] [ 4] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x372e6) [0x7f27e7c8f2e6]
[testmpi:07375] [ 5] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0xc71a) [0x7f27e7c6471a]
[testmpi:07375] [ 6] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1ab) [0x7f27e7c65757]
[testmpi:07375] [ 7] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7a7) [0x7f27e88105f7]
[testmpi:07375] [ 8] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2f60) [0x7f27e6462f60]
[testmpi:07375] [ 9] /usr/lib/libopen-rte.so.4(orte_init+0x193) [0x7f27e87e79f3]
[testmpi:07375] [10] /usr/lib/libopen-rte.so.4(orte_daemon+0x20c) [0x7f27e880262c]
[testmpi:07375] [11] orted() [0x400848]
[testmpi:07375] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f27e74a9b45]
[testmpi:07375] [13] orted() [0x400899]
[testmpi:07375] *** End of error message ***
"", 255)                        = 0
write(2, "[testmpi:07374] [[INVALID],INVAL"..., 138[testmpi:07374] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
) = 138
write(2, "[testmpi:07374] [[INVALID],INVAL"..., 138[testmpi:07374] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
) = 138
write(2, "[testmpi:07374] [[INVALID],INVAL"..., 135[testmpi:07374] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
) = 135
open("/usr/share/openmpi/help-orte-runtime", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/openmpi/help-orte-runtime.txt", O_RDONLY) = 8
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640f50) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(8, {st_mode=S_IFREG|0644, st_size=2786, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(8, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 2786
read(8, "", 4096)                       = 0
close(8)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
write(2, "--------------------------------"..., 641--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
) = 641
open("/usr/share/openmpi/help-mpi-runtime", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/openmpi/help-mpi-runtime.txt", O_RDONLY) = 8
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640f70) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(8, {st_mode=S_IFREG|0644, st_size=5023, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(8, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 5023
read(8, "", 4096)                       = 0
close(8)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
write(2, "--------------------------------"..., 643--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
--------------------------------------------------------------------------
) = 643
open("/usr/share/openmpi/help-mpi-errors.txt", O_RDONLY) = 8
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640f00) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(8, {st_mode=S_IFREG|0644, st_size=1264, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(8, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 1264
read(8, "", 4096)                       = 0
read(8, "", 8192)                       = 0
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640e60) = -1 ENOTTY (Inappropriate ioctl for device)
close(8)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
write(2, "[testmpi:7374] *** An error occu"..., 193[testmpi:7374] *** An error occurred in MPI_Init
[testmpi:7374] *** on a NULL communicator
[testmpi:7374] *** Unknown error
[testmpi:7374] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
) = 193
uname({sys="Linux", node="testmpi", ...}) = 0
open("/usr/share/openmpi/help-mpi-runtime.txt", O_RDONLY) = 8
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640f10) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(8, {st_mode=S_IFREG|0644, st_size=5023, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f25cb2e7000
read(8, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 5023
read(8, "", 4096)                       = 0
read(8, "", 8192)                       = 0
ioctl(8, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffba640e70) = -1 ENOTTY (Inappropriate ioctl for device)
close(8)                                = 0
munmap(0x7f25cb2e7000, 4096)            = 0
write(2, "--------------------------------"..., 425--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: testmpi
  PID:        7374
--------------------------------------------------------------------------
) = 425
exit_group(1)                           = ?
+++ exited with 1 +++
sebhtml commented 9 years ago

What is your goal?

Do you want to patch hwloc?

ashishdamania commented 9 years ago

Hi @sebhtml

"What is your goal?"

I just wanted to run Ray on a Google Compute Debian image.

"Do you want to patch hwloc?"

It sounds like an interesting thing to do, but I do not have extensive experience with C.

Thanks for the response. I am waiting for your next project to get ready.

sebhtml commented 9 years ago

SIGFPE with Open-MPI 1.8.3, --bind-to option, and hwloc

I know that Open-MPI 1.8.x (you use Open-MPI 1.8.3) binds processes by default. This is new: in previous series (1.7 and below) the default was "--bind-to none", whereas in the 1.8 series the default is "--bind-to core" for jobs of up to two processes (larger jobs bind to socket).

I believe that hwloc is used when the option "--bind-to" is given a value other than "none". Can you try adding the option "--bind-to none" to your command?

SIGSEGV with Open-MPI 1.8.3

Also, Open-MPI 1.8.x uses the "vader" (like in Darth Vader) BTL (byte transfer layer) for sending messages between local processes. Open-MPI 1.8.3 contains a bug that leads to segmentation faults (SIGSEGV signal). I ran into this problem myself. See https://github.com/open-mpi/ompi/issues/235

The workaround for that one is to add the option "--mca btl ^vader" to avoid Darth Vader altogether.
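A minimal sketch of how the two workarounds above could be combined on one command line, assuming Open-MPI 1.8.x option syntax. The process count and the Ray arguments are placeholder values chosen for illustration, and the script only prints the command unless RUN_IT=1 is set.

```shell
#!/bin/sh
# Placeholder values for illustration; adjust to your own job.
NP=2
RAY_ARGS="--version"

# --bind-to none   : disable process binding
# --mca btl ^vader : exclude the vader BTL from the list of usable BTLs
CMD="mpiexec --bind-to none --mca btl ^vader -n $NP Ray $RAY_ARGS"

# Print the command so it can be reviewed before it is run.
echo "$CMD"

# Run it only on request, and only if mpiexec is actually on PATH.
if [ "${RUN_IT:-0}" = "1" ] && command -v mpiexec >/dev/null 2>&1; then
    $CMD
fi
```

If the crash happens while mpiexec itself is starting, as in the backtraces in this thread, these options may not be enough, since the failure occurs before any Ray process is launched.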

ashishdamania commented 9 years ago

Hi @sebhtml, it gives me the same error when I run Ray with the --bind-to none option.

root@testmpi:/etc/apt# mpiexec --bind-to none -n 1  Ray --version
[testmpi:05075] *** Process received signal ***
[testmpi:05075] Signal: Floating point exception (8)
[testmpi:05075] Signal code: Integer divide-by-zero (1)
[testmpi:05075] Failing at address: 0x7fd0b297826e
[testmpi:05075] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7fd0b252b8d0]
[testmpi:05075] [ 1] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3526e) [0x7fd0b297826e]
[testmpi:05075] [ 2] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x36c00) [0x7fd0b2979c00]
[testmpi:05075] [ 3] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x3717d) [0x7fd0b297a17d]
[testmpi:05075] [ 4] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0x372e6) [0x7fd0b297a2e6]
[testmpi:05075] [ 5] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(+0xc71a) [0x7fd0b294f71a]
[testmpi:05075] [ 6] /usr/lib/x86_64-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1ab) [0x7fd0b2950757]
[testmpi:05075] [ 7] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7a7) [0x7fd0b34fb5f7]
[testmpi:05075] [ 8] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2f60) [0x7fd0b114df60]
[testmpi:05075] [ 9] /usr/lib/libopen-rte.so.4(orte_init+0x193) [0x7fd0b34d29f3]
[testmpi:05075] [10] /usr/lib/libopen-rte.so.4(orte_daemon+0x20c) [0x7fd0b34ed62c]
[testmpi:05075] [11] orted() [0x400848]
[testmpi:05075] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fd0b2194b45]
[testmpi:05075] [13] orted() [0x400899]
[testmpi:05075] *** End of error message ***
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
[testmpi:05074] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
--------------------------------------------------------------------------
[testmpi:5074] *** An error occurred in MPI_Init
[testmpi:5074] *** on a NULL communicator
[testmpi:5074] *** Unknown error
[testmpi:5074] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: testmpi
  PID:        5074
--------------------------------------------------------------------------

Later on I installed openmpi from source using these instructions: http://www.sysads.co.uk/2014/05/install-open-mpi-1-8-ubuntu-14-04-13-10/

Now, if I run Ray using the "newly installed" mpiexec, I get this error, which is the same one I got in the first post.

root@test-openmpi:/home/root/.openmpi/bin# ./mpiexec --bind-to none --allow-run-as-root Ray      
[test-openmpi:02984] *** Process received signal ***
[test-openmpi:02984] Signal: Floating point exception (8)
[test-openmpi:02984] Signal code: Integer divide-by-zero (1)
[test-openmpi:02984] Failing at address: 0x7fae6930f3d8
[test-openmpi:02984] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fae687658d0]
[test-openmpi:02984] [ 1] /home/root/.openmpi/lib/libopen-pal.so.6(+0x8c3d8)[0x7fae6930f3d8]
[test-openmpi:02984] [ 2] /home/root/.openmpi/lib/libopen-pal.so.6(+0x8cfaa)[0x7fae6930ffaa]
[test-openmpi:02984] [ 3] /home/root/.openmpi/lib/libopen-pal.so.6(+0x8d0a1)[0x7fae693100a1]
[test-openmpi:02984] [ 4] /home/root/.openmpi/lib/libopen-pal.so.6(opal_hwloc172_hwloc_topology_load+0x1ab)[0x7fae6931db2b]
[test-openmpi:02984] [ 5] /home/root/.openmpi/lib/libopen-pal.so.6(opal_hwloc_base_get_topology+0xbe)[0x7fae692fa4ce]
[test-openmpi:02984] [ 6] /home/root/.openmpi/lib/openmpi/mca_ess_hnp.so(+0x3c17)[0x7fae67ba4c17]
[test-openmpi:02984] [ 7] /home/root/.openmpi/lib/libopen-rte.so.7(orte_init+0x166)[0x7fae69572f76]
[test-openmpi:02984] [ 8] ./mpiexec(orterun+0x7d9)[0x4044c9]
[test-openmpi:02984] [ 9] ./mpiexec(main+0x20)[0x403926]
[test-openmpi:02984] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fae683ceb45]
[test-openmpi:02984] [11] ./mpiexec[0x403839]
[test-openmpi:02984] *** End of error message ***
Floating point exception

Thanks for teaching me new things.

Nebisjames commented 4 years ago

[2020-05-05 19:43:47] Error: Floating-point exception
[2020-05-05 19:43:47] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t, void)> in /home/imo/news-translit-nmt-master/tools/marian-dev/src/common/logging.cpp:132

[CALL STACK]
[0x766cac]
[0x7670e9]
[0x7f578eb94390] + 0x11390
[0x962ee6] marian::ReduceNodeOp:: ReduceNodeOp (IntrusivePtr<marian::Chainable<IntrusivePtr>>, int, marian::ReduceNodeOpCode) + 0x6e6
[0x96369a] IntrusivePtr<marian::Chainable<IntrusivePtr>> marian:: Expression <marian::ReduceNodeOp,IntrusivePtr<marian::Chainable<IntrusivePtr>>&,int&,marian::ReduceNodeOpCode>(IntrusivePtr<marian::Chainable<IntrusivePtr>>&, int&, marian::ReduceNodeOpCode&&) + 0x5a
[0x8c02f3] marian:: sum (IntrusivePtr<marian::Chainable<IntrusivePtr>>, int) + 0x63
[0x8c9344] marian:: weighted_average (IntrusivePtr<marian::Chainable<IntrusivePtr>>, IntrusivePtr<marian::Chainable<IntrusivePtr>>, int) + 0xa4
[0xa7c603] marian::DecoderS2S:: startState (std::shared_ptr, std::shared_ptr, std::vector<std::shared_ptr,std::allocator<std::shared_ptr>>&) + 0xf3
[0xaa1be8] marian::EncoderDecoder:: startState (std::shared_ptr, std::shared_ptr) + 0x398
[0xaa34fb] marian::EncoderDecoder:: stepAll (std::shared_ptr, std::shared_ptr, bool) + 0xcb
[0xa91482] marian::models::EncoderDecoderCECost:: apply (std::shared_ptr, std::shared_ptr, std::shared_ptr, bool) + 0x102
[0x9fcde7] marian::models::Trainer:: build (std::shared_ptr, std::shared_ptr, bool) + 0xa7
[0xb01f98] marian::GraphGroup:: collectStats (std::shared_ptr, std::shared_ptr, std::vector<std::shared_ptr,std::allocator<std::shared_ptr>> const&, double) + 0x1678
[0xaf8e9c] marian::SyncGraphGroup:: collectStats (std::vector<std::shared_ptr,std::allocator<std::shared_ptr>> const&) + 0x14c
[0x722e25] marian::Train:: run () + 0x2a5
[0x660ae7] mainTrainer (int, char**) + 0x447
[0x62b72a] main + 0x8a
[0x7f578dd30830] __libc_start_main + 0xf0
[0x65e5d9] _start + 0x29

How do I solve this problem?

sebhtml commented 4 years ago

@Nebisjames Run your software inside a debugger; a good debugger should break when SIGFPE occurs.
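As a rough sketch of that advice, assuming gdb is the debugger: gdb stops the program when it receives SIGFPE, so a batch run followed by a backtrace should show where the division by zero happens. The program path below is a placeholder, not something taken from this thread, and the script only prints the command unless RUN_IT=1 is set.

```shell
#!/bin/sh
# Placeholder program path for illustration; replace with your own binary
# and its arguments.
PROGRAM="./my_program"

# -batch   : run the given commands, then exit gdb
# -ex run  : start the program
# -ex bt   : print a backtrace at the point where the program stopped
# --args   : everything after this is the program and its arguments
GDB_CMD="gdb -batch -ex run -ex bt --args $PROGRAM"

# Print the command so it can be reviewed before it is run.
echo "$GDB_CMD"

# Run it only on request, and only if gdb is actually installed.
if [ "${RUN_IT:-0}" = "1" ] && command -v gdb >/dev/null 2>&1; then
    $GDB_CMD
fi
```

The backtrace is much easier to read if the program was built with debug symbols (for example with -g); without them gdb can only show raw addresses like the ones in the call stack above.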