ontheklaud closed this issue 6 years ago
Thanks for reporting the issue.
It looks like your compiler seg faulted while building the file ompi/mca/io/romio314/romio/mpi-io/delete.c.
This is not a problem with Open MPI per se, but rather a problem with your compiler.
Interestingly enough, this file did not change between the Open MPI 3.1.2 and 3.1.3 releases:
$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.bz2
...
$ tar xf openmpi-3.1.3.tar.bz2
$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.2.tar.bz2
...
$ tar xf openmpi-3.1.2.tar.bz2
$ diff \
openmpi-3.1.2/ompi/mca/io/romio314/romio/mpi-io/delete.c \
openmpi-3.1.3/ompi/mca/io/romio314/romio/mpi-io/delete.c
$
That being said, it's possible (even likely) that some of the header files delete.c uses changed between 3.1.2 and 3.1.3. I.e., something clearly changed that makes your compiler abort when building 3.1.3 but not when building 3.1.2.
You might want to investigate why your compiler seg faulted.
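If you want to dig in, here's a rough sketch of how I'd start (the directory below and the <flags ...> bit are placeholders; take the exact command line from your own make output, e.g. via make V=1):
$ cd ompi/mca/io/romio314/romio                      # directory of the failing file (placeholder)
$ make V=1                                           # V=1 prints the full gcc command lines
$ gcc -v -save-temps <flags from the failing line> -c mpi-io/delete.c   # does cc1 itself crash?
$ dmesg | grep -iE 'segfault|oom|killed process'     # did the kernel kill the compiler?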
Thanks for such a quick response. Here's some progress on building v3.1.3.
Reverted gcc to the native CentOS 7 gcc:
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Extracted a fresh copy of the source; here's the configure output summary:
./configure --prefix=${HOME}/opt/openmpi-3.1.3
(...)
Open MPI configuration:
-----------------------
Version: 3.1.3
Build MPI C bindings: yes
Build MPI C++ bindings (deprecated): no
Build MPI Fortran bindings: mpif.h, use mpi
MPI Build Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)
CUDA support: no
PMIx support: internal
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
Intel SCIF: no
Intel TrueScale (PSM): no
Mellanox MXM: no
Open UCX: no
OpenFabrics Libfabric: no
OpenFabrics Verbs: no
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Cray Alps: no
Grid Engine: no
LSF: no
Moab: no
Slurm: yes
ssh/rsh: yes
Torque: no
Generic Unix FS: yes
Lustre: no
PVFS2/OrangeFS: no
* Milestone: Strangely, I managed to build the v3.1.3 source successfully, but **only with a single-threaded make (-j1)**.
**[make -j1] Success**
(...)
CC monitoring_test.o
CCLD monitoring_test
CC test_pvar_access.o
CCLD test_pvar_access
CC test_overhead.o
CCLD test_overhead
CC check_monitoring.o
CCLD check_monitoring
CC example_reduce_count.o
CCLD example_reduce_count
make[2]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/test/monitoring'
make[2]: Entering directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/test'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/test'
make[1]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/test'
make[1]: Entering directory `/home/xo/pyenv-ngraph/openmpi-3.1.3'
make[1]: Nothing to be done for `all-am'.
make[1]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3'
$
**[make -j16] Failed: first attempt, from a fresh source extract**
(...)
CC pwin_lock_all_f.lo
CC pwin_post_f.lo
CC pwin_set_attr_f.lo
CC pwin_set_errhandler_f.lo
CC pwin_set_info_f.lo
CC pwin_set_name_f.lo
CC pwin_shared_query_f.lo
CC pwin_start_f.lo
CC pwin_sync_f.lo
CC pwin_test_f.lo
/bin/sh: line 2: 4100 Segmentation fault /bin/sh ../../../../../libtool --silent --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../../../../opal/include -I../../../../../ompi/include -I../../../../../oshmem/include -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../../../../../ompi/mpiext/cuda/c -DOMPI_BUILD_MPI_PROFILING=1 -DOMPI_COMPILING_FORTRAN_WRAPPERS=1 -I../../../../.. -I../../../../../orte/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/hwloc/hwloc1117/hwloc/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16 -pthread -MT pwin_start_f.lo -MD -MP -MF $depbase.Tpo -c -o pwin_start_f.lo pwin_start_f.c
make[3]: *** [pwin_start_f.lo] Error 139
make[3]: Waiting for unfinished jobs....
/bin/sh: line 2: 4085 Segmentation fault /bin/sh ../../../../../libtool --silent --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../../../../opal/include -I../../../../../ompi/include -I../../../../../oshmem/include -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../../../../../ompi/mpiext/cuda/c -DOMPI_BUILD_MPI_PROFILING=1 -DOMPI_COMPILING_FORTRAN_WRAPPERS=1 -I../../../../.. -I../../../../../orte/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/hwloc/hwloc1117/hwloc/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16 -pthread -MT pwin_shared_query_f.lo -MD -MP -MF $depbase.Tpo -c -o pwin_shared_query_f.lo pwin_shared_query_f.c
make[3]: *** [pwin_shared_query_f.lo] Error 139
make[3]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi/mpi/fortran/mpif-h/profile'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi/mpi/fortran/mpif-h'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi'
**[make -j16] Failed: second attempt, from a fresh source extract**
(...)
CC ptype_get_contents_f.lo
CC ptype_get_envelope_f.lo
CC ptype_get_extent_f.lo
CC ptype_get_extent_x_f.lo
CC ptype_get_name_f.lo
CC ptype_get_true_extent_f.lo
CC ptype_get_true_extent_x_f.lo
CC ptype_hindexed_f.lo
CC ptype_hvector_f.lo
CC ptype_indexed_f.lo
CC ptype_lb_f.lo
CC ptype_match_size_f.lo
CC ptype_set_attr_f.lo
CC ptype_set_name_f.lo
CC ptype_size_f.lo
/bin/sh: line 2: 19489 Segmentation fault /bin/sh ../../../../../libtool --silent --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../../../../opal/include -I../../../../../ompi/include -I../../../../../oshmem/include -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../../../../../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../../../../../ompi/mpiext/cuda/c -DOMPI_BUILD_MPI_PROFILING=1 -DOMPI_COMPILING_FORTRAN_WRAPPERS=1 -I../../../../.. -I../../../../../orte/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/event/libevent2022/libevent/include -I/home/xo/pyenv-ngraph/openmpi-3.1.3/opal/mca/hwloc/hwloc1117/hwloc/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16 -pthread -MT ptype_lb_f.lo -MD -MP -MF $depbase.Tpo -c -o ptype_lb_f.lo ptype_lb_f.c
make[3]: *** [ptype_lb_f.lo] Error 139
make[3]: Waiting for unfinished jobs....
make[3]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi/mpi/fortran/mpif-h/profile'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi/mpi/fortran/mpif-h'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/xo/pyenv-ngraph/openmpi-3.1.3/ompi'
make: *** [all-recursive] Error 1
Here's my opinion:
- v3.1.3 does build in my DevEnv, but **only with a single-threaded make (-j1)**
- my previous gcc (8.2.0, built from source) is probably not the problem, since the native gcc 4.8.5 shows the same behavior
- a **multi-threaded make (e.g. -j16)** can trigger segfaults while building a fresh source extract (see the fallback sketch below)
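For now, my workaround is simply to fall back to a serial build when the parallel one dies; a minimal sketch (assuming GNU make and a POSIX shell, nothing Open MPI specific):
$ ./configure --prefix=${HOME}/opt/openmpi-3.1.3
$ make -j16 || make -j1      # retry serially if the parallel build segfaults
$ make install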
Is there any other procedure I should look into?
Thanks!
There might be something else wrong with your OS installation -- e.g., do you have low memory and/or disk space? You might want to check places like /var/log/messages to see whether any relevant error messages about gcc appeared there (e.g., the compiler processes got killed because the OS ran out of RAM).
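For example, something like this (just a sketch for a stock CentOS 7 box with rsyslog/journald; adjust the paths to your setup) would show whether the OS killed the compiler processes:
$ grep -iE 'segfault|oom|out of memory' /var/log/messages
$ dmesg -T | grep -iE 'killed process|oom'
$ journalctl -k | grep -iE 'segfault|oom'     # if systemd-journald is in use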
There's not much Open MPI can do if the compiler seg faults. ☹️
I agree with you 😿. I'll have to look into my OS installation sometime. Anyway, v3.1.2 built fine with a multi-threaded make, while v3.1.3 only built single-threaded in the end. RAM was sufficient (64 GiB) and so was disk space (Samsung 960 1 TB, with about 100 GB free).
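For reference, the usual quick checks (plain coreutils/util-linux, nothing exotic):
$ free -h            # available RAM
$ df -h /home        # free space on the filesystem holding the build tree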
If I run into an issue that is clearly related to Open MPI itself, I'll open a new one; closing this issue for now. Again, thanks for your comments on my workaround.
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.1.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From the official openmpi-3.1.3 source tarball (see the build procedure below).
Please describe the system on which you are running
Operating system/version: CentOS 7
Compiler: gcc (GCC) 8.2.0, built from source
Details of the problem
[build procedure]
1> wget
2> tar xf openmpi-3.1.3.tar.gz
3> cd openmpi-3.1.3
4> ./configure (again, with no extra flags)
5> make -j16
6> Ta-da! with Segmentation Fault
[configure output]
[tailed error]