Closed Minyoung-sss closed 5 months ago
The error message suggests you add RDMAV_FORK_SAFE=1. Have you tried adding the following to your mpiexec line:
mpiexec -x RDMAV_FORK_SAFE=1 -np 12 ...
Additionally, your mpiexec output indicates you are using EFA via libfabric, but your config.log output indicates it will not be built with libfabric support. Are you sure you are running the mpi version you think you are?
Thank you for your kind answer.
I will try this command again:
mpiexec -x RDMAV_FORK_SAFE=1 -np 12 ...
But I have a question about '-x RDMAV_FORK_SAFE=1'. Is it related to the RDMA environment, and is that what causes the signal 11 error? I searched for this error on Google and found that it is related to computer memory and that the default is '0' (not enabled).
Also, I did not know that my mpiexec output showed EFA via libfabric until you pointed it out. LOL. I did not configure anything for EFA or libfabric, so it is correct that it was not built with libfabric support.
So I checked the Open MPI version I am running again, and I also checked the MAKER site for the program I want to run with Open MPI.
$ mpirun --version
mpirun (Open MPI) 4.1.2
$ which mpirun
/usr/local/bin/mpirun
MAKER can be used with any version of Open MPI or MPICH.
Do you think I should change my MPI version, or should I build with libfabric support?
If I need to install a different MPI version, how can I completely remove the old one? Or, if I should build with libfabric support, how do I do that?
Thank you for helping a rookie who still has a lot to learn.
Regards.
In addition, the Open MPI version of my Ubuntu package is 4.1.2.
If I need to install a different MPI version, should I remove these packages as well?
I used this website as a reference when I first installed Open MPI, so I thought these packages were needed to install MPI.
Why don't you try the workaround first?
Note you have to use the mpirun from the library that was used to build your application.
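A minimal sketch of one way to see which launchers the shell actually resolves, which helps spot a mix of a source install under /usr/local and the Ubuntu packages. The function name report_mpi_launchers is made up for illustration, and readlink -f assumes GNU coreutils (as on Ubuntu).

```shell
# Print where each MPI launcher on PATH resolves to, so mismatches are visible.
report_mpi_launchers() {
    for tool in mpirun mpiexec; do
        if resolved=$(command -v "$tool" 2>/dev/null); then
            # readlink -f follows symlinks (for example, distribution alternatives)
            printf '%s -> %s\n' "$tool" "$(readlink -f "$resolved")"
        else
            printf '%s -> not found on PATH\n' "$tool"
        fi
    done
}

report_mpi_launchers
```

If the two launchers point into different installation prefixes, or into a prefix other than the one the application was built against, that is likely the kind of mismatch described above.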
I tried this command:
mpiexec -x RDMAV_FORK_SAFE=1 -np 12 ...
However, the same error came out.
(MAKER) kucmb@kucmb-System-Product-Name:~/maker$ mpiexec -x RDMAV_FORK_SAFE=1 -np 12 maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[kucmb-System-Product-Name:401470] *** Process received signal ***
[kucmb-System-Product-Name:401470] Signal: Segmentation fault (11)
[kucmb-System-Product-Name:401470] Signal code: Address not mapped (1)
[kucmb-System-Product-Name:401470] Failing at address: 0x5a4
[kucmb-System-Product-Name:401470] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f1e33a42520]
[kucmb-System-Product-Name:401470] [ 1] /home/kucmb/anaconda3/envs/MAKER/bin/../lib/perl5/5.32/core_perl/CORE/libperl.so(Perl_csighandler3+0x38)[0x7f1e33eff698]
[kucmb-System-Product-Name:401470] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f1e33a42520]
[kucmb-System-Product-Name:401470] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x4f)[0x7f1e33b18bcf]
[kucmb-System-Product-Name:401470] [ 4] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(+0x24309)[0x7f1e339d3309]
[kucmb-System-Product-Name:401470] [ 5] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(event_base_loop+0x2a1)[0x7f1e339ce921]
[kucmb-System-Product-Name:401470] [ 6] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x2d646)[0x7f1e33d7a646]
[kucmb-System-Product-Name:401470] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f1e33a94ac3]
[kucmb-System-Product-Name:401470] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f1e33b26850]
[kucmb-System-Product-Name:401470] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 11 with PID 0 on node kucmb-System-Product-Name exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
But then I ran the previous command again by mistake:
mpiexec -np 12 maker...
and it runs.
I don't know why this command works now. I haven't changed anything. I'll watch whether it keeps running correctly. I worry that a new problem may come up during this run.
Thank you so much.
Hello, everyone.
My computer ran this command fine for 3 days, but it suddenly stopped this morning.
#-------------------------------#
SIGTERM thread
SIGTERM received
deleted:130 hits
collecting blastn reports
SIGTERM thread
[kucmb-System-Product-Name:402259] *** Process received signal ***
[kucmb-System-Product-Name:402259] Signal: Segmentation fault (11)
[kucmb-System-Product-Name:402259] Signal code: Address not mapped (1)
[kucmb-System-Product-Name:402259] Failing at address: 0x5a4
[kucmb-System-Product-Name:402259] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f7159a42520]
[kucmb-System-Product-Name:402259] [ 1] /home/kucmb/anaconda3/envs/MAKER/bin/../lib/perl5/5.32/core_perl/CORE/libperl.so(Perl_csighandler3+0x38)[0x7f7159eff698]
[kucmb-System-Product-Name:402259] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f7159a42520]
[kucmb-System-Product-Name:402259] [ 3] /home/kucmb/anaconda3/envs/MAKER/bin/../lib/perl5/5.32/core_perl/CORE/libperl.so(Perl_csighandler+0x0)[0x7f7159eff710]
[kucmb-System-Product-Name:402259] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f7159a42520]
[kucmb-System-Product-Name:402259] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x4f)[0x7f7159b18bcf]
[kucmb-System-Product-Name:402259] [ 6] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(+0x24309)[0x7f7159c76309]
[kucmb-System-Product-Name:402259] [ 7] /lib/x86_64-linux-gnu/libevent_core-2.1.so.7(event_base_loop+0x2a1)[0x7f7159c71921]
[kucmb-System-Product-Name:402259] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x2d646)[0x7f715a1fc646]
[kucmb-System-Product-Name:402259] [ 9] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f7159a94ac3]
[kucmb-System-Product-Name:402259] [10] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f7159b26850]
[kucmb-System-Product-Name:402259] *** End of error message ***
running blast search.
#-------------------------------#
deleted:90 hits
SIGTERM thread
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 8 with PID 0 on node kucmb-System-Product-Name exited on signal 11 (Segmentation fault).
The same error again. Is it a disk capacity problem? This morning I got a notification that disk space was insufficient.
Any help would be appreciated.
Thank you
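Whether disk space is actually the limiting factor can be checked directly. The sketch below is only an illustration: free_kib is a hypothetical helper name, and the directory to check (here the current one) is an assumption, since the thread does not say where MAKER writes its output.

```shell
# Report free space, in KiB, on the filesystem that holds a given directory.
free_kib() {
    # -P forces one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the data line is the "Available" figure.
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

echo "Free KiB in $PWD: $(free_kib "$PWD")"
```

Running this in the directory where the job writes its results, before and during a run, shows whether the available space is shrinking toward zero.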
There is not enough information here to help debug the problem. I suspect you are still mixing installation and runtime versions.
I suggest you do the following:
./configure with:
--enable-debug (for better backtrace)
--prefix=$HOME/ompi_test (or whichever path you decided on)
--enable-mpirun-prefix-by-default (so runtime uses the same libraries)
Set PATH=$HOME/ompi_test/bin:$PATH and LD_LIBRARY_PATH=$HOME/ompi_test/lib:$LD_LIBRARY_PATH in .bashrc. Logout, log back in to apply.
It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.
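For reference, the .bashrc settings from the build suggestion above could look roughly like the following. This is a sketch, assuming the $HOME/ompi_test prefix named in that suggestion; the variable name OMPI_PREFIX is introduced here only for readability.

```shell
# Assumed install prefix, matching the --prefix value suggested above.
OMPI_PREFIX="$HOME/ompi_test"

# Put the test build first so its mpirun/mpiexec shadow any system copies.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Prepend the matching libraries; the ${VAR:+...} form avoids a trailing
# colon (an empty entry) when LD_LIBRARY_PATH was previously unset.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "PATH now starts with: ${PATH%%:*}"
```

After logging back in, `which mpirun` should then report a path under the chosen prefix.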
Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned.
I'm going to close this issue. If I'm wrong and this issue is not abandoned, please feel free to re-open it. Thank you!
Hello,
I installed Open MPI version 4.1.2 and ran MAKER version 3.1.2, but it stops immediately with this error (I ran it inside an anaconda3 environment named 'MAKER').
In addition, when I used the option '--mca btl ^openlib', this error came out.
What does it mean? I can't figure out what kind of error this is or what causes it.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.2
I had already run MAKER with mpirun using Open MPI v4.1.6, but it stopped immediately with the same error. So I checked for other MPI versions installed on my computer and found the Ubuntu packages 'openmpi-bin' and 'openmpi-common' at version 4.1.2. I thought this was the cause, so I downgraded Open MPI to version 4.1.2.
Is that right? I don't know Ubuntu and MPI well because I started studying bioinformatics one month ago.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
https://chat.stackoverflow.com/rooms/153365/discussion-between-imworsethanyou-and-gilles-gouaillardet
Details of the problem
I don't know why the same error appears and the MPI run stops. Please help me.
Best Regards
Thank you for reading