Closed: eediaz1987 closed this issue 8 years ago.
I'm afraid I need a little more info than that; at a minimum, what OMPI version are you using?
I'm using OMPI 1.8.4 with GCC 4.9.2; I installed it using EasyBuild.
FWIW, I've seen this kind of message when I inadvertently used Open MPI version X on one machine and Open MPI version Y on another.
I've also seen it when I compiled my MPI app with Open MPI version X and then used mpirun from Open MPI version Y.
Can you confirm that you are using a single, consistent version/installation of Open MPI?
@eediaz1987 this error can only occur if IPv6 is explicitly enabled at configure time with `--enable-ipv6`, and that does not seem to be the case with EasyBuild: https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenMPI/OpenMPI-1.8.4-GCC-4.9.2.eb

That being said, I am surprised you ran into this at all: at first glance, `parse_uri` should be given a single host name, not a comma-separated list of hostnames.

As advised by @jsquyres, please make sure you always use the same Open MPI version (at compile time, at run time, and across all your hosts).
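A quick way to sanity-check that consistency (the host names below are illustrative, not taken from your cluster):

```sh
# Compare the mpirun location and version on every host in the allocation
for h in hn node01 node02; do
    echo "== $h =="
    ssh "$h" 'which mpirun; mpirun --version | head -1'
done
```

If any host reports a different path or version, that is the first thing to fix.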
As a side note, you can explicitly choose the network used by oob/tcp, for example:

```
mpirun --mca oob_tcp_if_include 10.30.1.0/24 ...
```
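(The same MCA parameter also accepts interface names instead of CIDR subnets; a small sketch, where `eth0` and `./mpi_hello` are illustrative:)

```sh
# Restrict the out-of-band TCP channel by subnet...
mpirun --mca oob_tcp_if_include 10.30.1.0/24 -np 4 ./mpi_hello
# ...or by interface name
mpirun --mca oob_tcp_if_include eth0 -np 4 ./mpi_hello
```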
For this error:

```
[hn:27581] oob_tcp_parse_uri: Could not resolve 10.30.1.225,192.168.1.254. [Error: Name or service not known]
```

I ran the command `host 10.30.1.225` and the answer was:

```
UOeediaz@hn:~$ host 10.30.1.225
225.1.30.10.in-addr.arpa domain name pointer hn.hpc.uo.edu.cu.
```

I obtained the same answer for the other IP.
Yes, I'm using only OMPI 1.8.4 with GCC 4.9.2.
Can you send the full set of information requested at https://www.open-mpi.org/community/help/ ?
Can you also run `mpirun --mca oob_base_verbose 100 -np 1 bin/mpi_hello` and post the output?
I ran that line and this is the output:

```
UOeediaz@hn:~$ mpirun --mca oob_base_verbose 100 -np 1 bin/mpi_hello
[hn:09412] mca: base: components_register: registering oob components
[hn:09412] mca: base: components_register: found loaded component tcp
[hn:09412] mca: base: components_register: component tcp register function successful
[hn:09412] mca: base: components_open: opening oob components
[hn:09412] mca: base: components_open: found loaded component tcp
[hn:09412] mca: base: components_open: component tcp open function successful
[hn:09412] mca:oob:select: checking available component tcp
[hn:09412] mca:oob:select: Querying component [tcp]
[hn:09412] oob:tcp: component_available called
[hn:09412] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[hn:09412] [[55417,0],0] oob:tcp:init rejecting loopback interface lo
[hn:09412] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[hn:09412] [[55417,0],0] oob:tcp:init adding 10.30.1.225 to our list of V4 connections
[hn:09412] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4
[hn:09412] [[55417,0],0] oob:tcp:init adding 192.168.1.254 to our list of V4 connections
[hn:09412] [[55417,0],0] TCP STARTUP
[hn:09412] [[55417,0],0] attempting to bind to IPv4 port 0
[hn:09412] [[55417,0],0] assigned IPv4 port 40524
[hn:09412] mca:oob:select: Adding component to end
[hn:09412] mca:oob:select: Found 1 active transports
[hn:09412] [[55417,0],0]: set_addr to uri 3631808512.0;tcp://10.30.1.225,192.168.1.254:40524
[hn:09412] [[55417,0],0]:set_addr peer [[55417,0],0] is me
[hn:09414] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 358
[hn:09414] mca: base: components_open: Looking for oob components
[hn:09414] mca: base: components_open: opening oob components
[hn:09414] mca: base: components_open: found loaded component tcp
[hn:09414] mca: base: components_open: component tcp has no register function
[hn:09414] mca: base: components_open: component tcp open function successful
[hn:09414] oob_tcp_parse_uri: Could not resolve 10.30.1.225,192.168.1.254. [Error: Name or service not known]
[hn:09414] [[INVALID],INVALID] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at line 146
[hn:09414] [[INVALID],INVALID] attempted to send to [[55417,0],0]
[hn:09414] [[INVALID],INVALID] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/routed/base/routed_base_register_sync.c at line 92
[hn:09414] [[INVALID],INVALID] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../../orte/mca/routed/binomial/routed_binomial.c at line 891
[hn:09414] [[INVALID],INVALID] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/ess/base/ess_base_std_app.c at line 151
```
Line 358 of ess_env_module.c does not seem to come from version 1.8.4!

Can you please double- or triple-check that you are using the correct library:

```
mpirun --version
ldd bin/mpi_hello
```

Can you also post the output of `ompi_info --all`, and the config.status EasyBuild generated?
@eediaz1987 I checked the sources, and as far as I can tell, these traces only make sense with mpirun from the v1.8 series or later, whereas your app is using v1.6 series libraries.
For `mpirun --version` the output is:

```
mpirun (Open MPI) 1.8.4
Report bugs to http://www.open-mpi.org/community/help/
```

The output of the other command is in the attachment.
This is the output of `ompi_info --all`:

```
Package: Open MPI oau@hn Distribution
Open MPI: 1.8.4
Open MPI repo revision: v1.8.3-330-g0344f04
Open MPI release date: Dec 19, 2014
Open RTE: 1.8.4
Open RTE repo revision: v1.8.3-330-g0344f04
Open RTE release date: Dec 19, 2014
OPAL: 1.8.4
OPAL repo revision: v1.8.3-330-g0344f04
OPAL release date: Dec 19, 2014
MPI API: 3.0
Ident string: 1.8.4
Prefix: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2
Exec_prefix: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2
Bindir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/bin
Sbindir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/sbin
Libdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib
Incdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/include
Mandir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/share/man
Pkglibdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib/openmpi
Libexecdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/libexec
Datarootdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/share
Datadir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/share
Sysconfdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/etc
Sharedstatedir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/com
Localstatedir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/var
Infodir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/share/info
Pkgdatadir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/share/openmpi
Pkglibdir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib/openmpi
Pkgincludedir: /opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/include/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: hn
Configured by: oau
Configured on: Tue Mar 22 07:40:33 CDT 2016
Configure host: hn
Built by: oau
Built on: Tue Mar 22 07:50:12 CDT 2016
Built host: hn
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the gfortran compiler, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /opt/librarieshpc/easybuild/software/GCC/4.9.2/bin/gcc
C compiler family name: GNU
C compiler version: 4.9.2
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 8
C float size: 4
C double size: 8
C pointer size: 8
C char align: 1
C bool align: 1
C int align: 4
C float align: 4
C double align: 8
C++ compiler: g++
C++ compiler absolute: /opt/librarieshpc/easybuild/software/GCC/4.9.2/bin/g++
Fort compiler: gfortran
Fort compiler abs: /opt/librarieshpc/easybuild/software/GCC/4.9.2/bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
Fort integer size: 4
Fort logical size: 4
Fort logical value true: 1
Fort have integer1: yes
Fort have integer2: yes
Fort have integer4: yes
Fort have integer8: yes
Fort have integer16: no
Fort have real4: yes
Fort have real8: yes
Fort have real16: yes
Fort have complex8: yes
Fort have complex16: yes
Fort have complex32: yes
Fort integer1 size: 1
Fort integer2 size: 2
Fort integer4 size: 4
Fort integer8 size: 8
Fort integer16 size: -1
Fort real size: 4
Fort real4 size: 4
Fort real8 size: 8
Fort real16 size: 16
Fort dbl prec size: 8
Fort cplx size: 8
Fort dbl cplx size: 16
Fort cplx8 size: 8
Fort cplx16 size: 16
Fort cplx32 size: 32
Fort integer align: 4
Fort integer1 align: 1
Fort integer2 align: 2
Fort integer4 align: 4
Fort integer8 align: 8
Fort integer16 align: -1
Fort real align: 4
Fort real4 align: 4
Fort real8 align: 8
Fort real16 align: 16
Fort dbl prec align: 8
Fort cplx align: 4
Fort dbl cplx align: 8
Fort cplx8 align: 4
Fort cplx16 align: 8
Fort cplx32 align: 16
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Build CFLAGS: -DNDEBUG -O2 -march=native -finline-functions -fno-strict-aliasing
Build CXXFLAGS: -DNDEBUG -O2 -march=native -finline-functions
Build FCFLAGS: -O2 -march=native
Build LDFLAGS: -L/opt/librarieshpc/easybuild/software/GCC/4.9.2/lib64 -L/opt/librarieshpc/easybuild/software/GCC/4.9.2/lib -L/opt/librarieshpc/easybuild/software/hwloc/1.10.0-GCC-4.9.2/lib -L/opt/librarieshpc/easybuild/software/hwloc/1.10.0-GCC-4.9.2/lib
Build LIBS: -lrt -lutil -lm -lpthread -lhwloc
Wrapper extra CFLAGS:
Wrapper extra CXXFLAGS:
Wrapper extra FCFLAGS:
Wrapper extra LDFLAGS: -L/usr/local/lib -Wl,--rpath -Wl,/usr/local/lib -Wl,--rpath -Wl,/usr/local/lib -Wl,--rpath -Wl,/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib -Wl,-rpath -Wl,@{libdir} -Wl,--enable-new-dtags
Wrapper extra LIBS: -ldl -lrt -ltorque -libverbs -lutil -lm -lpthread -lhwloc
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: no
Heterogeneous support: no
mpirun default --prefix: yes
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions:
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA mca: parameter "mca_param_files" (current value: "/shared/home/UOeediaz/.openmpi/mca-params.conf:/opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/etc/openmpi-mca-params.conf", data source: default, level: 2 user/detail, type: string, deprecated, synonym of: mca_base_param_files)
Path for MCA configuration files containing variable values
MCA mca: parameter "mca_component_path" (current value: "/opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib/openmpi:/shared/home/UOeediaz/.openmpi/components", data source: default, level: 9 dev/all, type: string, deprecated, synonym of: mca_base_component_path)
Path where to look for Open MPI and ORTE components
MCA mca: parameter "mca_component_show_load_errors" (current value: "true", data source: default, level: 9 dev/all, type: bool, deprecated, synonym of: mca_base_component_show_load_errors)
Whether to show errors for components that failed to load or not
Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA mca: parameter "mca_component_disable_dlopen" (current value: "false", data source: default, level: 9 dev/all, type: bool, deprecated, synonym of: mca_base_component_disable_dlopen)
Whether to attempt to disable opening dynamic components or not
Valid values: 0: f|false|disabled, 1: t|true|enabled
MCA mca: parameter "mca_verbose" (current value: "stderr", data source: default, level: 9 dev/all, type: string, deprecated, synonym of: mca_base_verbose)
Specifies where the default error output stream goes (this is separate from distinct help messages). Accepts a comma-delimited list of: stderr, stdout, syslog, syslogpri:<notice|info|debug>, syslogid:
```
@eediaz1987 could you please run

```
ldd bin/mpi_hello
```

and post the output?

You can also try

```
`which mpirun` -np 1 bin/mpi_hello
```

and see if you are luckier with that.
This is the output for the first command:

```
UOeediaz@hn:~$ ldd bin/mpi_hello
        linux-vdso.so.1 =>  (0x00007ffcd4b0c000)
        libmpi.so.0 => /usr/lib/libmpi.so.0 (0x00007fdd2f7fc000)
        libopen-rte.so.0 => /usr/lib/libopen-rte.so.0 (0x00007fdd2f5ae000)
        libopen-pal.so.0 => /usr/lib/libopen-pal.so.0 (0x00007fdd2f357000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdd2f153000)
        libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007fdd2ef3b000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fdd2ed38000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdd2eab6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdd2e89a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdd2e50f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdd2faae000)
```

And this is the output for the second:

```
UOeediaz@hn:~$ which mpirun -np 1 bin/mpi_hello
/opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/bin/mpirun
bin/mpi_hello
```
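(Note: the backticks in the suggested command are shell command substitution; run literally without them, `which` just prints paths instead of launching anything, which is what happened above. A minimal sketch of the intended invocation:)

```sh
# The backticks expand to the absolute path of mpirun before the command runs
`which mpirun` -np 1 bin/mpi_hello
# modern equivalent
$(which mpirun) -np 1 bin/mpi_hello
```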
@eediaz1987 If you look at the output, you can clearly see the problem: you built mpi_hello against a library in /usr, while mpirun is in /opt.
You need to fix your path and rebuild mpi_hello.
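A minimal sketch of the fix, using the EasyBuild prefix from the `ompi_info` output above (`mpi_hello.c` as the source file name is an assumption, and loading the install via your module system works just as well as exporting by hand):

```sh
# Put the EasyBuild Open MPI first on the search paths
export PATH=/opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/bin:$PATH
export LD_LIBRARY_PATH=/opt/librarieshpc/easybuild/software/OpenMPI/1.8.4-GCC-4.9.2/lib:$LD_LIBRARY_PATH

# Rebuild against the right library...
mpicc -o bin/mpi_hello mpi_hello.c

# ...and confirm libmpi now resolves under /opt, not /usr
ldd bin/mpi_hello | grep libmpi
```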
Thanks, I will rebuild mpi_hello and try again.
@eediaz1987 I'm going to close this issue, since it looks like the root cause has been found. Feel free to reply / re-open the issue if the issue still isn't solved.
I have an error using Open MPI; I attach a log fragment here.
EDIT: Used GitHub triple backticks to mark the above section as verbatim.