pmodels / mpich

Official MPICH Repository
http://www.mpich.org

Virtual MPI cluster doesn't work #3839

Closed andxeg closed 5 years ago

andxeg commented 5 years ago

Hello everyone.

I have a problem creating a virtual MPI cluster. I have 2 physical servers, and on each I start one virtual machine using virt-install. I set up a VXLAN tunnel between the servers. The virtual machines can communicate with each other over ssh, ping and TCP (checked with simple Python client/server scripts).
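The TCP check mentioned above could look roughly like the sketch below. The original scripts are not shown in this thread, so the function names, the payload size, and the "reply with a byte count" protocol are all assumptions. The payload is deliberately larger than a single Ethernet frame so that full-size packets are exercised, not just small ones.

```python
import socket
import threading


def count_server(server_sock):
    """Accept one connection, count the bytes received, and reply with the count."""
    conn, _ = server_sock.accept()
    with conn:
        total = 0
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # peer closed its sending side
                break
            total += len(chunk)
        conn.sendall(str(total).encode())


def send_and_confirm(host, port, size, timeout=5.0):
    """Send `size` bytes and return the byte count reported back by the server."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"x" * size)
        sock.shutdown(socket.SHUT_WR)  # signal end of payload
        reply = b""
        while True:
            chunk = sock.recv(64)
            if not chunk:
                break
            reply += chunk
    return int(reply.decode())


if __name__ == "__main__":
    # Loopback demo; on the real cluster the server would run on one VM
    # and send_and_confirm() would be called from the other.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    worker = threading.Thread(target=count_server, args=(srv,), daemon=True)
    worker.start()
    size = 100_000  # hypothetical payload size, many times a 1500-byte MTU
    counted = send_and_confirm("127.0.0.1", srv.getsockname()[1], size)
    print(f"sent {size} bytes, peer counted {counted}")
    worker.join(timeout=5)
    srv.close()
```

If the call times out for large sizes but succeeds for a few hundred bytes, that points at the network path rather than at MPI.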

But mpirun -np N -hosts master,slave ./test_mpi_prog doesn't work: it hangs indefinitely. I checked ebtables and iptables and removed the rules from the INPUT, OUTPUT and FORWARD chains.

I added master and slave1 to /etc/hosts on each virtual machine and set up key-based ssh (without a password). The configuration itself seems fine, because I tested it on a single server: I started two virtual machines attached to the virbr0 bridge and assigned them IP addresses. This one-server cluster works: the command mpirun -np N -hosts master,slave ./test_mpi_prog finished. The test program test_mpi_prog prints the processor name (using the MPI_Get_processor_name function).

Both virtual machines run the same OS: Ubuntu 16.04.

I configured the virtual machines following the instructions at https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/

MPICH version (mpirun --version):

HYDRA build details:
    Version:                                 3.2
    Release Date:                            Wed Nov 11 22:06:48 CST 2015
    CC:                              gcc   -Wl,-Bsymbolic-functions -Wl,-z,relro 
    CXX:                             g++   -Wl,-Bsymbolic-functions -Wl,-z,relro 
    F77:                             gfortran  -Wl,-Bsymbolic-functions -Wl,-z,relro 
    F90:                             gfortran  -Wl,-Bsymbolic-functions -Wl,-z,relro 
    Configure options:                       '--disable-option-checking' '--prefix=/usr' '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--enable-shared' '--enable-fortran=all' '--disable-rpath' '--disable-wrapper-rpath' '--sysconfdir=/etc/mpich' '--libdir=/usr/lib/x86_64-linux-gnu' '--includedir=/usr/include/mpich' '--docdir=/usr/share/doc/mpich' '--with-hwloc-prefix=system' '--enable-checkpointing' '--with-hydra-ckpointlib=blcr' 'CPPFLAGS= -Wdate-time -D_FORTIFY_SOURCE=2 -I/build/mpich-jQtQ8p/mpich-3.2/src/mpl/include -I/build/mpich-jQtQ8p/mpich-3.2/src/mpl/include -I/build/mpich-jQtQ8p/mpich-3.2/src/openpa/src -I/build/mpich-jQtQ8p/mpich-3.2/src/openpa/src -D_REENTRANT -I/build/mpich-jQtQ8p/mpich-3.2/src/mpi/romio/include' 'CFLAGS= -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -O2' 'CXXFLAGS= -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -O2' 'FFLAGS= -g -O2 -fstack-protector-strong -O2' 'FCFLAGS= -g -O2 -fstack-protector-strong -O2' 'build_alias=x86_64-linux-gnu' 'MPICHLIB_CFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'MPICHLIB_CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_FFLAGS=-g -O2 -fstack-protector-strong' 'MPICHLIB_FCFLAGS=-g -O2 -fstack-protector-strong' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'FC=gfortran' 'F77=gfortran' 'MPILIBNAME=mpich' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'LIBS=-lpthread '
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:       blcr
    Demux engines available:                 poll select

raffenet commented 5 years ago

Can you try running a non-MPI program and see what happens? That should determine if mpiexec is able to reach and launch binaries on all machines. For example:

mpiexec -np N -hosts master,slave hostname

andxeg commented 5 years ago

@raffenet Thank you for the fast reply. I ran your command. mpiexec writes the following to stdout:

master
master

and then hangs indefinitely.

I also ran this command:

# strace mpiexec.mpich -np 4 -hosts master,slave hostname

It printed:

fcntl(1, F_SETFL, O_RDWR|O_APPEND|O_NONBLOCK|O_LARGEFILE) = 0
read(8, "", 65536)                      = 0
close(8)                                = 0
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=0, events=POLLIN}], 5, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted poll ...>) = 1
read(13, "Permission denied, please try ag"..., 65536) = 38
write(2, "Permission denied, please try ag"..., 38Permission denied, please try again.
) = 38
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=0, events=POLLIN}], 5, -1) = 1 ([{fd=13, revents=POLLIN}])
read(13, "Permission denied, please try ag"..., 65536) = 38
write(2, "Permission denied, please try ag"..., 38Permission denied, please try again.
) = 38
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=0, events=POLLIN}], 5, -1) = 1 ([{fd=13, revents=POLLIN}])
read(13, "Permission denied (publickey,pas"..., 65536) = 41
write(2, "Permission denied (publickey,pas"..., 41Permission denied (publickey,password).
) = 41
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=0, events=POLLIN}], 5, -1) = 1 ([{fd=13, revents=POLLHUP}])
read(13, "", 65536)                     = 0
close(13)                               = 0
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, events=POLLIN}, {fd=0, events=POLLIN}], 4, -1) = 1 ([{fd=11, revents=POLLHUP}])
read(11, "", 65536)                     = 0
close(11)

I can ssh from one VM to the other without a password.
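Because mpiexec starts the remote proxy over ssh without a terminal, an interactive login that works is not quite the same test. A minimal sketch of a non-interactive check is shown below. It uses the standard OpenSSH options BatchMode and ConnectTimeout; the exact arguments Hydra passes to ssh are not shown in this thread, so this only approximates what mpiexec does, and the function names are hypothetical. It should be run as the same user that runs mpiexec (the strace above was run from a root prompt).

```python
import subprocess


def ssh_batch_command(host, remote_cmd="hostname"):
    """Build an ssh command line that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never ask for a password or passphrase
        "-o", "ConnectTimeout=5",   # give up quickly if the host is unreachable
        host,
        remote_cmd,
    ]


def can_launch(host):
    """Return (ok, message) for a non-interactive ssh launch on `host`."""
    result = subprocess.run(
        ssh_batch_command(host), capture_output=True, text=True
    )
    message = (result.stdout or result.stderr).strip()
    return result.returncode == 0, message


if __name__ == "__main__":
    import shutil

    if shutil.which("ssh") is None:
        print("ssh client not found")
    else:
        # Host names taken from the thread; adjust to the actual cluster.
        for host in ("master", "slave"):
            print(host, can_launch(host))
```

A "Permission denied" message here, with an interactive login still working, suggests that the key is available to a different user than the one launching the job.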

raffenet commented 5 years ago

Are both hostnames set to master? That might cause issues during MPI_Init.

andxeg commented 5 years ago

One VM has the hostname 'master', the other 'slave'. I also added their IPs and hostnames to /etc/hosts. In the case above I ran 4 MPI processes, so stdout has 2 lines with master; then execution froze because of some problem on slave.

raffenet commented 5 years ago

You may need to allow additional TCP communication in the firewall rules. From the sound of your experiments, mpiexec is unable to establish a connection to the agent it launched on slave via SSH. For reference, there is an environment variable you can set to specify the allowed port range. See https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Environment_Settings

andxeg commented 5 years ago

I can ssh from master to slave and from slave to master without password:

mpiuser@master$ ssh slave
mpiuser@slave$ ssh master 

In iptables, the INPUT, OUTPUT and FORWARD chains all have the default ACCEPT policy.

I set the environment variable MPIEXEC_PORT_RANGE to 10000:10010 and started mpiexec on master:

mpiuser@master$ mpiexec.mpich -np 2 -hosts master,slave hostname

Output:

master

and then execution froze, waiting for the slave node.

On the slave node, the output of mpiuser@slave$ ps -ef | grep hydra was:

mpiuser  15439 15438  0 07:04 ?        00:00:00 /usr/bin/hydra_pmi_proxy --control-port master:10000 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1

I checked whether the port is open:

mpiuser@slave1$ nc -zv master 10000
Connection to master 10000 port [tcp/webmin] succeeded!
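The same check can be scripted for the whole range rather than a single port. The sketch below parses a low:high value such as the one set in MPIEXEC_PORT_RANGE and attempts a TCP connection to each port. Treating the range as inclusive on both ends is an assumption, and the function names are hypothetical. Like nc -z, it only proves that a connection can be opened, not that large messages get through.

```python
import socket


def parse_port_range(spec):
    """Parse 'low:high' into a range of ports (assumed inclusive on both ends)."""
    low, high = (int(part) for part in spec.split(":"))
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {spec!r}")
    return range(low, high + 1)


def open_ports(host, ports, timeout=1.0):
    """Return the ports on `host` that accept a TCP connection."""
    reachable = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            # Refused, timed out, or unresolvable host: treat as not reachable.
            pass
    return reachable


if __name__ == "__main__":
    ports = parse_port_range("10000:10010")  # value used in this thread
    print(open_ports("master", ports))       # host name from this thread
```

Only ports that mpiexec is actually listening on at that moment will show up, so an empty list while no job is running is expected.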

raffenet commented 5 years ago

mpiuser@master$ mpiexec.mpich -np 2 -hosts master,slave hostname

Can you add -v to this command and paste the output?

andxeg commented 5 years ago

@raffenet Thank you very much! I found the cause of the problem. I had created a VXLAN tunnel between my two servers. Packets from one server to the other pass through a Cisco router, and it dropped them; I hadn't taken that into account. After I set the MTU to 1450 instead of 1500 on the interfaces in the virtual machines, the mpirun command finished perfectly.
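The value 1450 matches the usual VXLAN arithmetic: each packet from a VM is wrapped in an outer IP, UDP and VXLAN header together with its own Ethernet header, and the result has to fit into the MTU of the physical network. A small sketch of that calculation follows, assuming a plain underlay with no IP options and no VLAN tag on the inner frame; the function name is hypothetical.

```python
# Header sizes in bytes.
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_IPV6 = 40      # outer IPv6 header without extension headers
OUTER_UDP = 8        # UDP header carrying the VXLAN payload
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # Ethernet header of the encapsulated frame (no VLAN tag)


def vxlan_inner_mtu(underlay_mtu, outer_ipv6=False):
    """Largest MTU usable inside the tunnel for a given underlay MTU."""
    outer_ip = OUTER_IPV6 if outer_ipv6 else OUTER_IPV4
    overhead = outer_ip + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return underlay_mtu - overhead


if __name__ == "__main__":
    # Underlay MTU of 1500 as in this thread, IPv4 between the servers assumed.
    print(vxlan_inner_mtu(1500))
```

With a 1500-byte underlay this gives 50 bytes of overhead, so a VM interface left at 1500 produces encapsulated packets that are too large for the path. Small exchanges such as ping, interactive ssh or a port probe can still fit, which would be consistent with those checks succeeding earlier in the thread.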