radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

MPIEXEC launch method fails to detect supported arguments #3061

Closed AymenFJA closed 11 months ago

AymenFJA commented 11 months ago

This behavior was noticed in one of the failed runs in our tests here: https://github.com/radical-cybertools/radical.pilot/actions/runs/6473708269/job/17577018703?pr=3044.

mpiexec rejected the generated command and printed its usage message:

Usage: ./mpiexec [global opts] [local opts for exec1] [exec1] [exec1 args] : [local opts for exec2] [exec2] [exec2 args] : ...

Global options (passed to all executables):

  Global environment options:
    -genv {name} {value}             environment variable name and value
    -genvlist {env1,env2,...}        environment variable list to pass
    -genvnone                        do not pass any environment variables
    -genvall                         pass all environment variables not managed
    [remaining help output omitted]

RP failed to detect which arguments are supported by this specific flavor of mpiexec. The installed version is:

HYDRA build details:
    Version:                                 3.3.2
    Release Date:                            Tue Nov 12 21:23:16 CST 2019
    CC:                              gcc   -Wl,-Bsymbolic-functions -Wl,-z,relro
    CXX:                             g++   -Wl,-Bsymbolic-functions -Wl,-z,relro
    F77:                             f77  -Wl,-Bsymbolic-functions -Wl,-z,relro
    F90:                             f95  -Wl,-Bsymbolic-functions -Wl,-z,relro
    Configure options:                       '--disable-option-checking' '--prefix=/usr' '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-libfabric' '--enable-shared' '--enable-fortran=all' '--disable-rpath' '--disable-wrapper-rpath' '--sysconfdir=/etc/mpich' '--libdir=/usr/lib/x86_64-linux-gnu' '--includedir=/usr/include/x86_64-linux-gnu/mpich' '--docdir=/usr/share/doc/mpich' 'CPPFLAGS= -Wdate-time -D_FORTIFY_SOURCE=2 -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpl/include -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpl/include -I/build/mpich-VeuB8Z/mpich-3.3.2/src/openpa/src -I/build/mpich-VeuB8Z/mpich-3.3.2/src/openpa/src -D_REENTRANT -I/build/mpich-VeuB8Z/mpich-3.3.2/src/mpi/romio/include' 'CFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'CXXFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'FFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -O2' 'FCFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -cpp -O2' 'BASH_SHELL=/bin/bash' 'build_alias=x86_64-linux-gnu' 'MPICHLIB_CFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'MPICHLIB_CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_FFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. -fstack-protector-strong' 'MPICHLIB_FCFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-VeuB8Z/mpich-3.3.2=. 
-fstack-protector-strong -cpp' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'FC=f95' 'F77=f77' 'MPILIBNAME=mpich' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'LIBS=' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:
    Demux engines available:                 poll select

The version above does not support the -rf argument, as this excerpt from its help output shows:

Other global options:
    -f {name}                        file containing the host names
    -hosts {host list}               comma separated host list
    -wdir {dirname}                  working directory to use
    -configfile {name}               config file containing MPMD launch options

However, the mpiexec command that RP generated includes -rf:

/usr/bin/mpiexec -np 10 -H localhost -rf /home/runner/radical.pilot.sandbox/rp.session.fv-az482-586.runner.019639.0000/pilot.0000/raptor.0000.0000//raptor.0000.0000.rf $RP_TASK_SANDBOX/raptor.0000.0000.exec.sh

The line responsible for this error: https://github.com/radical-cybertools/radical.pilot/blob/693319a19757ef299ce804c588405d4bbc27eef1/src/radical/pilot/agent/launch_method/mpiexec.py#L79
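
To illustrate the kind of check that would catch this, here is a minimal sketch that probes the help text for a flag before using it. This is not RP's actual code and not necessarily how the fix works: the helper names supports_option and mpiexec_help are hypothetical, and the set of characters assumed to follow a flag in the help text (whitespace, comma, "=", "|") is a guess that may need adjusting for other mpiexec flavors.

```python
import re
import subprocess


def supports_option(help_text: str, option: str) -> bool:
    """Return True if `option` starts a line of the help text.

    Hypothetical helper: it matches the flag as a whole token, so that
    '-f' in the help text does not count as support for '-rf', and
    '-hosts' does not count as support for '-host'.
    """
    pattern = re.compile(
        r'^\s*' + re.escape(option) + r'(?=[\s,=|]|$)',
        re.MULTILINE,
    )
    return bool(pattern.search(help_text))


def mpiexec_help(mpiexec: str = 'mpiexec') -> str:
    """Collect the help output of the given mpiexec executable.

    stdout and stderr are concatenated because the stream used for the
    usage message may differ between implementations (assumption).
    """
    proc = subprocess.run([mpiexec, '--help'],
                          capture_output=True, text=True, check=False)
    return proc.stdout + proc.stderr


# Excerpt of the Hydra 3.3.2 help output quoted in this issue.
HYDRA_HELP = '''
Other global options:
    -f {name}                        file containing the host names
    -hosts {host list}               comma separated host list
    -wdir {dirname}                  working directory to use
    -configfile {name}               config file containing MPMD launch options
'''

if __name__ == '__main__':
    for flag in ('-rf', '-f', '-hosts'):
        print(flag, supports_option(HYDRA_HELP, flag))
```

With a check like this, the launch method could call supports_option(mpiexec_help(), '-rf') once at setup time and only add -rf to the command when it returns True.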

AymenFJA commented 11 months ago

This is fixed in https://github.com/radical-cybertools/radical.pilot/pull/3064