hpcugent / vsc-mympirun

mympirun is a tool to facilitate running MPI programs on an HPC cluster
GNU General Public License v2.0

overridepin does not seem to work #190

Open hajgato opened 2 years ago

hajgato commented 2 years ago

User tried:

export OMP_NUM_THREADS=1
mympirun --hybrid=32 --overridepin=spread --pinmpi cp2k.popt -i AIMD.inp -o AIMD.out

Error:

022-03-24 09:51:49,091 ERROR      RunAsyncMPI     MainThread  _post_exitcode: problem occured with cmd ['mpirun', '-machinefile', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/nodes', '--mca', 'pml', 'ucx', '--mca', 'btl', '^uct', '--mca', 'orte_keep_fqdn_hostnames', '1', '--mca', 'MKL_NUM_THREADS', '1', '--mca', 'MODULEPATH', '/readonly/dodrio/apps/RHEL8/zen2-ib/modules/all:/etc/modulefiles/vsc', '--mca', 'LOADEDMODULES', 'cluster/dodrio/cpu_rome:GCCcore/10.3.0:zlib/1.2.11-GCCcore-10.3.0:binutils/2.36.1-GCCcore-10.3.0:GCC/10.3.0:numactl/2.0.14-GCCcore-10.3.0:XZ/5.2.5-GCCcore-10.3.0:libxml2/2.9.10-GCCcore-10.3.0:libpciaccess/0.16-GCCcore-10.3.0:hwloc/2.4.1-GCCcore-10.3.0:OpenSSL/1.1:libevent/2.1.12-GCCcore-10.3.0:UCX/1.10.0-GCCcore-10.3.0:libfabric/1.12.1-GCCcore-10.3.0:PMIx/3.2.3-GCCcore-10.3.0:OpenMPI/4.1.1-GCC-10.3.0:OpenBLAS/0.3.15-GCC-10.3.0:FlexiBLAS/3.0.4-GCC-10.3.0:gompi/2021a:FFTW/3.3.9-gompi-2021a:ScaLAPACK/2.1.0-gompi-2021a-fb:foss/2021a:Libint/2.6.0-GCC-10.3.0-lmax-6-cp2k:libxc/5.1.5-GCC-10.3.0:libxsmm/1.16.2-GCC-10.3.0:GSL/2.7-GCC-10.3.0:bzip2/1.0.8-GCCcore-10.3.0:ncurses/6.2-GCCcore-10.3.0:libreadline/8.1-GCCcore-10.3.0:Tcl/8.6.11-GCCcore-10.3.0:SQLite/3.35.4-GCCcore-10.3.0:GMP/6.2.1-GCCcore-10.3.0:libffi/3.3-GCCcore-10.3.0:Python/3.9.5-GCCcore-10.3.0:pybind11/2.6.2-GCCcore-10.3.0:SciPy-bundle/2021.05-foss-2021a:ICU/69.1-GCCcore-10.3.0:Boost/1.76.0-GCC-10.3.0:PLUMED/2.7.2-foss-2021a:CP2K/8.2-foss-2021a:vsc-mympirun/5.2.10', '--mca', 'MODULESHOME', '/usr/share/lmod/lmod', '-np', '32', '-x', 'LD_LIBRARY_PATH', '-x', 'PATH', '-x', 'PYTHONPATH', '-x', 'OMP_NUM_THREADS', '-x', 'OMP_PROC_BIND', '--map-by', 'ppr:32:node:PE=4:SPAN:NOOVERSUBSCRIBE', '-rf', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/rankfile', 'cp2k.popt', '-i', 'AIMD.inp', '-o', 'AIMD.out']: (shellcmd ['mpirun', '-machinefile', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/nodes', '--mca', 'pml', 'ucx', 
'--mca', 'btl', '^uct', '--mca', 'orte_keep_fqdn_hostnames', '1', '--mca', 'MKL_NUM_THREADS', '1', '--mca', 'MODULEPATH', '/readonly/dodrio/apps/RHEL8/zen2-ib/modules/all:/etc/modulefiles/vsc', '--mca', 'LOADEDMODULES', 'cluster/dodrio/cpu_rome:GCCcore/10.3.0:zlib/1.2.11-GCCcore-10.3.0:binutils/2.36.1-GCCcore-10.3.0:GCC/10.3.0:numactl/2.0.14-GCCcore-10.3.0:XZ/5.2.5-GCCcore-10.3.0:libxml2/2.9.10-GCCcore-10.3.0:libpciaccess/0.16-GCCcore-10.3.0:hwloc/2.4.1-GCCcore-10.3.0:OpenSSL/1.1:libevent/2.1.12-GCCcore-10.3.0:UCX/1.10.0-GCCcore-10.3.0:libfabric/1.12.1-GCCcore-10.3.0:PMIx/3.2.3-GCCcore-10.3.0:OpenMPI/4.1.1-GCC-10.3.0:OpenBLAS/0.3.15-GCC-10.3.0:FlexiBLAS/3.0.4-GCC-10.3.0:gompi/2021a:FFTW/3.3.9-gompi-2021a:ScaLAPACK/2.1.0-gompi-2021a-fb:foss/2021a:Libint/2.6.0-GCC-10.3.0-lmax-6-cp2k:libxc/5.1.5-GCC-10.3.0:libxsmm/1.16.2-GCC-10.3.0:GSL/2.7-GCC-10.3.0:bzip2/1.0.8-GCCcore-10.3.0:ncurses/6.2-GCCcore-10.3.0:libreadline/8.1-GCCcore-10.3.0:Tcl/8.6.11-GCCcore-10.3.0:SQLite/3.35.4-GCCcore-10.3.0:GMP/6.2.1-GCCcore-10.3.0:libffi/3.3-GCCcore-10.3.0:Python/3.9.5-GCCcore-10.3.0:pybind11/2.6.2-GCCcore-10.3.0:SciPy-bundle/2021.05-foss-2021a:ICU/69.1-GCCcore-10.3.0:Boost/1.76.0-GCC-10.3.0:PLUMED/2.7.2-foss-2021a:CP2K/8.2-foss-2021a:vsc-mympirun/5.2.10', '--mca', 'MODULESHOME', '/usr/share/lmod/lmod', '-np', '32', '-x', 'LD_LIBRARY_PATH', '-x', 'PATH', '-x', 'PYTHONPATH', '-x', 'OMP_NUM_THREADS', '-x', 'OMP_PROC_BIND', '--map-by', 'ppr:32:node:PE=4:SPAN:NOOVERSUBSCRIBE', '-rf', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/rankfile', 'cp2k.popt', '-i', 'AIMD.inp', '-o', 'AIMD.out']) output --------------------------------------------------------------------------
Conflicting directives for mapping policy are causing the policy
to be redefined:

  New policy:   RANK_FILE
  Prior policy:  UNKNOWN

Please check that only one policy is defined.
--------------------------------------------------------------------------

2022-03-24 09:51:49,095 ERROR      root            MainThread  main: exitcode 1 > 0; cmd ['mpirun', '-machinefile', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/nodes', '--mca', 'pml', 'ucx', '--mca', 'btl', '^uct', '--mca', 'orte_keep_fqdn_hostnames', '1', '--mca', 'MKL_NUM_THREADS', '1', '--mca', 'MODULEPATH', '/readonly/dodrio/apps/RHEL8/zen2-ib/modules/all:/etc/modulefiles/vsc', '--mca', 'LOADEDMODULES', 'cluster/dodrio/cpu_rome:GCCcore/10.3.0:zlib/1.2.11-GCCcore-10.3.0:binutils/2.36.1-GCCcore-10.3.0:GCC/10.3.0:numactl/2.0.14-GCCcore-10.3.0:XZ/5.2.5-GCCcore-10.3.0:libxml2/2.9.10-GCCcore-10.3.0:libpciaccess/0.16-GCCcore-10.3.0:hwloc/2.4.1-GCCcore-10.3.0:OpenSSL/1.1:libevent/2.1.12-GCCcore-10.3.0:UCX/1.10.0-GCCcore-10.3.0:libfabric/1.12.1-GCCcore-10.3.0:PMIx/3.2.3-GCCcore-10.3.0:OpenMPI/4.1.1-GCC-10.3.0:OpenBLAS/0.3.15-GCC-10.3.0:FlexiBLAS/3.0.4-GCC-10.3.0:gompi/2021a:FFTW/3.3.9-gompi-2021a:ScaLAPACK/2.1.0-gompi-2021a-fb:foss/2021a:Libint/2.6.0-GCC-10.3.0-lmax-6-cp2k:libxc/5.1.5-GCC-10.3.0:libxsmm/1.16.2-GCC-10.3.0:GSL/2.7-GCC-10.3.0:bzip2/1.0.8-GCCcore-10.3.0:ncurses/6.2-GCCcore-10.3.0:libreadline/8.1-GCCcore-10.3.0:Tcl/8.6.11-GCCcore-10.3.0:SQLite/3.35.4-GCCcore-10.3.0:GMP/6.2.1-GCCcore-10.3.0:libffi/3.3-GCCcore-10.3.0:Python/3.9.5-GCCcore-10.3.0:pybind11/2.6.2-GCCcore-10.3.0:SciPy-bundle/2021.05-foss-2021a:ICU/69.1-GCCcore-10.3.0:Boost/1.76.0-GCC-10.3.0:PLUMED/2.7.2-foss-2021a:CP2K/8.2-foss-2021a:vsc-mympirun/5.2.10', '--mca', 'MODULESHOME', '/usr/share/lmod/lmod', '-np', '32', '-x', 'LD_LIBRARY_PATH', '-x', 'PATH', '-x', 'PYTHONPATH', '-x', 'OMP_NUM_THREADS', '-x', 'OMP_PROC_BIND', '--map-by', 'ppr:32:node:PE=4:SPAN:NOOVERSUBSCRIBE', '-rf', '/dodrio/scratch/users/vsc10507/.mympirun_0mafrq/1097369_20220324_095146/rankfile', 'cp2k.popt', '-i', 'AIMD.inp', '-o', 'AIMD.out']
hajgato commented 2 years ago

I guess the problem is that you can use either a rank file (-rf) or --map-by, but not both; the generated mpirun command passes both.
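
To illustrate the suspected conflict, here is a minimal sketch that scans an mpirun argument list for both a rank file option (-rf / --rankfile) and --map-by, and drops --map-by when a rank file is present. The helper names (mapping_directives, drop_map_by) are hypothetical and not part of mympirun, the command is shortened from the log above with a placeholder rank file path, and dropping --map-by is only one possible resolution, not necessarily what mympirun should do.

```python
# Hypothetical helpers for illustration; not taken from the mympirun code base.
RANKFILE_FLAGS = {"-rf", "--rankfile"}
MAPBY_FLAGS = {"--map-by"}


def mapping_directives(cmd):
    """Return the mapping-related flags found in an mpirun argument list."""
    return [arg for arg in cmd if arg in RANKFILE_FLAGS | MAPBY_FLAGS]


def drop_map_by(cmd):
    """Return a copy of cmd without --map-by and its value,
    but only if a rank file option is also present."""
    if not any(arg in RANKFILE_FLAGS for arg in cmd):
        return list(cmd)
    cleaned = []
    skip_next = False
    for arg in cmd:
        if skip_next:
            # This is the value that belonged to --map-by; drop it too.
            skip_next = False
            continue
        if arg in MAPBY_FLAGS:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


# Shortened version of the failing command; "rankfile" is a placeholder path.
cmd = [
    "mpirun", "-np", "32",
    "--map-by", "ppr:32:node:PE=4:SPAN:NOOVERSUBSCRIBE",
    "-rf", "rankfile",
    "cp2k.popt", "-i", "AIMD.inp", "-o", "AIMD.out",
]

print(mapping_directives(cmd))
print(drop_map_by(cmd))
```

If the guess is right, the first print shows two mapping directives in the same command, which matches the "Conflicting directives for mapping policy" message from mpirun.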