multiscale-cosim / EBRAINS-cosim


Working through OpenMPI Socket-Style Communication on CSCS's Piz Daint #76

Closed ringleschavez closed 2 years ago

ringleschavez commented 3 years ago
| Aspect | Detail |
| --- | --- |
| Summary | Troubleshooting of the TVB-NEST use-case on the CSCS infrastructure |
| Task Area | Deployment Troubleshooting |
| Assignee | |
| Information | https://github.com/multiscale-cosim/TVB-NEST |
| Prerequisites | |
| Dependencies | |

Summary

Getting the TVB-NEST use-case to work in the CSCS infrastructure environment.

Constraints:

Tasks

Requirements

Acceptance criteria

Get the client/server approach (MPI socket-style communication) working on Piz Daint across multiple compute nodes.

ringleschavez commented 3 years ago

Compiling OpenMPI 4.1.1 on CSCS's Piz Daint system

```
cd $SCRATCH
tar xzf openmpi-4.1.1.tar.gz
cd openmpi-4.1.1/
mkdir -p ${SCRATCH}/usr/local
./configure --prefix=${SCRATCH}/usr/local
make -j 32
make install
```
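
After `make install`, a quick sanity check (a sketch; paths assume the `--prefix` used above) confirms that the freshly built tools are picked up instead of the system OpenMPI:

```
export PATH=${SCRATCH}/usr/local/bin:${PATH}
export LD_LIBRARY_PATH=${SCRATCH}/usr/local/lib:${LD_LIBRARY_PATH}
# Both commands should resolve into ${SCRATCH}/usr/local and report version 4.1.1
which mpiexec ompi-server
ompi_info | grep "Open MPI:"
```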
ringleschavez commented 3 years ago

Source code for testing

haskell-mpi/test/examples/clientserver/
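
Assuming the client/server example is available as plain MPI C sources named `server.c` and `client.c` (file names assumed here; adapt to the actual example sources), the binaries used below could be built with the wrapper compiler from the OpenMPI 4.1.1 installation above:

```
cd ${SCRATCH}/testing_code/
export PATH=${SCRATCH}/usr/local/bin:${PATH}
# mpicc comes from the OpenMPI 4.1.1 build installed in ${SCRATCH}/usr/local
mpicc -o server server.c
mpicc -o client client.c
```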

ringleschavez commented 3 years ago

Test on CSCS's Piz Daint login node

Terminal 1 (OpenMPI orte-server)

```
module load daint-mc
cd ${SCRATCH}/testing_code/
export PATH=${SCRATCH}/usr/local/bin:${PATH}
export LD_LIBRARY_PATH=${SCRATCH}/usr/local/lib:${LD_LIBRARY_PATH}
ompi-server --debug --no-daemonize --report-uri RTE.data.server
```
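
The `--report-uri RTE.data.server` option makes ompi-server write its rendezvous URI into that file. A quick check (assuming all three terminals share the same `${SCRATCH}/testing_code/` directory) is to print it before starting the server and client:

```
# The file should contain the URI published by ompi-server
cat ${SCRATCH}/testing_code/RTE.data.server
```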

Terminal 2 (Server)

```
module load daint-mc
cd ${SCRATCH}/testing_code/
export PATH=${SCRATCH}/usr/local/bin:${PATH}
export LD_LIBRARY_PATH=${SCRATCH}/usr/local/lib:${LD_LIBRARY_PATH}
mpiexec --ompi-server file:RTE.data.server -np 1 ./server
```

Terminal 3 (Client)

```
module load daint-mc
cd ${SCRATCH}/testing_code/
export PATH=${SCRATCH}/usr/local/bin:${PATH}
export LD_LIBRARY_PATH=${SCRATCH}/usr/local/lib:${LD_LIBRARY_PATH}
mpiexec --ompi-server file:RTE.data.server -np 1 ./client <ThePortCreatedByTheServer>
```
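
The `<ThePortCreatedByTheServer>` argument is the port string the server prints after opening its port (typically the value returned by MPI_Open_port). A hypothetical convenience sketch that captures it automatically, assuming the server writes only the port name to stdout:

```
# Start the server in the background and record the port string it prints
mpiexec --ompi-server file:RTE.data.server -np 1 ./server | tee server_port.txt &
sleep 5   # give the server time to open and print its port
mpiexec --ompi-server file:RTE.data.server -np 1 ./client "$(head -n 1 server_port.txt)"
```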
ringleschavez commented 2 years ago

Socket-style communication works on a single node, where the server and client binaries are launched by means of SLURM + mpirun. On multiple nodes, however, mpirun launches several instances of each binary, which is not the expected behaviour at run time.

After testing several ways of compiling OpenMPI 4.1.1 on Piz Daint, the conclusion is that the SLURM tools running on Piz Daint must provide the proper PMI or PMIx interface in order to use the socket-style communication approach implemented in OpenMPI 4.1.1.
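
If SLURM's PMI-2 (or PMIx) support is indeed the missing piece, one possible direction, an untested sketch where `--mpi=pmi2` availability and the job layout are assumptions rather than a verified Piz Daint configuration, would be to launch each binary as its own SLURM job step instead of going through mpirun:

```
# Inside a SLURM allocation spanning two nodes, start one server and one client,
# each as a single-task job step, relying on SLURM's PMI interface.
srun --mpi=pmi2 -N 1 -n 1 ./server &
srun --mpi=pmi2 -N 1 -n 1 ./client "<ThePortCreatedByTheServer>" &
wait
```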

ringleschavez commented 2 years ago

OpenMPI on CSCS Piz Daint.pdf is the first draft showing the outcomes of the tested compilation process.