multiscale-cosim / EBRAINS-cosim


2.3 Working through OpenMPI Socket-Style Communication on JSC's JUSUF #82

Closed ringleschavez closed 2 years ago

ringleschavez commented 3 years ago
| Aspect | Detail |
| --- | --- |
| Summary | Troubleshooting of TVB-NEST use-case on CSCS infrastructure |
| Task Area | Deployment Troubleshooting |
| Assignee | |
| Information | https://github.com/multiscale-cosim/TVB-NEST |
| Prerequisites | |
| Dependencies | |

Summary

Get the TVB-NEST use case working on the CSCS infrastructure.

Constraints:

Tasks

Requirements

Acceptance criteria

The client/server approach (MPI socket-style communication) works on Piz Daint across multiple compute nodes.

ringleschavez commented 2 years ago

A check of the MPI frameworks available on JUSUF found the following:

OpenMPI

@jsfl01 ~]$ module spider OpenMPI

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  OpenMPI:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Description:
      The Open MPI Project is an open source MPI-3 implementation.

     Versions:
        OpenMPI/4.1.0rc1
        OpenMPI/4.1.1

ParaStationMPI

@jsfl01 ~]$ module spider ParaStationMPI

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  ParaStationMPI:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Description:
      ParaStation MPI is an open source high-performance MPI 3.0 implementation, based on MPICH v3. It provides extra low level communication libraries and integration with various batch systems for tighter
      process control. 

     Versions:
        ParaStationMPI/5.4.7-1-mt
        ParaStationMPI/5.4.7-1
        ParaStationMPI/5.4.8-1
        ParaStationMPI/5.4.9-1-mt
        ParaStationMPI/5.4.9-1
        ParaStationMPI/5.4.10-1-mt
        ParaStationMPI/5.4.10-1
ringleschavez commented 2 years ago

As the previous comment shows, OpenMPI 4.1.1 is already installed on JUSUF, so what remains is to test the socket-style communication approach with that framework.

Additionally, to confirm which implementations support this communication approach, the client/server source code will also be compiled with the other MPI frameworks.

ringleschavez commented 2 years ago

To use OpenMPI 4.1.1 on JUSUF, other modules must be loaded beforehand, as `module spider OpenMPI/4.1.1` reports:


----------------------------------------------------------------------------------------------------
  OpenMPI: OpenMPI/4.1.1
-----------------------------------------------------------------------------------------------------
    Description:
      The Open MPI Project is an open source MPI-3 implementation.

    Properties:
      built for GPU

    You will need to load all module(s) on any one of the lines below before the "OpenMPI/4.1.1" module is available to load.

      Stages/2020  GCC/10.3.0
      Stages/2020  Intel/2021.2.0-GCC-10.3.0
      Stages/2020  NVHPC/21.5-GCC-10.3.0
      Stages/2020  NVHPC/21.9-GCC-10.3.0
      Stages/2022  GCC/11.2.0
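
For reference, the listing above translates into a load sequence along these lines. This is only a sketch: it assumes the plain `Stages/2020  GCC/10.3.0` line is the toolchain wanted (any other listed line could be substituted), and the variable names are made up for illustration.

```shell
# Prerequisite line taken from the `module spider OpenMPI/4.1.1` listing above.
# Assumption: the plain GCC toolchain is the one wanted here.
OPENMPI_PREREQS="Stages/2020 GCC/10.3.0"
OPENMPI_MODULE="OpenMPI/4.1.1"

if command -v module >/dev/null 2>&1; then
    # Word splitting is intentional: each prerequisite is a separate module name.
    module load $OPENMPI_PREREQS
    module load "$OPENMPI_MODULE"
    # Show what ended up loaded.
    module list
else
    echo "module command not found; run this on a JUSUF node" >&2
fi
```
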
ringleschavez commented 2 years ago

Important information about the OpenMPI rendezvous server `ompi-server`: issue #6916 by rhc54
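
For context, this is roughly how `ompi-server` is usually wired in when the server and client are started by separate `mpirun` invocations. It is a sketch under assumptions, not a tested JUSUF recipe: `./server` and `./client` are hypothetical binaries, the function names are made up, and the URI file is assumed to live on a filesystem visible from every node involved.

```shell
# Assumption: this path is on a shared filesystem reachable from all nodes.
URI_FILE="${URI_FILE:-./ompi-server.uri}"

start_rendezvous() {
    # Start the rendezvous server in the background and let it write
    # its contact URI to a file that the mpirun calls can read.
    ompi-server --no-daemonize --report-uri "$URI_FILE" &
}

launch_server() {
    # Hypothetical ./server: the side that opens a port and accepts connections.
    mpirun --ompi-server "file:${URI_FILE}" -np 1 ./server
}

launch_client() {
    # Hypothetical ./client: the side that connects to the server's port.
    mpirun --ompi-server "file:${URI_FILE}" -np 1 ./client
}
```

The intended order would be `start_rendezvous` once, then `launch_server` and `launch_client` from separate shells or job steps. Whether this holds up across multiple compute nodes is exactly what this issue is investigating.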

ringleschavez commented 2 years ago

Another issue, #7094 (v.4.0.2): MPI_Comm_connect/MPI_Comm_accept fails with “is on host: unknown!”

ringleschavez commented 2 years ago

Sandra confirmed that the multi-node setup does not work properly; more research and support may be needed.