optimas-org / optimas

Optimization at scale, powered by libEnsemble
https://optimas.readthedocs.io

Perlmutter: run optimas and Wake-T with MPI #203

Closed n01r closed 2 months ago

n01r commented 4 months ago

Hi,

I have been trying to run optimas and Wake-T on Perlmutter (NERSC) with MPI, but I could not make it work so far. Does anyone have a working setup for Perlmutter? I wonder if specific modules need to be loaded and if certain environment variables have to be set before installing optimas and at execution time. The way specified in the docs is sufficient for single-node runs (https://optimas.readthedocs.io/en/latest/user_guide/installation_perlmutter.html), but once I add libe_comms='mpi' to the Exploration object and prepend, e.g., srun -N 1 -n 8 on an interactive node, I get errors about missing MPI shared object files.

  File "/global/cfs/cdirs/m3239/mgarten/sw/perlmutter/conda_envs/wake-t_env/lib/python3.10/site-packages/optimas/explorations/base.py", line 506, in _set_default_libe_specs
    from mpi4py import MPI
ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

When I load mpich, which unloads the pre-loaded cray-mpich module, it instead asks for GCC 12 shared object files. After loading a GCC module, it asks for the CUDA runtime.

  File "/global/cfs/cdirs/m3239/mgarten/sw/perlmutter/conda_envs/wake-t_env/lib/python3.10/site-packages/optimas/explorations/base.py", line 506, in _set_default_libe_specs
    from mpi4py import MPI
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
shuds13 commented 4 months ago

@n01r

I tried running the main_simulation_script.py you gave me a few months ago with the latest optimas on main. I had to make a couple of changes in addition to setting libe_comms='mpi' in the Exploration options.

In explorations/base.py I put error handling around makedirs (to prevent a race condition when several processes create the directory at once):

            try:
                os.makedirs(main_dir)
            except FileExistsError:
                pass

I also had to update to pydantic 2.

pip install -U pydantic

Then on 2 nodes, this is running:

srun -N 2 -n 8 python main_simulation_script.py

This successfully ran exp.run(), then errored in h = exp.history,

as again all workers were trying to call what should only be done on the manager. So really, when using MPI comms, we should extract an is_manager value in explorations/base.py and guard these operations with it; for now, though, you could also put a try/except around h = exp.history.

edit: I should have run srun -N 2 -n 9 ... so that one extra process goes to the manager!


I got Python from module load conda. These are my modules:

shuds@nid004089:try_waket_mpi$ module list

Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
  5) PrgEnv-gnu/8.5.0
  6) cray-dsmml/0.2.2
  7) cray-libsci/23.12.5
  8) cray-mpich/8.1.28
  9) craype/2.7.30
 10) gcc-native/12.3
 11) perftools-base/23.12.0
 12) cpe/23.12
 13) cudatoolkit/12.2
 14) craype-accel-nvidia80
 15) gpu/1.0
 16) conda/Miniconda3-py311_23.11.0-2

I got nodes as follows:

shuds@login14:try_waket_mpi$ salloc -N 2 -t 30 -C cpu -q interactive -A m4272

n01r commented 2 months ago

Hi @shuds13, I finally got back to trying this out.

Unfortunately, I can only make it run interactively. So far, I have not managed to get this running as an unattended batch job.

I also needed to add the following lines:

# somehow necessary to get MPI to work
#   will otherwise complain due to missing libmpi.so.12
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib-abi-mpich:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.28/ofi/nvidia/23.3/lib-abi-mpich:$LD_LIBRARY_PATH

But in that case I am still getting


MPICH ERROR [Rank 0] [job id 26693115.0] [Tue Jun 11 14:18:18 2024] [nid008212] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

srun: error: nid008681: tasks 97-117,119-128: Segmentation fault
srun: Terminating StepId=26693115.0
slurmstepd: error: *** STEP 26693115.0 ON nid008212 CANCELLED AT 2024-06-11T21:18:18 ***
srun: error: nid008212: tasks 1-13,15-21,23-27,29-32: Segmentation fault
...

Edit: I am not sure anymore if I managed to run with export MPICH_GPU_SUPPORT_ENABLED=1 interactively, yesterday. Today I only got a simple test program to work with export MPICH_GPU_SUPPORT_ENABLED=0.

GPU support is not critical as long as I am only running Wake-T, but the Ax generator, as well as other codes that I might run, will have to run with GPU support.

n01r commented 2 months ago

A simple test program reproduces the error.

test_mpi.py

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()}")
```

This was submitted via

debug_MPI_test.sbatch

```bash
#!/bin/bash -l
#SBATCH -t 00:30:00
# for debug
#S BATCH -t 12:00:00
#SBATCH -N 1
#SBATCH -J test_MPI
# note: must end on _g
#SBATCH -A m4272_g
#SBATCH -q debug
#S BATCH -q regular
# A100 40GB (most nodes)
#SBATCH -C gpu
# A100 80GB (256 nodes)
#S BATCH -C gpu&hbm80g
#SBATCH --exclusive
# ideally single:1, but NERSC cgroups issue
#SBATCH --gpu-bind=none
#SBATCH --gpus-per-node=4
#SBATCH -o test_mpi.o%j
#SBATCH -e test_mpi.e%j

#source ~/wake-t.profile
#module list

# somehow necessary to get MPI to work
#   will otherwise complain due to missing libmpi.so.12
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib-abi-mpich:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.28/ofi/nvidia/23.3/lib-abi-mpich:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.28/gtl/lib:$LD_LIBRARY_PATH

export MPICH_GPU_SUPPORT_ENABLED=1

source ~/wake-t.profile

printenv > env_variables.txt
module list > loaded_modules.txt

# need to add 1 rank for the manager
# so here we have 128 workers
srun -N 1 -n 4 python3 test_mpi.py > test_output_${SLURM_JOBID}.txt
```

And the errors received, preceded by the list of loaded modules, are here:

Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
  5) PrgEnv-gnu/8.5.0
  6) cray-dsmml/0.2.2
  7) cray-libsci/23.12.5
  8) cray-mpich/8.1.28
  9) craype/2.7.30
 10) gcc-native/12.3
 11) perftools-base/23.12.0
 12) cpe/23.12
 13) cudatoolkit/12.2
 14) craype-accel-nvidia80
 15) gpu/1.0
 16) conda/Miniconda3-py311_23.11.0-2

Traceback (most recent call last):
  File "/pscratch/sd/m/mgarten/electron_multistaging/wake-t/075_like_70_500k_particles/075_070_A_like_48_scan_Carlos_ramp_w_res_1x_8ppc_1_GeV/test_mpi.py", line 1, in <module>
Traceback (most recent call last):
  File "/pscratch/sd/m/mgarten/electron_multistaging/wake-t/075_like_70_500k_particles/075_070_A_like_48_scan_Carlos_ramp_w_res_1x_8ppc_1_GeV/test_mpi.py", line 1, in <module>
Traceback (most recent call last):
  File "/pscratch/sd/m/mgarten/electron_multistaging/wake-t/075_like_70_500k_particles/075_070_A_like_48_scan_Carlos_ramp_w_res_1x_8ppc_1_GeV/test_mpi.py", line 1, in <module>
    from mpi4py import MPI
ImportError: libnvf.so: cannot open shared object file: No such file or directory
    from mpi4py import MPI
ImportError: libnvf.so: cannot open shared object file: No such file or directory
    from mpi4py import MPI
ImportError: libnvf.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/pscratch/sd/m/mgarten/electron_multistaging/wake-t/075_like_70_500k_particles/075_070_A_like_48_scan_Carlos_ramp_w_res_1x_8ppc_1_GeV/test_mpi.py", line 1, in <module>
    from mpi4py import MPI
ImportError: libnvf.so: cannot open shared object file: No such file or directory
srun: error: nid001200: tasks 0-3: Exited with exit code 1
srun: Terminating StepId=26695216.0
n01r commented 2 months ago

Okay, I stole some installation instructions from our ImpactX installation on Perlmutter.

This did it for the small test script:

I loaded cray-python since our ImpactX profile file says

# optional: for Python bindings or libEnsemble
module load cray-python/3.11.5

Then I reinstalled mpi4py:

python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py

and I added

# necessary to use CUDA-Aware MPI and run a job
export CRAY_ACCEL_TARGET=nvidia80

Now let's see if that will work in an unsupervised way and, if it works, then also for optimas.


EDIT: Nope :disappointed: I tried it fresh, logged out and back in, got a node, same error messages.

n01r commented 2 months ago

Okay, I think I fixed it now. I redid my whole optimas+Wake-T installation:

module load cray-python/3.11.5
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv
python3 -m pip cache purge
python3 -m venv /global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/
source /global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/bin/activate
python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install "optimas[all] @ git+https://github.com/optimas-org/optimas.git"
python3 -m pip install --upgrade wake-t

EDIT Indeed a job is running unsupervised with optimas and Wake-T. Cool :+1:

shuds13 commented 2 months ago

Good to hear it's running. It works for me using module load python instead of module load conda. I think when I tried before, I activated an environment that switched to the correct Python.

n01r commented 2 months ago

Right, I played around with that in my previous environment, too. But I could not get it to work reliably.