nest / nest-gpu

NEST GPU
https://nest-gpu.readthedocs.io
GNU General Public License v2.0

Running on multiple GPUs #74

Open V-Marco opened 1 year ago

V-Marco commented 1 year ago

Hello,

Can NEST GPU automatically utilize multiple GPUs when running with SLURM? I have two Tesla T4s on a single node configured with gres, and I found that for a single simulation NEST GPU only uses one of them, even when that GPU's load reaches 100%. I was wondering whether my SLURM configuration is wrong, or whether NEST GPU is designed to run a single simulation on a single GPU only.

JoseJVS commented 1 year ago

Hi,

It would be very helpful if you could send us the Python script of the simulation as well as the batch script (or SLURM arguments) you are using to schedule the job.

However, here are some initial insights. To use multiple GPUs you need to run multiple MPI processes; currently we use the CUDA_VISIBLE_DEVICES environment variable to assign a different GPU to each MPI process. If you are using OpenMPI you can achieve this with export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK (other MPI implementations may store the local rank in a different variable).
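For illustration, here is a minimal sketch of the same idea done from inside the Python script rather than in the shell. Treat it as a sketch under assumptions, not a tested recipe: it relies on CUDA_VISIBLE_DEVICES being set before NEST GPU initializes its CUDA context, so the shell export above remains the safer option. OMPI_COMM_WORLD_LOCAL_RANK is OpenMPI-specific; SLURM_LOCALID is the equivalent variable set by srun.

```python
import os

# Pick the local rank from whichever launcher variable is available
# (OMPI_COMM_WORLD_LOCAL_RANK: OpenMPI via mpirun; SLURM_LOCALID: srun).
local_rank = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK",
                            os.environ.get("SLURM_LOCALID", "0"))

# Must be set before NEST GPU creates its CUDA context, i.e. before the
# import below (assumption about when the library first touches the GPU).
os.environ["CUDA_VISIBLE_DEVICES"] = local_rank

import nestgpu as ngpu  # noqa: E402
```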

Another possible issue might be that you are not using the RemoteCreate and RemoteConnect functions to instantiate your network; these are necessary in a multi-process environment to correctly allocate nodes and connections to the different processes. You can find an example of such instantiation in the HPC Benchmark model.
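To make that concrete, below is a rough sketch of a multi-process instantiation. The function names and signatures beyond RemoteCreate/RemoteConnect (ConnectMpiInit, MpiNp, MpiId, the .node_seq attribute, MpiFinalize) are written from memory of the HPC Benchmark script, so please check them against that example; the model name and connection parameters are placeholders.

```python
import nestgpu as ngpu

ngpu.ConnectMpiInit()        # initialize MPI inside NEST GPU
n_hosts = ngpu.MpiNp()       # number of MPI processes (one GPU per process)
this_host = ngpu.MpiId()     # rank of this process

n_neurons = 1000             # placeholder population size

# RemoteCreate instantiates a population on a given host and makes it
# addressable from every process.
pops = [ngpu.RemoteCreate(i_host, "aeif_cond_exp", n_neurons).node_seq
        for i_host in range(n_hosts)]

conn_dict = {"rule": "fixed_indegree", "indegree": 100}
syn_dict = {"weight": 0.1, "delay": 1.0}

# RemoteConnect takes the source host, source nodes, target host and
# target nodes, so connections can span processes (and therefore GPUs).
for i_src in range(n_hosts):
    for i_tgt in range(n_hosts):
        ngpu.RemoteConnect(i_src, pops[i_src], i_tgt, pops[i_tgt],
                           conn_dict, syn_dict)

ngpu.Simulate(1000.0)
ngpu.MpiFinalize()
```

Run this with one MPI process per GPU (two processes on your node), each mapped to its own device through CUDA_VISIBLE_DEVICES as described above.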