Open V-Marco opened 1 year ago
Hi,
It would be very helpful if you could send us the Python script of the simulation, as well as the batch script (or SLURM arguments) you are using to schedule the job.
In the meantime, here are some initial insights. To use multiple GPUs you need to run multiple MPI processes. Currently we use the CUDA_VISIBLE_DEVICES environment variable to assign a different GPU to each MPI process. If you are using OpenMPI, you can achieve this with export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK (other MPI implementations may use a different variable to store the local rank).
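As a rough sketch of the same idea done from inside the simulation script, the snippet below looks up the node-local rank and writes it into CUDA_VISIBLE_DEVICES. The helper names local_rank and pin_gpu are hypothetical and not part of NEST GPU. The list of non-OpenMPI variable names is an assumption about what other launchers export, so check it against your MPI or SLURM setup. It also assumes the variable is set before NEST GPU is imported, so that it is in place before CUDA is initialized.

```python
import os

# Variables in which common launchers expose the node-local rank.
# Only the Open MPI one comes from the reply above; the others are
# assumptions to verify against your own launcher.
LOCAL_RANK_VARS = (
    "OMPI_COMM_WORLD_LOCAL_RANK",  # Open MPI
    "MV2_COMM_WORLD_LOCAL_RANK",   # MVAPICH2 (assumed)
    "MPI_LOCALRANKID",             # Intel MPI (assumed)
    "SLURM_LOCALID",               # SLURM srun (assumed)
)


def local_rank(env=None):
    """Return the node-local rank as an int, or None if no variable is set."""
    env = os.environ if env is None else env
    for name in LOCAL_RANK_VARS:
        value = env.get(name)
        if value is not None:
            return int(value)
    return None


def pin_gpu(env=None):
    """Set CUDA_VISIBLE_DEVICES to the local rank; leave env untouched otherwise."""
    env = os.environ if env is None else env
    rank = local_rank(env)
    if rank is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(rank)
    return rank


# Intended use (hypothetical): call pin_gpu() at the very top of the
# simulation script, before the NEST GPU import.
```

With two MPI processes on a two-GPU node, local rank 0 would then see only GPU 0 and local rank 1 only GPU 1. When the script is started without an MPI launcher, none of the variables is set and the environment is left unchanged.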
Another possible issue is that you are not using the RemoteCreate and RemoteConnect functions to instantiate your network. These are necessary in a multi-process environment to correctly allocate nodes and connections to the different processes. You can find an example of such an instantiation in the HPC Benchmark model.
Hello,
Can NEST GPU automatically utilize multiple GPUs when running with SLURM? I have a setup with two Tesla T4 GPUs on a single node, configured with gres, and I found that for a single simulation NEST GPU only uses one of them, even when its load goes up to 100%. I was wondering if my SLURM configuration is wrong, or if NEST GPU is designed to run a single simulation on a single GPU only.