Just a reminder, we should do something about nodes with multiple GPUs.
This was requested, for instance, by Jin for CMS, who found that one of our condor nodes (b9g57n8656.cern.ch) has 4 GPUs.
Presently we simply do `cudaSetDevice(0)` and `hipSetDevice(0)` (actually both as `gpuSetDevice(0)`). In any case, for the moment the code is meant to use only one GPU at a time, so one option is to keep the code as it is and use the CUDA/HIP-specific environment variables to select a single GPU as the visible one (I believe this is `CUDA_VISIBLE_DEVICES` for CUDA and `HIP_VISIBLE_DEVICES` for HIP, while `NVIDIA_VISIBLE_DEVICES` is instead for visibility inside Docker containers?)
Some interesting reads on multi-GPU