obilaniu closed 3 months ago
With --gpus-per-task=rtx8000:1, each task sees only 1 GPU. Therefore, the only valid GPU ordinal inside a task is 0, even though each task sees a different physical GPU.
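For extra robustness, use the common trick of calculating the ordinal as ordinal = rank % device_count(), which works both inside and outside of SLURM. When only one device is visible, the result is always 0; when several are visible, the ranks are spread across them.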
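A minimal sketch of that modulo trick, assuming the rank comes from an environment variable. The helper names gpu_ordinal and rank_from_env are hypothetical, and using SLURM_LOCALID (with LOCAL_RANK as a fallback for launchers such as torchrun) is an assumption about how the job is started, not something stated in this thread:

```python
import os


def gpu_ordinal(rank: int, device_count: int) -> int:
    """Map a process rank onto one of the GPUs visible to this process."""
    if device_count <= 0:
        raise ValueError("no visible GPUs")
    # With one visible GPU this is always 0; with several, ranks wrap around.
    return rank % device_count


def rank_from_env(default: int = 0) -> int:
    """Read a node-local rank from the environment (assumed variable names)."""
    # Assumption: SLURM_LOCALID is set by srun, LOCAL_RANK by other launchers.
    for name in ("SLURM_LOCALID", "LOCAL_RANK"):
        value = os.environ.get(name)
        if value is not None:
            return int(value)
    # Neither variable is set (e.g. a plain single-process run).
    return default
```

In a PyTorch job, for example, this could be used as torch.cuda.set_device(gpu_ordinal(rank_from_env(), torch.cuda.device_count())). Under --gpus-per-task=rtx8000:1 the device count is 1, so every task selects ordinal 0.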