mila-iqia / mila-docs

Mila technical documentation
https://docs.mila.quebec

Bugfix GPU ordinal computation #245

Closed obilaniu closed 3 months ago

obilaniu commented 3 months ago

With `--gpus-per-task=rtx8000:1`, each task sees exactly one GPU. The only valid GPU ordinal within a task is therefore 0, even though each task is assigned a different physical GPU.

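For extra robustness, use the common trick of calculating the ordinal as `ordinal = rank % device_count()`, which works both inside and outside of SLURM: when only one GPU is visible, the result is always 0, and when several are visible, the ranks are spread across them.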
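
A minimal sketch of that calculation, for illustration only. The helper names `gpu_ordinal` and `rank_from_env` are hypothetical, and reading the rank from `SLURM_PROCID` with a fallback of 0 is an assumption about how the rank is obtained; the actual docs may use a different variable or launcher.

```python
import os


def gpu_ordinal(rank: int, device_count: int) -> int:
    """Map a process rank to a GPU ordinal that is valid for this process.

    device_count is the number of GPUs visible to the process
    (for example, 1 under --gpus-per-task=rtx8000:1).
    """
    if device_count < 1:
        raise ValueError("no GPU is visible to this process")
    # The modulo keeps the ordinal in the range [0, device_count).
    return rank % device_count


def rank_from_env() -> int:
    """Hypothetical helper: take the rank from SLURM, or 0 outside SLURM."""
    # Assumption: SLURM_PROCID carries the task rank; it is unset outside SLURM.
    return int(os.environ.get("SLURM_PROCID", "0"))


if __name__ == "__main__":
    # One visible GPU per task: every rank maps to ordinal 0.
    print([gpu_ordinal(r, 1) for r in range(4)])
    # Four visible GPUs: ranks are spread over ordinals 0..3.
    print([gpu_ordinal(r, 4) for r in range(6)])
```

In a PyTorch script, the visible device count would typically come from `torch.cuda.device_count()`, and the resulting ordinal would be passed to `torch.cuda.set_device()`.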