When doing distributed training, FFCV allocates an additional CUDA context on GPU 0 for every rank besides rank 0. This happens because the call to `pin_memory` requires a CUDA context, and that context gets allocated on the current device, which defaults to GPU 0 in every process. This PR calls `torch.cuda.set_device` in the loader thread so the context is created on the rank's own GPU instead.
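For reference, here is a minimal sketch of the pattern the fix uses. The `loader_thread` function and the hard-coded `rank` are illustrative stand-ins, not FFCV's actual loader code:

```python
import threading

import torch

def loader_thread(device: int) -> None:
    # Bind this thread to the rank's own GPU before doing any CUDA work.
    # Without this call, pin_memory() below would initialize its CUDA
    # context on the process-wide default device, i.e. GPU 0.
    torch.cuda.set_device(device)
    batch = torch.empty(8, 3, 224, 224)
    # Allocating pinned (page-locked) memory requires a CUDA context;
    # it is now created on `device` instead of GPU 0.
    batch = batch.pin_memory()

rank = 1  # e.g. this process's local rank in distributed training
t = threading.Thread(target=loader_thread, args=(rank,))
t.start()
t.join()
```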
**Before** (screenshot)
**After** (screenshot)
(Please ignore the slight differences in memory usage; I took the Before screenshot slightly before all of the CUDA memory was allocated.)