libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Training multiple models on a single GPU #183

Closed (sheffier closed this 2 years ago)

sheffier commented 2 years ago

Hi,

Can you give a code example of how to utilize FFCV for training multiple models on a single GPU?

GuillaumeLeclerc commented 2 years ago

Hello,

We don't have an open-source snippet at the moment, but it is really straightforward:

PS: there seems to be a bug in cuDNN that, in some situations, causes a crash if two BatchNorm operations are issued at the same time. The workaround is to share a lock among all your threads so that no two threads run the model's forward pass at the same time. This should not slow down training, since a call to forward is non-blocking.
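The snippet itself did not survive in this thread, so what follows is only a minimal sketch of the pattern described above, not the FFCV authors' code. It assumes one Python thread per model and a single `threading.Lock` shared by all threads, held only around the forward pass. The names `train_one`, `train_concurrently` and `jobs` are hypothetical. The model, optimizer and loss function are assumed to follow the usual PyTorch conventions, and each loader is assumed to be an iterable of `(inputs, targets)` batches already on the GPU (for example, an FFCV loader).

```python
import threading


def train_one(model, loader, optimizer, loss_fn, forward_lock, epochs=1):
    """Train a single model in the current thread.

    Assumes PyTorch-style objects: `model(inputs)` runs the forward pass,
    `loss_fn(outputs, targets)` returns an object with `.backward()`, and
    `optimizer` exposes `zero_grad()` and `step()`.
    """
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            # Serialize only the forward pass across threads
            # (the BatchNorm workaround mentioned above).
            with forward_lock:
                outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            loss.backward()
            optimizer.step()


def train_concurrently(jobs, epochs=1):
    """Run one thread per (model, loader, optimizer, loss_fn) tuple in `jobs`."""
    forward_lock = threading.Lock()  # shared by every training thread
    errors = []

    def worker(job):
        try:
            train_one(*job, forward_lock, epochs=epochs)
        except Exception as exc:  # collect failures so the caller sees them
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
```

Each entry in `jobs` would hold its own model, its own loader and its own optimizer, so the threads share nothing except the GPU and the lock.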

sheffier commented 2 years ago

Thanks for the quick reply!

One question though. Won’t this kind of solution suffer from the GIL?

GuillaumeLeclerc commented 2 years ago

If you avoid non-FFCV augmentations, it is usually fine. FFCV runs outside of the GIL. The only problem could be the model, if you have a lot of fast layers that run faster than it takes to schedule them.

sheffier commented 2 years ago

Ok, I’ll give it a shot