Hi,
Can you give a code example of how to utilize FFCV for training multiple models on a single GPU?
Hello,
We don't have an open-source snippet at the moment, but it is really straightforward.
PS: it seems that there is a bug with cuDNN, and in some situations it will crash if two BatchNorm forward passes are issued at the same time. The workaround is to use a lock shared among all your threads to ensure that no two threads run the forward pass on their models at the same time (this should not slow down training, since a call to forward is non-blocking).
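For reference, a minimal sketch of this setup (not an official FFCV snippet): one Python thread per model, each with its own FFCV Loader, and the shared lock around the forward pass as the cuDNN/BatchNorm workaround. The `train.beton` file, the image/label pipeline, and the ResNet-18 models are placeholder assumptions.

```python
import threading

import torch
import torchvision
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.transforms import Convert, Squeeze, ToDevice, ToTensor, ToTorchImage

DEVICE = torch.device('cuda:0')

# Shared across all training threads: serializes forward passes so that
# two BatchNorm forwards are never issued at the same time.
forward_lock = threading.Lock()

def make_loader(beton_path):
    # One independent FFCV loader per training thread; adjust the
    # decoders/transforms to match how your .beton file was written.
    return Loader(
        beton_path,
        batch_size=128,
        num_workers=4,
        order=OrderOption.RANDOM,
        pipelines={
            'image': [SimpleRGBImageDecoder(), ToTensor(),
                      ToDevice(DEVICE, non_blocking=True),
                      ToTorchImage(), Convert(torch.float32)],
            'label': [IntDecoder(), ToTensor(),
                      ToDevice(DEVICE, non_blocking=True), Squeeze()],
        })

def train_one_model(beton_path, epochs):
    # Placeholder model: any torch.nn.Module works here.
    model = torchvision.models.resnet18(num_classes=10).to(DEVICE)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    loader = make_loader(beton_path)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad(set_to_none=True)
            # Hold the lock only for the forward pass: forward is
            # non-blocking, so the lock is released as soon as the
            # kernels have been queued.
            with forward_lock:
                logits = model(images)
            loss = loss_fn(logits, labels)
            loss.backward()
            opt.step()

# Train e.g. four models concurrently on the same GPU.
threads = [threading.Thread(target=train_one_model, args=('train.beton', 10))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since forward only schedules kernels, the lock serializes scheduling rather than computation; backward and the optimizer step still run concurrently across threads.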
Thanks for the quick reply!
One question, though: won't this kind of solution suffer from the GIL?
If you avoid non-FFCV augmentations, it is usually fine; FFCV runs outside of the GIL. The only problem could be the model, if it has a lot of fast layers that run faster than the time it takes to schedule them.
Ok, I’ll give it a shot