Thanks for reporting this. This is my bad. I will fix it soon.
This should be fixed on master now. Please let me know if the problem persists.
The problem appears to be fixed, thank you. Performance doesn't seem to scale well with the number of GPUs, though: 4 GPUs are only about 1.5x as fast as 1 GPU, even though all of them are being utilized. It could just be a CPU bottleneck on my side, since even with a high number of workers only a few cores appear to be doing the majority of the work.
That's somewhat expected; only single-process parallelism (using nn.DataParallel) is supported, and that's a bit slow (but easy to implement). We could use multi-process distributed training to speed up multi-GPU training further. Even distributed training doesn't mean we'd see a 4x speedup with 4 GPUs, but it's faster at least.
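Roughly, the difference looks like this in plain PyTorch (just a sketch, not the actual NNSVS training code; the model, batch shapes, and master address below are placeholders):

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn


def build_model() -> nn.Module:
    # Placeholder model standing in for the actual acoustic/duration model.
    return nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))


# Option 1: single-process parallelism (what is supported now).
def train_data_parallel() -> None:
    model = nn.DataParallel(build_model().cuda())  # one Python process
    x = torch.randn(32, 64).cuda()
    # The batch is scattered across visible GPUs, outputs gathered on GPU 0;
    # the gather/scatter and the single process add overhead each step.
    model(x).sum().backward()


# Option 2: multi-process DistributedDataParallel, one process per GPU.
def ddp_worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.parallel.DistributedDataParallel(
        build_model().cuda(rank), device_ids=[rank]
    )
    x = torch.randn(32, 64, device=rank)
    model(x).sum().backward()  # gradients are all-reduced across processes

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(ddp_worker, args=(world_size,), nprocs=world_size)
```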
I'll work on distributed training in the future, but it's not a priority in my opinion. Using a single GPU is enough for most cases.
When I set data_parallel to "true" in the config, I end up with this as a result. I'm running this in a Jupyter notebook instance based on the ENUNU-Training-Kit with the dev2 branch of NNSVS.
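For reference, the change I made is roughly the following; this is only a sketch from memory, and the flag's exact location in the ENUNU-Training-Kit / NNSVS dev2 config may differ:

```yaml
# Hypothetical excerpt of the training config; only the data_parallel
# flag comes from this thread.
data_parallel: true
```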