microsoft / SoftTeacher

Semi-Supervised Learning, Object Detection, ICCV2021
MIT License
900 stars 123 forks

Multi-GPU training is reducing speed compared to single GPU #217

Open tanvir-utexas opened 2 years ago

tanvir-utexas commented 2 years ago

With both the baseline and soft-teacher configs, training is always much slower when I use more GPUs. With 1% labels, single-GPU training shows an estimated training time of about 2 days, while 8 GPUs show about 5 days. I don't understand the underlying reason. I am using a node with 8 A5000 GPUs. Can anyone tell me how long training should take, and what I can do to get a speedup from multi-GPU training? I am badly stuck on this. Any help would be greatly appreciated.
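One way to read the two estimates above is in terms of samples processed rather than wall-clock ETA. The sketch below is only illustrative and rests on two assumptions that the post does not confirm: that both runs are scheduled for the same number of iterations, and that the per-GPU batch size is unchanged, so each 8-GPU iteration processes 8 times as many samples. The function name `relative_throughput` is hypothetical and not part of SoftTeacher.

```python
def relative_throughput(eta_single_days, eta_multi_days, num_gpus):
    """Estimate multi-GPU throughput relative to a single GPU.

    Assumes (not confirmed by the post) that both runs use the same
    number of iterations and the same per-GPU batch size.
    """
    # How much longer one iteration takes in the multi-GPU run.
    slowdown_per_iter = eta_multi_days / eta_single_days
    # Each multi-GPU iteration processes num_gpus times as many samples.
    return num_gpus / slowdown_per_iter


# ETAs reported in the post: 2 days on 1 GPU, 5 days on 8 GPUs.
ratio = relative_throughput(2.0, 5.0, 8)
print(f"samples/sec relative to single GPU: {ratio:.1f}x")
```

If those assumptions hold, the longer ETA would mean slower iterations but not fewer samples per second, and a fair comparison would need matching total batch sizes or iteration counts. If they do not hold, this calculation does not apply.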