With both the baseline and soft-teacher configs, training consistently gets slower as I add GPUs. For the 1% labeled setting, the estimated training time is about 2 days on a single GPU but about 5 days on 8 GPUs. I don't understand the underlying reason. I am running on a single node with 8 A5000 GPUs. How long should training take, and what can I do to actually get a speedup from multi-GPU training? I am badly stuck on this; any help would be greatly appreciated.
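In case it helps with diagnosis, I also ran a quick NCCL sanity check to confirm that all 8 processes actually join the distributed group and that inter-GPU communication isn't the bottleneck. This is my own sketch (the file name `check_dist.py` is arbitrary, not something from the repo), launched with `torchrun --nproc_per_node=8 check_dist.py`:

```python
# check_dist.py -- my own diagnostic sketch, not part of the repo.
# Run with: torchrun --nproc_per_node=8 check_dist.py
import os
import time

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE. If these are missing,
    # the job was started as a plain single-process run, so no DDP speedup
    # is possible in the first place.
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Time a large all_reduce: on a healthy single node this should take
    # milliseconds. If it takes seconds, gradient synchronization (not the
    # model itself) is likely what is slowing the 8-GPU run down.
    x = torch.randn(64 * 1024 * 1024, device="cuda")  # 256 MB of float32
    dist.barrier()
    t0 = time.time()
    for _ in range(10):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    if rank == 0:
        print(f"world_size={world_size}, avg all_reduce: {(time.time() - t0) / 10:.3f}s")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```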