Closed: lezhang7 closed this issue 2 months ago
Hi,

I found that training slows down when the number of GPUs is more than 2. Is it because more GPUs bring a larger batch to compute, so the `all_gather` takes up some time?

Best

@lezhang7 you see fewer samples per second per GPU as you increase the number of GPUs, but the total samples/sec should increase until you saturate your interconnect. If going beyond two GPUs causes a significant slowdown, you could have a broken distributed setup, or slow disks causing an I/O bottleneck when reading your dataset.
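To check whether the total samples/sec is actually increasing, you can time a handful of training steps on each rank and sum the per-rank throughput. This is a minimal sketch, not open_clip code: it assumes `torch.distributed` is already initialized (e.g., with the NCCL backend), that your loader yields batches whose first dimension is the local batch size, and the names `measure_throughput`/`step_fn` are hypothetical.

```python
import time

import torch
import torch.distributed as dist


def measure_throughput(step_fn, loader, device, warmup=5, steps=20):
    """Time `steps` training steps; report per-rank and aggregate samples/sec.

    step_fn(batch) should run one forward/backward on `batch` (a tensor, or
    a tuple whose first element is a tensor with the local batch dimension).
    """
    it = iter(loader)
    for _ in range(warmup):               # warm up CUDA kernels / allocators
        step_fn(next(it))
    torch.cuda.synchronize(device)
    t0, n = time.perf_counter(), 0
    for _ in range(steps):
        batch = next(it)
        x = batch[0] if isinstance(batch, (list, tuple)) else batch
        n += x.size(0)                    # count local samples
        step_fn(batch)
    torch.cuda.synchronize(device)
    local = n / (time.perf_counter() - t0)
    total = torch.tensor([local], device=device)
    dist.all_reduce(total)                # sum per-rank throughput (default SUM)
    if dist.get_rank() == 0:
        print(f"per-GPU: {local:.1f} samples/s, total: {total.item():.1f} samples/s")
```

If the per-GPU number drops a little while the summed number keeps growing as you add GPUs, scaling is healthy; if the sum itself drops past two GPUs, that points at the interconnect or dataset I/O.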
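To test the `all_gather` hypothesis directly, you can time the collective in isolation and compare it against your measured step time. Same assumptions as above (process group already initialized); the tensor shape is a placeholder, so set `local_batch` and `dim` to your actual per-GPU batch size and embedding dimension.

```python
import time

import torch
import torch.distributed as dist


def time_all_gather(local_batch=256, dim=512, iters=50, device="cuda"):
    """Average wall time of one all_gather of a [local_batch, dim] tensor."""
    feats = torch.randn(local_batch, dim, device=device)
    bucket = [torch.empty_like(feats) for _ in range(dist.get_world_size())]
    for _ in range(5):                    # warm up the NCCL communicator
        dist.all_gather(bucket, feats)
    torch.cuda.synchronize(device)
    t0 = time.perf_counter()
    for _ in range(iters):
        dist.all_gather(bucket, feats)
    torch.cuda.synchronize(device)
    return (time.perf_counter() - t0) / iters
```

For small embedding tensors this collective is usually cheap relative to a full training step, which is consistent with the suggestion above that a broken distributed setup or slow disks are the more likely culprits.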