When I set train_batch_size to 8 and run on 8 GPUs, the overall batch size should be 64.
So I expected training to be faster than with batch_size 8 on a single GPU, but when I actually ran it, it took about the same amount of time.
Is something going wrong?
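Roughly, my mental model is something like the sketch below (a plain PyTorch DistributedDataParallel approximation launched with torchrun, not my actual code; the model, dataset, and variable names are placeholders): each GPU gets a per-GPU batch of 8, the gradients are averaged across the 8 ranks, so each optimizer step covers 64 samples and each rank only walks through 1/8 of the data per epoch.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # One process per GPU, e.g. started via `torchrun --nproc_per_node=8 train.py`.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()               # 8 GPUs -> 8
    torch.cuda.set_device(rank)

    train_batch_size = 8                             # per-GPU batch size (my assumption)
    effective_batch = train_batch_size * world_size  # 8 * 8 = 64 samples per optimizer step

    # Placeholder dataset/model just to make the sketch runnable.
    dataset = TensorDataset(torch.randn(10_000, 32), torch.randint(0, 2, (10_000,)))
    sampler = DistributedSampler(dataset)            # each rank sees 1/world_size of the data
    loader = DataLoader(dataset, batch_size=train_batch_size, sampler=sampler)

    model = DDP(torch.nn.Linear(32, 2).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    if rank == 0:
        # Steps per epoch shrink by ~world_size, which is where I expect the speedup to come from.
        print(f"effective batch = {effective_batch}, steps per epoch = {len(loader)}")

    for x, y in loader:
        x, y = x.cuda(rank), y.cuda(rank)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()              # gradients all-reduced across the 8 GPUs
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```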