Closed evakasch closed 4 years ago
I found that when I train on multiple GPUs, I have to decrease the learning rate to avoid a huge loss.
Yes. Empirically, the learning rate (lr) should be divided by n if you use n GPUs.
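For illustration, here is a minimal sketch of that rule of thumb. The helper name `scale_lr_for_gpus` and the base value 0.01 are hypothetical and not taken from this repository; the only thing it encodes is the empirical "divide lr by n" advice above, so treat it as a starting point to tune rather than a guarantee.

```python
def scale_lr_for_gpus(base_lr: float, n_gpus: int) -> float:
    """Divide a single-GPU learning rate by the number of GPUs.

    This encodes only the empirical rule from this thread (lr / n);
    the helper itself is hypothetical, not part of any library.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return base_lr / n_gpus


if __name__ == "__main__":
    base_lr = 0.01  # hypothetical learning rate that works on a single GPU
    for n in (1, 2, 4, 8):
        print(f"{n} GPU(s): lr = {scale_lr_for_gpus(base_lr, n):g}")
```

You would then pass the scaled value to your optimizer in place of the single-GPU learning rate.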