Open kindloaf opened 5 years ago
It seems to me that rescale_grad should be set to 1/batch_size. When training on 4 GPUs (a total batch size of 4), rescale_grad should therefore be 0.25. Why is the value set to "1.0" in train_end2end.py?
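A minimal sketch of the arithmetic behind the question follows. It assumes plain SGD (no momentum or weight decay) in which the per-image gradients are summed and then multiplied by rescale_grad before the update, with one image per GPU. The helper names `sgd_step` and `summed_grad` are hypothetical, and this is not MXNet's actual optimizer code. It also does not model whether the loss layers used by train_end2end.py already normalize their own gradients.

```python
# Hypothetical sketch (not MXNet's optimizer): plain SGD where the gradient
# summed over the batch is multiplied by rescale_grad before the update.


def summed_grad(per_image_grads):
    """Add up per-image (per-GPU) gradients: a sum, not a mean."""
    return sum(per_image_grads)


def sgd_step(weight, grad_sum, lr, rescale_grad):
    """Scale the summed gradient by rescale_grad, then take one SGD step."""
    return weight - lr * (grad_sum * rescale_grad)


if __name__ == "__main__":
    num_gpus = 4                 # assumption: one image per GPU
    batch_size = num_gpus        # so batch_size = 4 and 1/batch_size = 0.25
    grads = [0.5] * batch_size   # made-up per-image gradients
    g = summed_grad(grads)

    # rescale_grad = 1/batch_size turns the sum into a batch average.
    w_avg = sgd_step(1.0, g, lr=0.1, rescale_grad=1.0 / batch_size)
    # rescale_grad = 1.0 keeps the sum, so the step is batch_size times larger.
    w_sum = sgd_step(1.0, g, lr=0.1, rescale_grad=1.0)
    print(w_avg, w_sum)
```

With rescale_grad = 1.0 the update uses the summed gradient, so its size grows with the number of GPUs unless something else (for example the learning rate or the loss normalization) compensates for it.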