wuzhi19931128 opened this issue 4 years ago
Same question: given that the model replicas across GPUs are synced after optimizer.step(), every per-GPU validation run produces the same result. As an optimization, if we run validation in a distributed manner (like training), how do we average the accuracies across GPUs and nodes to decide when to save checkpoints?
It seems like validation runs redundantly on every GPU in the distributed setting. What should be changed so that validation itself runs distributed and saves time?
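For what it's worth, here is a minimal sketch of one common approach, assuming a PyTorch setup where `torch.distributed` is already initialized and the model is wrapped in `DistributedDataParallel`. The idea is to shard the validation set with a `DistributedSampler` (as in training), accumulate raw correct/total counts per rank, and `all_reduce` those counts so every rank sees the same global accuracy. The function name `validate_distributed` is just illustrative:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def validate_distributed(model, val_dataset, device, batch_size=64):
    # Shard the validation set across ranks so each GPU only
    # evaluates its own slice, instead of the full set on every GPU.
    sampler = DistributedSampler(val_dataset, shuffle=False)
    loader = DataLoader(val_dataset, batch_size=batch_size, sampler=sampler)

    model.eval()
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == targets).sum()
            total += targets.numel()

    # Sum the per-rank counts so every rank computes the same
    # global accuracy, regardless of uneven shard sizes.
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    return (correct / total).item()

# Usage sketch: every rank gets the same accuracy, but only
# rank 0 writes the checkpoint to avoid file-write races.
# acc = validate_distributed(ddp_model, val_dataset, device)
# if acc > best_acc and dist.get_rank() == 0:
#     torch.save(ddp_model.module.state_dict(), "best.pt")
```

Two caveats: summing raw counts and dividing once is safer than averaging per-rank accuracies, since shards may differ in size; and `DistributedSampler` pads the dataset by repeating samples to give every rank an equal share, which can skew the metric slightly on small validation sets.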