Hi authors, I noticed that in your loss function here, the number of ground-truth bounding boxes is summed across all GPUs, but that sum is not divided by the number of GPUs. It seems this would make the loss value smaller as more GPUs are used for training, right? Is this normalization intended?
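To illustrate the concern, here is a minimal sketch in plain Python. It is not the code from the repository: the helper `normalized_loss` and all the numbers are made up for illustration, and it assumes every GPU holds the same number of boxes and the same summed per-box loss, with the loss on each GPU divided by the reduced box count.

```python
def normalized_loss(per_gpu_loss_sum, per_gpu_num_boxes, world_size, average_over_gpus):
    """Loss reported on one GPU after normalizing by the box count.

    Hypothetical helper: assumes every GPU has the same loss sum and box count.
    """
    # Emulate an all-reduce (sum) of the box count over all GPUs.
    total_boxes = per_gpu_num_boxes * world_size
    if average_over_gpus:
        # Dividing by the world size recovers the per-GPU box count.
        total_boxes = total_boxes / world_size
    # Guard against division by zero when there are no boxes.
    return per_gpu_loss_sum / max(total_boxes, 1)


for world_size in (1, 2, 8):
    summed = normalized_loss(40.0, 10, world_size, average_over_gpus=False)
    averaged = normalized_loss(40.0, 10, world_size, average_over_gpus=True)
    print(world_size, summed, averaged)
# prints:
# 1 4.0 4.0
# 2 2.0 4.0
# 8 0.5 4.0
```

Under these assumptions, normalizing by the summed count shrinks the per-GPU loss by a factor equal to the number of GPUs, while dividing the count by the number of GPUs keeps it constant. Whether this matters in practice may also depend on how gradients are reduced across GPUs (averaged or summed), which I have not checked in the code.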