wuzuowuyou closed this issue 4 years ago.
The line is not meant to average the multiple logits; it makes the accumulated gradients invariant to the number of iterations, `ITER_SIZE`. With N = `ITER_SIZE`, the block accumulates gradients scaled by 1/N over N iterations and then updates the parameters with their sum, which has N/N = 1 times the magnitude of a single unscaled gradient. This is equivalent to computing the raw gradients once (over all N mini-batches together) and updating the parameters immediately. It is a common trick to save memory.
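A minimal pure-Python sketch of this invariance, assuming a hypothetical 1-D linear model `y_hat = w * x` with a mean-squared-error loss and equal-sized mini-batches (the model, data, and helper names are illustrative, not from the repository). It checks that summing per-iteration gradients scaled by 1/`ITER_SIZE` matches a single gradient over the combined batch, while the unscaled sum is `ITER_SIZE` times larger.

```python
import math

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) w.r.t. w over one mini-batch (hypothetical model)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, batches, iter_size):
    """Sum per-iteration gradients, each scaled by 1/iter_size.

    This mirrors dividing the loss by ITER_SIZE before each backward pass,
    since scaling the loss scales its gradient by the same factor.
    """
    total = 0.0
    for xs, ys in batches[:iter_size]:
        total += grad_mse(w, xs, ys) / iter_size
    return total

w = 0.5
# Two equal-sized mini-batches (illustrative data).
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
iter_size = len(batches)

acc = accumulated_grad(w, batches, iter_size)

# Gradient computed once over all mini-batches concatenated.
all_x = [x for xs, _ in batches for x in xs]
all_y = [y for _, ys in batches for y in ys]
full = grad_mse(w, all_x, all_y)

# Without the 1/iter_size scaling, the accumulated gradient grows with iter_size.
unscaled = sum(grad_mse(w, xs, ys) for xs, ys in batches)

print(acc, full, unscaled)
print(math.isclose(acc, full), math.isclose(unscaled, iter_size * full))
```

The equality relies on each mini-batch loss being a mean over equally sized batches; layers such as batch normalization, whose statistics depend on the batch, can still make accumulation differ slightly from training on one large batch.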
First, thank you for your meticulous work!
Why `iter_loss /= CONFIG.SOLVER.ITER_SIZE` instead of `iter_loss /= logits.size()`?