Closed Eliyas0007 closed 4 months ago
same question
I am also confused about this. The end result is that gradients are scaled according to the magnitude of the loss. Maybe this is some learning rate scheduling wizardry?
Hmm, looks like this was committed by mistake. Changed in this commit. Thanks!
same question