Closed by nilin 10 months ago
Actually, I have a question about this. Will the "prev_grad" used in the next iteration be the grad BEFORE the scaling or AFTER the scaling? I guess it depends on whether it's saved in .update() or .apply_updates().
Can you check on this to make sure it's doing the right thing before merging?
That way, constrain_norm doesn't need to know the learning rate.
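To make the question concrete, here is a minimal sketch of the two possibilities. It is not the PR's code: it assumes an optax-style split in which update() returns the updates together with a new state and apply_updates() only adds the updates to the parameters. The signatures, the save_scaled flag, the state dict, and the reading of "scaling" as the norm constraint followed by the learning rate are all assumptions made for illustration. Under those assumptions, prev_grad can only be written inside update(), so which gradient it holds comes down to a single line there.

```python
# Hypothetical sketch: names, signatures, and structure are assumptions,
# not the code under review.
import math


def constrain_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (no learning rate involved)."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def update(grad, state, max_norm, learning_rate, save_scaled):
    """Return (updates, new_state). The state is written here and nowhere else."""
    constrained = constrain_norm(grad, max_norm)
    # The learning rate is applied after constrain_norm, so constrain_norm never sees it.
    scaled = [learning_rate * g for g in constrained]
    updates = [-s for s in scaled]
    # This line decides whether prev_grad is the gradient BEFORE or AFTER scaling.
    new_state = {"prev_grad": scaled if save_scaled else list(grad)}
    return updates, new_state


def apply_updates(params, updates):
    """Stateless: adds updates to params and has no access to prev_grad."""
    return [p + u for p, u in zip(params, updates)]


params = [1.0, 2.0]
grad = [3.0, 4.0]  # L2 norm is 5.0
state = {"prev_grad": None}

updates, raw_state = update(grad, state, max_norm=1.0, learning_rate=0.5, save_scaled=False)
_, scaled_state = update(grad, state, max_norm=1.0, learning_rate=0.5, save_scaled=True)
new_params = apply_updates(params, updates)

print(raw_state["prev_grad"])     # gradient before scaling
print(scaled_state["prev_grad"])  # gradient after scaling
```

If the real code follows this pattern, checking which value is assigned to prev_grad inside update() should answer the question. If apply_updates() turns out to carry state in this codebase, the sketch does not apply.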