lessw2020 / Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
Apache License 2.0
1.19k stars 176 forks

Grad norm and ranger #14

Closed (hadaev8 closed this issue 4 years ago)

hadaev8 commented 5 years ago

I'm using NVIDIA Apex and torch grad norm. This is the grad norm plot with Ranger (red) and AdamW (blue): https://i.imgur.com/Ui4Sioo.png Is it OK to have such huge grad norm values? Should I turn off grad norming?
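
For readers following along: assuming "torch grad norm" here means norm-based gradient clipping in the style of `torch.nn.utils.clip_grad_norm_` (the post does not say so explicitly), the value usually plotted is the total norm measured before clipping, so a large value on the plot does not by itself mean a large update was applied. The sketch below is a plain-Python illustration of that behaviour, not the poster's training code; the function name `clip_grad_norm`, the toy gradients, and `max_norm=1.0` are all hypothetical.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Rescale gradients in place so their global L2 norm is at most max_norm.

    Returns the global norm measured BEFORE clipping, which is the value
    one would typically log and plot.
    """
    # Treat all gradients as one flat vector and take its L2 norm.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Scale factor is capped at 1.0, so small gradients are left unchanged.
    clip_coef = min(1.0, max_norm / (total_norm + eps))
    for grad in grads:
        for i in range(len(grad)):
            grad[i] *= clip_coef
    return total_norm


# Toy gradients for two hypothetical parameters.
grads = [[3.0, 4.0], [12.0]]
logged_norm = clip_grad_norm(grads, max_norm=1.0)

# Norm of the gradients that the optimizer would actually see after clipping.
applied_norm = math.sqrt(sum(g * g for grad in grads for g in grad))

print(f"logged (pre-clip) norm: {logged_norm:.4f}")
print(f"applied (post-clip) norm: {applied_norm:.4f}")
```

With clipping enabled in this way, the logged norm can be far above `max_norm` while the applied norm stays at or below it, which is one reason to compare optimizers on the post-clip norm or on the loss curve rather than on the logged norm alone.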