Gradient normalization lowers the maximum learning rate that can converge.

lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components

Apache License 2.0

323 stars 46 forks source link

Gradient normalization lowers the maximum learning rate that can converge. #41

Open Handagot opened 2 years ago

Handagot commented 2 years ago

I found this problem while training ResNet18 on cifar100 for some experiment. I still haven't looked into this issue enough to find out what the cause is.