lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0
321 stars, 45 forks

Gradient normalization lowers the maximum learning rate that can converge. #41

Open Handagot opened 2 years ago

Handagot commented 2 years ago

I ran into this problem while training ResNet18 on CIFAR-100 for an experiment. I haven't yet investigated it enough to identify the cause.