juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
BSD 2-Clause "Simplified" License

Changing init learning rate #53

Closed: Kraut-Inferences closed this issue 3 years ago

Kraut-Inferences commented 3 years ago

Does modifying the initial learning rate hurt the algorithm in any way? I want to use exponential decay, but I don't know whether it would improve performance.

juntang-zhuang commented 3 years ago

In my experience with a ViT model on ImageNet, AdaBelief improves over Adam when both use the default cosine learning rate schedule. I expect it to work with other models as well.
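
For reference, here is a minimal sketch of pairing AdaBelief with a learning rate schedule in PyTorch. It assumes the `adabelief-pytorch` package and uses a placeholder model; the hyperparameters shown are illustrative, not a recommendation from this thread.

```python
# Minimal sketch (not from this repo): AdaBelief + an LR schedule in PyTorch.
# The model, epoch count, and hyperparameters below are placeholders.
import torch
from adabelief_pytorch import AdaBelief  # pip install adabelief-pytorch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = AdaBelief(model.parameters(), lr=1e-3, eps=1e-16,
                      betas=(0.9, 0.999), weight_decouple=True, rectify=False)

num_epochs = 90
# Cosine schedule as mentioned above; swap in
# torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)
# if exponential decay is preferred.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... run one epoch of training, calling optimizer.step() per batch ...
    scheduler.step()  # decay the learning rate once per epoch
```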

Kraut-Inferences commented 3 years ago

Thank you.