juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
BSD 2-Clause "Simplified" License

Compatibility with warmup #55

Closed · joihn closed 3 years ago

joihn commented 3 years ago

I use an LR scheduler to configure a warmup (the LR increases linearly from a very small value to its real value, i.e. the one from args, over 500 iterations). Will this confuse AdaBelief, or is it okay?
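For reference, a minimal sketch of that kind of warmup, assuming PyTorch's `LambdaLR`; the model, base optimizer, and step counts are placeholders, not the poster's actual setup:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Placeholder model and optimizer; lr=0.1 stands in for the "real" value from args.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

warmup_iters = 500
# Scale the LR linearly from ~0 up to its configured value over the
# first 500 iterations, then hold it constant.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_iters))

for step in range(1000):
    optimizer.step()   # normally preceded by a forward/backward pass
    scheduler.step()
```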

juntang-zhuang commented 3 years ago

AdaBelief supports rectify=True, which applies the same warmup as RAdam. If you have your own LR schedule, you can set rectify=False and apply your own schedule.
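A minimal sketch of the second option, combining rectify=False with an external warmup schedule. This assumes the `adabelief_pytorch` package from this repository; the model, dummy loss, and step counts are illustrative:

```python
import torch
from adabelief_pytorch import AdaBelief  # pip install adabelief-pytorch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)  # placeholder model

# Disable the built-in RAdam-style rectification so the external
# scheduler is the only source of warmup.
optimizer = AdaBelief(model.parameters(), lr=1e-3, rectify=False)

# Same linear warmup as in the question: ramp up over the first 500 steps.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 500))

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```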

joihn commented 3 years ago

Okay, thanks!