juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"

Upgrade with Adas optimizer #45

Closed DaniyarM closed 3 years ago

DaniyarM commented 3 years ago

What do you think about merging AdaBelief with Adas (https://github.com/YanaiEliyahu/AdasOptimizer)? Or do they conflict?

juntang-zhuang commented 3 years ago

What's the algorithm of Adas? It seems no documentation is provided.

DaniyarM commented 3 years ago

It seems it is only described in the repo, in the "Theory" and "How ADAS works" sections...

juntang-zhuang commented 3 years ago

I'm not quite sure just from looking at those sections. The general idea seems to be to perform gradient descent on a per-element learning rate, which is interesting. But I'm concerned that the fast convergence might come from the learning rate being rapidly decayed, rather than from the model truly learning well. Another concern is computation: ADAS needs to take an extra gradient w.r.t. the learning rate, and I'm not sure how much overhead that adds. Perhaps it needs some more validation.
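
For illustration, here is a minimal sketch of that general idea, not the actual Adas implementation: a hypergradient-style update where each per-element learning rate is nudged by the product of successive gradients, so the rate grows where consecutive gradients agree and shrinks where they disagree. The toy quadratic objective and the names `lrs`, `meta_lr`, and `prev_grad` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)     # parameters
lrs = torch.full_like(w, 1e-2)              # per-element learning rates
meta_lr = 1e-4                              # step size for the learning-rate update
prev_grad = torch.zeros_like(w)

def loss_fn(w):
    return (w ** 2).sum()                   # toy quadratic objective

for step in range(100):
    loss = loss_fn(w)
    grad, = torch.autograd.grad(loss, w)
    # The hypergradient of the loss w.r.t. each lr element is approximately
    # -grad * prev_grad (chain rule through the previous parameter update),
    # so gradient descent on lrs raises the rate where successive gradients agree.
    lrs = (lrs + meta_lr * grad * prev_grad).clamp_min(0.0)
    with torch.no_grad():
        w -= lrs * grad                      # SGD step with per-element rates
    prev_grad = grad
```

In this simplified form the extra cost per step is only a few elementwise products over the parameter tensor; an implementation that differentiates through the full optimizer update would be more expensive, which is the computational concern above.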