juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
BSD 2-Clause "Simplified" License

Implementation of pure keras #46

Open liaoxuanzhi opened 3 years ago

liaoxuanzhi commented 3 years ago

Do you have pure keras implementation version? Thanks

juntang-zhuang commented 3 years ago

@liaoxuanzhi We don't have pure implementations in Keras yet. Do you know any pure Keras implementation of Adam or RAdam? It would be easier to start from some existing implementations.

liaoxuanzhi commented 3 years ago

Sorry for the late reply. I will implement this algorithm in pure Keras and get back to you with the results as soon as possible.

liaoxuanzhi commented 3 years ago

I am training the pure Keras version of the AdaBelief algorithm with a small ResNet-18 (kernel sizes reduced by 2) on CIFAR-10. According to my monitoring, over epochs 0-36 it reaches 81.9%, which is better than Adam (best of 80.58% over epochs 0-36). The model is still training, and I will update you with the final result soon (the learning rate is decreased by a factor of 0.1 at epoch 150).
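The step-decay schedule mentioned above (multiply the learning rate by 0.1 at epoch 150) can be sketched as a plain Python function; in Keras it would typically be passed to a `LearningRateScheduler` callback. The base rate and drop epoch here are assumptions taken from this comment, not confirmed training settings:

```python
def step_decay(epoch, base_lr=1e-3, drop=0.1, drop_epoch=150):
    """Return the learning rate for a given epoch: base_lr until
    drop_epoch, then base_lr * drop afterwards."""
    return base_lr * (drop if epoch >= drop_epoch else 1.0)

# Before the drop the rate is unchanged; from epoch 150 on it is 10x smaller.
lr_early = step_decay(100)   # 0.001
lr_late = step_decay(150)    # roughly 0.0001
```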

juntang-zhuang commented 3 years ago

Cool, thanks a lot! Please create a new pull request when you finish the code.

liaoxuanzhi commented 3 years ago

Sorry for the late report. I have finished the pure Keras implementation (Keras 2.2.5, TensorFlow 1.14.0); please check this link. It implements the initial idea of your work, without tricks such as rectification or decoupled weight decay. To use it (please choose the recommended parameters):

```python
from AdaBelief import AdaBelief

opt = AdaBelief(lr=0.001, beta_1=0.9, beta_2=0.999,
                epsilon=1e-08, decay=0., weight_decay=0.0)
```
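For reference, the core update this "initial idea" version implements can be sketched in NumPy. The only change relative to Adam is the second-moment estimate: it tracks `(g - m)^2`, the deviation of the gradient from its running mean (the "belief"), instead of `g^2`. Function and variable names here are my own, not identifiers from the linked code:

```python
import numpy as np

def adabelief_step(param, grad, m, s, t,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief step (no rectification, no decoupled weight decay)."""
    m = beta1 * m + (1 - beta1) * grad              # first moment, as in Adam
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2   # belief: deviation from m
    m_hat = m / (1 - beta1 ** t)                    # bias correction
    s_hat = s / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(s_hat) + eps)
    return param, m, s

# One toy step on a scalar parameter with a positive gradient:
# the parameter moves in the negative direction, as expected.
p, m, s = 1.0, 0.0, 0.0
p, m, s = adabelief_step(p, 0.5, m, s, t=1)
```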

PS: PyTorch applies `weight_decay` inside the optimizer, while Keras applies L2 regularization to each trainable layer. For my implementation, I built a small neural network like LeNet-5 and trained it on CIFAR-10. AdaBelief is consistently better and more stable than Adam: the average result over five runs is 82.7%, versus about 82% for Adam.
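A minimal sketch of the distinction noted above: for plain SGD, an L2 penalty in the loss and weight decay folded into the update coincide when the coefficients match, but under adaptive methods like Adam/AdaBelief they differ because the decay term passes through the adaptive rescaling. The coefficient convention below (`lam/2 * ||w||^2`) is an assumption for illustration:

```python
import numpy as np

def sgd_weight_decay(w, g, lr=0.1, wd=0.01):
    # weight decay folded directly into the gradient
    return w - lr * (g + wd * w)

def sgd_l2(w, g, lr=0.1, lam=0.01):
    # loss = f(w) + (lam/2) * ||w||^2  ->  gradient contribution lam * w
    return w - lr * (g + lam * w)

w = np.array([1.0, -2.0])
g = np.array([0.3, 0.1])
# For vanilla SGD the two formulations give identical updates.
same = np.allclose(sgd_weight_decay(w, g), sgd_l2(w, g))
```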

juntang-zhuang commented 3 years ago

Cool, thanks so much! We could create a new pull request to merge your code, and perhaps push it to pip once we figure out how to add features such as rectify.