EBazarov opened this issue 5 years ago
Hi, good idea. We're aware of AdamW; the core modification is tiny, though the updated version that is 'compatible' with SGDR (annealing) is more involved, see https://github.com/pytorch/pytorch/pull/4429#discussion_r248627341
In the meantime, it is recommended to use AMSGrad instead of Adam everywhere, though don't expect better results overall; it is only a fix for some settings, see https://fdlm.github.io/post/amsgrad/
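For reference, switching is usually a one-line change; here is a minimal sketch assuming a PyTorch version where torch.optim.Adam exposes the amsgrad flag (the model and learning rate are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# The amsgrad flag switches Adam to the AMSGrad variant of the update,
# which keeps a running maximum of the second-moment estimate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
```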
As a reminder, SGDR is also implemented; see #377. Since SGDR automatically schedules the learning rate, you may not actually need Adam, though training may take longer on average due to the annealing cycles.
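As an illustration of the SGDR-style schedule (this is not the #377 implementation, just a hedged sketch using PyTorch's CosineAnnealingWarmRestarts with plain SGD; the model, data, and cycle lengths are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)                  # placeholder model
x, y = torch.randn(32, 10), torch.randn(32, 2)  # placeholder data

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# SGDR: cosine annealing with warm restarts. T_0 is the length of the
# first cycle in scheduler steps; T_mult stretches each following cycle.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)

for epoch in range(70):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the annealing/restart schedule once per epoch
```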
We should use weight decay with Adam (they call it AdamW), not the L2 regularization that classic deep learning libraries implement. As soon as we add momentum, or use a more sophisticated optimizer like Adam, L2 regularization and weight decay are no longer the same thing, whereas they are equivalent for vanilla SGD. A detailed explanation can be found here: https://www.fast.ai/2018/07/02/adam-weight-decay/#adamw
And the paper is here: https://arxiv.org/pdf/1711.05101.pdf
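For what it's worth, recent PyTorch versions expose both behaviours, which makes the difference easy to see; a small sketch assuming a version that ships torch.optim.AdamW (the hyperparameters are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# L2 regularization: weight_decay is added to the gradient (grad += wd * w),
# so the penalty then passes through Adam's per-parameter adaptive scaling.
adam_l2 = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Decoupled weight decay (AdamW): the weights are shrunk directly by
# roughly lr * wd * w, independently of the adaptive gradient statistics.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```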