jettify / pytorch-optimizer

torch-optimizer -- collection of optimizers for Pytorch
Apache License 2.0

Wrong paper references for Ranger optimizer variants #244

Closed. jwuphysics closed this issue 3 years ago.

jwuphysics commented 3 years ago

The README lists Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM by Tong, Liang, and Bi (2019) as the source paper for the Ranger, RangerQH, and RangerVA implementations. However, that paper appears to describe only the addition of a softplus calibration to Adam (SAdam) and AMSGrad (SAMSGrad), implemented here: https://github.com/neilliang90/Sadam; it makes no mention of the LookAhead or RAdam techniques. It therefore makes sense to credit RangerVA to this paper, but Ranger and RangerQH should not use this reference.
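For reference, the calibration that paper proposes is a softplus applied to the adaptive-learning-rate denominator. A minimal sketch of the idea (the `beta` value below is illustrative only, not a default from the paper or this library):

```python
import torch
import torch.nn.functional as F

# Adam-style denominator:          sqrt(v_t) + eps
# Softplus-calibrated denominator: softplus(sqrt(v_t)) with a sharpness beta,
# which acts as a smooth lower bound instead of a hard eps shift.
v_t = torch.tensor([1e-8, 1e-4, 1e-2, 1.0])  # toy second-moment estimates
beta = 50.0  # illustrative sharpness, not necessarily the paper's or library's default
adam_denom = v_t.sqrt() + 1e-8
sadam_denom = F.softplus(v_t.sqrt(), beta=beta)
print(adam_denom)
print(sadam_denom)
```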

The original Ranger optimizer combines the techniques from the LookAhead, Rectified Adam (RAdam), and Gradient Centralization papers, and is described in a blog post.
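For context, that combination can be approximated with pieces already in this library. A rough sketch, assuming the Lookahead and RAdam classes exported by torch_optimizer (the built-in Ranger class additionally applies gradient centralization and uses its own defaults):

```python
import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 2)

# Rectified Adam as the inner optimizer, wrapped by Lookahead:
radam = optim.RAdam(model.parameters(), lr=1e-3)
ranger_like = optim.Lookahead(radam, k=6, alpha=0.5)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
ranger_like.step()
```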

RangerQH applies quasi-hyperbolic momentum, introduced by Ma and Yarats (2018), on top of the regular Ranger optimizer, so I believe that paper should be the reference.
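A quick usage sketch of RangerQH from this library; I am assuming the `nus` argument is where the quasi-hyperbolic discount terms from Ma and Yarats (2018) come in (the values below are illustrative, not recommendations):

```python
import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 2)

# `nus` (if I read the implementation correctly) are the quasi-hyperbolic
# discount factors; k and alpha are the usual Lookahead parameters.
opt = optim.RangerQH(model.parameters(), lr=1e-3, nus=(0.7, 1.0), k=6, alpha=0.5)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()
```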

I would propose the following references:

- Ranger: the LookAhead, Rectified Adam, and Gradient Centralization papers, plus the accompanying blog post
- RangerQH: Ma and Yarats (2018), quasi-hyperbolic momentum
- RangerVA: Tong, Liang, and Bi (2019), Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM

jettify commented 3 years ago

Would you like to submit a PR with the proposed changes?