sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Apache License 2.0

\epsilon not implemented as in the paper #8

Closed. Zach-ER closed this issue 2 years ago.

Zach-ER commented 2 years ago

Hi there, $\epsilon$ is within the square root in the paper (L6 in Algorithm 1), but in the code, it is outside of the square root. Could you expand on the reason for this?

XingyuXie commented 2 years ago

Hi, @Zach-ER

In the implementation, the placement of $\epsilon$ is consistent with earlier adaptive algorithms such as Adam, AdamW, and LAMB, which add $\epsilon$ outside the square root. Some optimizers (e.g., AdaBelief) find that putting $\epsilon$ inside the root may improve performance slightly, but we did not do this, and we did not test the difference between the two placements.

In the paper, we put $\epsilon$ inside the root for convenience in the proofs, so that we can write one fewer root sign in all our theoretical bounds.

Thank you for your question. For practical usage, please refer to the released adan.py.
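
For readers comparing the two placements discussed in this thread, here is a minimal sketch of how the denominators differ numerically. It is not taken from adan.py: the function names are hypothetical, `v` stands in for a generic second-moment estimate, and `eps = 1e-8` is an assumed value chosen only for illustration.

```python
import math

EPS = 1e-8  # assumed value for illustration; check adan.py for the real default


def denom_eps_outside(v: float, eps: float = EPS) -> float:
    """sqrt(v) + eps: the placement described for the released code."""
    return math.sqrt(v) + eps


def denom_eps_inside(v: float, eps: float = EPS) -> float:
    """sqrt(v + eps): the placement written in the paper's Algorithm 1."""
    return math.sqrt(v + eps)


if __name__ == "__main__":
    # Compare the two denominators for small and large second-moment values.
    for v in (0.0, 1e-12, 1e-4, 1.0):
        outside = denom_eps_outside(v)
        inside = denom_eps_inside(v)
        print(f"v={v:g}  outside={outside:.3e}  inside={inside:.3e}")
```

The two forms nearly agree when `v` is much larger than `eps`, but they diverge as `v` approaches zero: the outside form bottoms out at `eps`, while the inside form bottoms out at `sqrt(eps)`, which is four orders of magnitude larger for the value assumed here. A given `eps` therefore does not mean the same thing under the two conventions.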