Closed Zach-ER closed 2 years ago
Hi, @Zach-ER
In terms of implementation, the placement of $\epsilon$ is consistent with previous adaptive algorithms such as Adam, AdamW, and LAMB. Some optimizers (e.g., AdaBelief) find that putting $\epsilon$ inside the root may improve performance slightly, but we did not do this, and we did not test the difference between the two $\epsilon$ placements.
In the paper, we put $\epsilon$ inside the root for convenience in the proofs, so that we can write one fewer root sign in all our theoretical bounds.
Thank you for your question. For practical usage, please refer to the released adan.py.
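For readers comparing the two placements, here is a minimal sketch of the difference. It is not taken from adan.py: the function names are hypothetical, `v` stands in for a second-moment estimate, and the "inside" form is assumed to be $\sqrt{v + \epsilon}$ while the "outside" form is the Adam-style $\sqrt{v} + \epsilon$.

```python
import math


def denom_eps_outside(v: float, eps: float) -> float:
    """Adam-style denominator (assumed form): sqrt(v) + eps."""
    return math.sqrt(v) + eps


def denom_eps_inside(v: float, eps: float) -> float:
    """Denominator with eps under the root (assumed form): sqrt(v + eps)."""
    return math.sqrt(v + eps)


if __name__ == "__main__":
    eps = 1e-8
    # Compare the two denominators for a vanishing, a small, and a large v.
    for v in (0.0, 1e-8, 1.0):
        outside = denom_eps_outside(v, eps)
        inside = denom_eps_inside(v, eps)
        print(f"v={v:g}  outside={outside:.3e}  inside={inside:.3e}")
```

The two forms nearly agree when `v` is much larger than `eps`, but they differ when `v` is close to zero: the floor of the denominator is `eps` in the outside form and `sqrt(eps)` in the inside form. So the same numeric `eps` is not interchangeable between the two conventions, which is worth keeping in mind when carrying a value of $\epsilon$ from the paper over to the code.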
Hi there, $\epsilon$ is within the square root in the paper (L6 in Algorithm 1), but in the code, it is outside of the square root. Could you expand on the reason for this?