I recently put out a proposal to add an argument to Adam-derived optimizers that skips the bias-correction term on `w`, applying it only to `v`. See the figure attached in the issue https://github.com/pytorch/pytorch/issues/67105 and the write-up I put together for the theoretical justification, "AdamD: Improved bias-correction in Adam". Since the idea is still too new to be added to the PyTorch repo itself (according to them), your repo seems like a reasonable home for it. I am happy to send you a PR, but I would like to hear which of these you would prefer:
1. A new pair of optimizers, AdamD and AdamDW (mirroring Adam/AdamW, but with the bias correction excluded from the `w` update step).
2. An otherwise vanilla fork of Adam/AdamW with a boolean flag that lets the user turn the `w` bias correction on or off, plus the same option added to the relevant optimizers already included in this repo. I have not read through them carefully, but this would likely include Lamb (there it would be an option to enable bias correction on `v` only, since it is already excluded otherwise), AdamP, and maybe others. A rough sketch of what the flag could look like follows this list.
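For concreteness, here is a minimal sketch of what option 2 could look like on a single simplified Adam step. The standalone `adam_step` function and the `debias_update` flag name are just placeholders for illustration, not a proposed API:

```python
import torch

def adam_step(p, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              debias_update=True):
    """One simplified Adam update on parameter tensor `p`, in place.

    With debias_update=True this is plain Adam (bias correction on both
    moments). With debias_update=False the correction is applied only to
    the second moment `v`, so the earliest updates are smaller -- the
    behaviour the proposal asks for.
    """
    beta1, beta2 = betas

    # Exponential moving averages of the gradient and its square.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias correction is always applied to v ...
    denom = (v / (1 - beta2 ** step)).sqrt_().add_(eps)
    # ... and only optionally to the numerator used in the w update.
    numer = m / (1 - beta1 ** step) if debias_update else m

    p.addcdiv_(numer, denom, value=-lr)
    return p


# Tiny usage example: with debias_update=False the first step is ~10x smaller.
p, m, v = (torch.zeros(1) for _ in range(3))
adam_step(p, torch.ones(1), m, v, step=1, debias_update=False)
print(p)  # roughly -1e-4 instead of -1e-3
```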
Let me know how you would like to proceed, or if you want any further clarification!