jettify / pytorch-optimizer

torch-optimizer -- collection of optimizers for PyTorch
Apache License 2.0

AdamD implementation (or option to skip bias-correction to adam-derived optimizers)? #385

Open jstjohn opened 3 years ago

jstjohn commented 3 years ago

I recently put out a proposal to add an argument to Adam-derived optimizers that skips the bias-correction term on the w update, applying it only to v. See the figure attached in the issue https://github.com/pytorch/pytorch/issues/67105 and the write-up I put together for the theoretical justification, AdamD: Improved bias-correction in Adam. Since the idea is still too new to be added to the PyTorch repo (according to its maintainers), your repo seems like a reasonable home for it. I am happy to send you a PR, but I would like to hear which of these you would prefer (a rough sketch of the change itself follows the two options):

  1. New optimizers, AdamD and AdamDW (mirroring Adam/AdamW, but with the bias-correction on the w update step excluded).
  2. An otherwise vanilla fork of Adam/AdamW with a boolean flag that lets the user turn the bias-correction on/off, plus adding the same option to the relevant optimizers already in this repo. I have not read through the codebase carefully, but this would likely include Lamb (there it would be an option to enable bias-correction on v only, since it is already excluded otherwise), AdamP, and maybe others.
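To make the proposed change concrete, here is a minimal single-tensor sketch. The function name `adam_step`, the `debias_m` flag, and the `exp_avg`/`exp_avg_sq` names are placeholders of mine, not an existing API in this repo or in PyTorch; with `debias_m=False` the bias-correction is applied only to the second-moment estimate v:

```python
import torch

def adam_step(p, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
              betas=(0.9, 0.999), eps=1e-8, debias_m=True):
    """One Adam-style update on a single tensor.

    With debias_m=False, bias-correction is applied only to the
    second-moment estimate v (the behaviour proposed above).
    """
    beta1, beta2 = betas
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)                # m_t
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # v_t

    denom = (exp_avg_sq / (1 - beta2 ** step)).sqrt_().add_(eps)   # sqrt(v_hat) + eps
    step_size = lr / (1 - beta1 ** step) if debias_m else lr       # drop m bias-correction if requested

    p.addcdiv_(exp_avg, denom, value=-step_size)                   # w <- w - step_size * m / denom
    return p
```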

Let me know how you would like to proceed, or if you want any further clarification!

jettify commented 3 years ago

I will be happy to accept a PR. I like option 1; it seems like a clearer API. Internally, the implementations should share code if possible.
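For what it's worth, one way option 1 could still share code internally is sketched below. The `_AdamBase` class, the `debias_m` keyword, and the overall layout are assumptions of mine for illustration, not existing code in this repo; `AdamD` simply fixes the flag that disables the bias-correction on the m term:

```python
import torch
from torch.optim.optimizer import Optimizer


class _AdamBase(Optimizer):
    """Shared Adam machinery (a sketch, not this repo's actual layout)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.0, debias_m=True):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, debias_m=debias_m)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                if group["weight_decay"] != 0:
                    grad = grad.add(p, alpha=group["weight_decay"])  # L2-style decay
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                t = state["step"]
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)               # m_t
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # v_t
                denom = (exp_avg_sq / (1 - beta2 ** t)).sqrt_().add_(group["eps"])
                step_size = group["lr"]
                if group["debias_m"]:
                    step_size /= 1 - beta1 ** t  # standard Adam: bias-correct m too
                p.addcdiv_(exp_avg, denom, value=-step_size)
        return loss


class AdamD(_AdamBase):
    """Adam variant with bias-correction applied only to v (option 1 sketch)."""

    def __init__(self, params, **kwargs):
        kwargs["debias_m"] = False
        super().__init__(params, **kwargs)
```

An AdamDW counterpart could follow the same pattern on top of an AdamW-style (decoupled) weight-decay step, so the public classes stay thin wrappers over the shared base.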