huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Adam optimiser not following Pytorch conventions #510

Closed tonianelope closed 5 years ago

tonianelope commented 5 years ago

Both BertAdam and OpenAIAdam don't follow the PyTorch convention of defining the betas parameter for Adam optimisers as a tuple; instead they have separate b1 and b2 parameters.

PyTorch-based libraries like fastai expect the optimizer betas to be a tuple. Is there any reason b1/b2 are used instead of a tuple? It would be great to change this so the optimisers can integrate with other PyTorch libraries (see the sketch below).
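To illustrate the mismatch, a minimal sketch (the BertAdam import path and the coefficient values used here are assumptions based on the pytorch_pretrained_bert package around the time of this issue):

```python
import torch
from pytorch_pretrained_bert.optimization import BertAdam  # assumed package name at the time

params = [torch.nn.Parameter(torch.randn(2, 2))]

# PyTorch convention: both Adam coefficients are passed as a single `betas` tuple.
torch_adam = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999))

# BertAdam (as reported here): the same coefficients are separate b1/b2 keyword
# arguments, so code that builds optimizers generically with `betas=...`
# (e.g. fastai) cannot pass them through.
bert_adam = BertAdam(params, lr=1e-3, b1=0.9, b2=0.999)
```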

thomwolf commented 5 years ago

We could indeed update that; it's just a relic of the TensorFlow conversion. Do you want to submit a PR? Otherwise I'll do it when I work on the next release.
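For reference, a hedged sketch of what a PyTorch-style constructor could look like (the class name is hypothetical, the defaults mirror torch.optim.Adam, and this is not the actual patch; step() is omitted since only the signature change matters here):

```python
from typing import Iterable, Tuple
from torch.optim import Optimizer


class BertAdamTupleSketch(Optimizer):
    """Hypothetical constructor accepting `betas` as a tuple, PyTorch-style."""

    def __init__(self, params: Iterable, lr: float = 1e-3,
                 betas: Tuple[float, float] = (0.9, 0.999),
                 eps: float = 1e-6, weight_decay: float = 0.01):
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 0: {betas[0]}")
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 1: {betas[1]}")
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)
        # step() intentionally omitted: the point is the tuple-based signature.
```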

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.