keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Add AdEMAMix Optimizer #20258

Closed IMvision12 closed 1 month ago

IMvision12 commented 1 month ago

AdEMAMix augments Adam with a second, slowly decaying exponential moving average (EMA) of the gradients to tackle slow convergence and subpar generalization in large language models and on noisy datasets. It uses three beta parameters (two for the fast and slow gradient EMAs, one for the squared-gradient EMA) plus an alpha parameter that mixes the two momenta, providing flexible momentum alongside adaptive learning rates.

Paper: https://arxiv.org/abs/2409.03137
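For context, a minimal NumPy sketch of a single AdEMAMix update step, based on my reading of the paper (the function name, argument layout, and default hyperparameters here are illustrative, not the proposed Keras API):

```python
import numpy as np

def ademamix_step(theta, g, m1, m2, v, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    """One AdEMAMix update (illustrative sketch, not the official code).

    m1: fast EMA of gradients (as in Adam), bias-corrected.
    m2: slow EMA of gradients (beta3 close to 1), mixed in via alpha.
    v:  EMA of squared gradients (as in Adam), bias-corrected.
    t:  1-based step count, used for bias correction.
    """
    m1 = beta1 * m1 + (1 - beta1) * g          # fast momentum
    m2 = beta3 * m2 + (1 - beta3) * g          # slow momentum
    v = beta2 * v + (1 - beta2) * g ** 2       # second-moment estimate
    m1_hat = m1 / (1 - beta1 ** t)             # bias-corrected fast momentum
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - lr * (m1_hat + alpha * m2) / (np.sqrt(v_hat) + eps)
    return theta, m1, m2, v

# Toy usage: one step on f(x) = 0.5 * x^2, whose gradient at x is x.
theta = np.array([1.0])
m1, m2, v = np.zeros(1), np.zeros(1), np.zeros(1)
theta, m1, m2, v = ademamix_step(theta, np.array([1.0]), m1, m2, v, t=1)
```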

I'm interested in adding this optimizer to Keras.

@fchollet

fchollet commented 1 month ago

Thanks for the suggestion. I see the paper has a total of 0 citations listed on ArXiv. As a general rule we wait to see >50 citations before including a technique in Keras. As per API guidelines: "We only add new objects that are already commonly used in the machine learning community"

fchollet commented 1 month ago

Now, if you want to build this optimizer, you can do so in your own repo, and then we can share it with the community to see if people adopt it. If eventually the optimizer becomes commonly used, we will add it to the Keras API.

IMvision12 commented 1 month ago

@fchollet

Here is my implementation of the optimizer in Keras: https://github.com/IMvision12/AdEMAMix-Optimizer-Keras

IMvision12 commented 1 month ago

If we need to add this optimizer in the future, I'd be eager to integrate it into Keras.