Closed: IMvision12 closed this issue 1 month ago.
Thanks for the suggestion. I see the paper currently has 0 citations listed on arXiv. As a general rule, we wait to see >50 citations before including a technique in Keras. As per the API guidelines: "We only add new objects that are already commonly used in the machine learning community."
Now, if you want to build this optimizer, you can do so in your own repo, and then we can share it with the community to see whether people adopt it. If the optimizer eventually becomes commonly used, we will add it to the Keras API.
@fchollet
Here I have implemented the optimizer using Keras: https://github.com/IMvision12/AdEMAMix-Optimizer-Keras
If this optimizer needs to be added in the future, I'd be eager to integrate it into Keras.
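For reference, here is a rough sketch of what an AdEMAMix implementation can look like as a Keras 3 `Optimizer` subclass. This is an illustrative outline, not necessarily how the repo above structures it, and it omits the alpha/beta_3 warm-up schedulers and weight decay described in the paper:

```python
import keras
from keras import ops


class AdEMAMix(keras.optimizers.Optimizer):
    """Sketch of an AdEMAMix optimizer for Keras 3 (no alpha/beta_3 schedulers)."""

    def __init__(self, learning_rate=1e-3, beta_1=0.9, beta_2=0.999,
                 beta_3=0.9999, alpha=5.0, epsilon=1e-8,
                 name="ademamix", **kwargs):
        super().__init__(learning_rate=learning_rate, name=name, **kwargs)
        self.beta_1 = beta_1    # fast EMA of gradients (as in Adam)
        self.beta_2 = beta_2    # EMA of squared gradients (as in Adam)
        self.beta_3 = beta_3    # slow EMA of gradients (the extra momentum term)
        self.alpha = alpha      # weight of the slow EMA in the update
        self.epsilon = epsilon

    def build(self, var_list):
        if self.built:
            return
        super().build(var_list)
        # One fast EMA, one slow EMA, and one second-moment slot per variable.
        self._m1 = [self.add_variable_from_reference(v, "m1") for v in var_list]
        self._m2 = [self.add_variable_from_reference(v, "m2") for v in var_list]
        self._nu = [self.add_variable_from_reference(v, "nu") for v in var_list]

    def update_step(self, gradient, variable, learning_rate):
        lr = ops.cast(learning_rate, variable.dtype)
        g = ops.cast(gradient, variable.dtype)
        step = ops.cast(self.iterations + 1, variable.dtype)
        i = self._get_variable_index(variable)
        m1, m2, nu = self._m1[i], self._m2[i], self._nu[i]

        # m1 <- beta_1 * m1 + (1 - beta_1) * g      (fast EMA)
        self.assign_add(m1, ops.multiply(ops.subtract(g, m1), 1 - self.beta_1))
        # m2 <- beta_3 * m2 + (1 - beta_3) * g      (slow EMA, not bias-corrected)
        self.assign_add(m2, ops.multiply(ops.subtract(g, m2), 1 - self.beta_3))
        # nu <- beta_2 * nu + (1 - beta_2) * g^2    (second moment)
        self.assign_add(nu, ops.multiply(ops.subtract(ops.square(g), nu), 1 - self.beta_2))

        # Bias-correct the fast EMA and second moment, as in Adam.
        m1_hat = ops.divide(m1, ops.subtract(1.0, ops.power(self.beta_1, step)))
        nu_hat = ops.divide(nu, ops.subtract(1.0, ops.power(self.beta_2, step)))
        # Mix fast and slow momentum; alpha scales the slow EMA's contribution.
        update = ops.divide(ops.add(m1_hat, ops.multiply(self.alpha, m2)),
                            ops.add(ops.sqrt(nu_hat), self.epsilon))
        self.assign_sub(variable, ops.multiply(lr, update))
```

Such a class would plug into the usual workflow, e.g. `model.compile(optimizer=AdEMAMix(learning_rate=1e-3), loss=...)`.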
AdEMAMix extends Adam with an additional slow exponential moving average (EMA) of past gradients, aiming to address slow convergence and subpar generalization in large language models and on noisy datasets. It uses three beta parameters (Adam's beta_1 and beta_2 plus a beta_3 for the slow EMA) together with a mixing coefficient alpha that scales the slow EMA's contribution, combining flexible momentum with adaptive learning rates.
Paper: https://arxiv.org/abs/2409.03137
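For clarity, here is a minimal NumPy sketch of the core update step as I read it from the paper. The alpha and beta_3 warm-up schedulers described in the paper are omitted, the variable names (`m1`, `m2`, `nu`, `state`) are my own, and the default values are only indicative:

```python
import numpy as np

def ademamix_step(theta, g, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8):
    """One AdEMAMix update for a single parameter array (sketch, no schedulers)."""
    state["t"] += 1
    t = state["t"]
    # Fast EMA of gradients (Adam's first moment, controlled by beta1).
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * g
    # Slow EMA of gradients (the extra momentum term, beta3 close to 1).
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * g
    # EMA of squared gradients (Adam's second moment, controlled by beta2).
    state["nu"] = beta2 * state["nu"] + (1 - beta2) * g ** 2
    # Bias-correct the fast EMA and second moment, as in Adam; m2 stays uncorrected.
    m1_hat = state["m1"] / (1 - beta1 ** t)
    nu_hat = state["nu"] / (1 - beta2 ** t)
    # Mix fast and slow momentum; alpha scales the slow EMA's contribution.
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(nu_hat) + eps)
```

Here `state` starts as `{"t": 0, "m1": 0.0, "m2": 0.0, "nu": 0.0}` (zeros broadcast against `theta`), and each call returns the updated parameters.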
I'm interested in adding this optimizer to Keras.
@fchollet