I am currently implementing ADOPT in my work codebase, and it would be nice if unsloth could support it as well. According to the paper, it outperforms Adam (this time for real). Reference: Taniguchi, Shohei, et al. "ADOPT: Modified Adam Can Converge with Any Beta2 with the Optimal Rate." arXiv, 2024, https://arxiv.org/abs/2411.02853
I can work on this in a few days, once I have some free time.
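For context, here is a minimal sketch of the ADOPT update as I understand it from the paper. It is pure Python on a list of floats, not unsloth or PyTorch code. The names `adopt_minimize` and `grad_fn` are hypothetical. The defaults (`beta2=0.9999`, `eps=1e-6`) are what I recall the paper suggesting and should be checked against it. The key difference from Adam is that the gradient is normalized by the previous second-moment estimate before it enters the momentum, and there is no bias correction.

```python
import math


def adopt_minimize(grad_fn, theta, lr=1e-3, beta1=0.9, beta2=0.9999,
                   eps=1e-6, steps=1000):
    """Illustrative ADOPT loop (hypothetical helper, not a library API).

    grad_fn: callable taking a list of floats and returning the gradient
             as a list of floats of the same length.
    theta:   initial parameters (list of floats); a new list is returned.
    """
    theta = list(theta)

    # Initialization: the second moment starts at g_0^2, the momentum at 0.
    g = grad_fn(theta)
    v = [gi * gi for gi in g]
    m = [0.0] * len(theta)

    for _ in range(steps):
        g = grad_fn(theta)
        for i, gi in enumerate(g):
            # Normalize by the PREVIOUS second-moment estimate, then
            # accumulate the normalized gradient into the momentum.
            m[i] = beta1 * m[i] + (1 - beta1) * gi / max(math.sqrt(v[i]), eps)
            # Parameter step uses the momentum directly.
            theta[i] -= lr * m[i]
            # The second moment is updated only after the step.
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
    return theta


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    result = adopt_minimize(lambda th: [2 * (th[0] - 3.0)], [0.0],
                            lr=0.05, steps=2000)
    print(result)
```

An actual integration would presumably be a `torch.optim.Optimizer` subclass operating on tensors; this sketch only shows the order of the three updates.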
Proposed Changes