Feature request
See https://github.com/facebookresearch/schedule_free and https://github.com/Liuhong99/Sophia -- these optimizers have very different properties from the existing choices and are often preferable.
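For concreteness, here is a minimal sketch of what a Schedule-Free training loop looks like, assuming the `schedulefree` PyPI package from the first repo above; the toy model and data are placeholders. The main API difference from a standard optimizer is that the optimizer itself must be switched between train and eval modes:

```python
import torch
import schedulefree  # assumed: the PyPI package from facebookresearch/schedule_free

# Placeholder model and data, just to show the loop structure.
model = torch.nn.Linear(10, 2)
optimizer = schedulefree.SGDScheduleFree(model.parameters(), lr=1.0)

optimizer.train()  # Schedule-Free tracks two weight sequences; train on one of them
for _ in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch to the averaged weights before validation or checkpointing
```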
Motivation
The Schedule-Free optimizer (especially Schedule-Free SGD) offers faster convergence than the existing implementations with no additional memory requirements. There is some variability in the final gradients, but that is an acceptable engineering trade-off in many cases. In the same vein, Sophia has been shown to converge roughly twice as fast as Adam on some language-modeling tasks.
Your contribution
Including these directly would add new dependencies, and I'm not sure what the typical approach to that is here. I'd be willing to implement both features, but I'm unsure how the addition of new dependencies should be handled.
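For context, one common way to avoid a hard dependency is a guarded import that only raises when the optional optimizer is actually requested. This is just a sketch of that pattern (the helper name is hypothetical, not an existing API in either library); the same guard would apply to Sophia:

```python
# Sketch of an optional-dependency guard; `get_extra_optimizer` is a
# hypothetical helper, not an existing API.
def get_extra_optimizer(name, params, **kwargs):
    if name == "schedule_free_sgd":
        try:
            import schedulefree
        except ImportError as err:
            raise ImportError(
                "Schedule-Free optimizers require the `schedulefree` package: "
                "pip install schedulefree"
            ) from err
        return schedulefree.SGDScheduleFree(params, **kwargs)
    raise ValueError(f"Unknown optimizer: {name}")
```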