Feature request
See https://github.com/facebookresearch/schedule_free and https://github.com/Liuhong99/Sophia -- these optimizers have very different properties from the existing choices and are often preferable.
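For concreteness, here is a minimal sketch of what a Schedule-Free training loop looks like, assuming the `schedulefree` PyPI package from the first repo above; the toy model and data are placeholders. The main API difference from a standard optimizer is that the optimizer itself must be switched between train and eval modes:

```python
import torch
import schedulefree  # assumed: the PyPI package from facebookresearch/schedule_free

# Placeholder model and data, just to show the loop structure.
model = torch.nn.Linear(10, 2)
optimizer = schedulefree.SGDScheduleFree(model.parameters(), lr=1.0)

optimizer.train()  # Schedule-Free tracks two weight sequences; train on one of them
for _ in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch to the averaged weights before validation or checkpointing
```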
Motivation
The Schedule-Free optimizer (especially Schedule-Free SGD) offers faster convergence than the existing implementations with no additional memory requirements. There is some variability in the final gradients, but that is an acceptable engineering trade-off in many cases. In the same vein, Sophia has been shown to converge roughly twice as fast as Adam on some language-modeling tasks.
Your contribution
Including these directly would add new dependencies, and I'm not sure what the typical approach to that is here. I'd be willing to implement both features, but I'm unsure how the addition of new dependencies should be handled.
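For context, one common way to avoid a hard dependency is a guarded import that only raises when the optional optimizer is actually requested. This is just a sketch of that pattern (the helper name is hypothetical, not an existing API in either library); the same guard would apply to Sophia:

```python
# Sketch of an optional-dependency guard; `get_extra_optimizer` is a
# hypothetical helper, not an existing API.
def get_extra_optimizer(name, params, **kwargs):
    if name == "schedule_free_sgd":
        try:
            import schedulefree
        except ImportError as err:
            raise ImportError(
                "Schedule-Free optimizers require the `schedulefree` package: "
                "pip install schedulefree"
            ) from err
        return schedulefree.SGDScheduleFree(params, **kwargs)
    raise ValueError(f"Unknown optimizer: {name}")
```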