jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Where is LOMO (fused gradient update) implemented? #37

Closed · gaotianyu1350 closed this issue 7 months ago

gaotianyu1350 commented 7 months ago

Hi! Congrats on the great work! I have a question regarding gradient storage: the paper mentions that GaLore also uses LOMO to avoid materializing the full gradient, but I couldn't find where LOMO is implemented in the codebase. Can you point me to where it is implemented (or its equivalent)? Thanks!

gaotianyu1350 commented 7 months ago

Oh sorry, I think I found it! I guess it's here: https://github.com/jiaweizzhao/GaLore/blob/1b36c33782bdd74a4d6a4f51bc626ef67f51011f/torchrun_main.py#L367C1-L368C1.
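
For anyone else looking: the LOMO-style trick is to fuse the optimizer step into the backward pass, so each parameter's gradient is consumed and freed as soon as it has been accumulated, instead of all gradients being held until a global `optimizer.step()`. Below is a minimal sketch of that pattern using PyTorch's `register_post_accumulate_grad_hook` (available since PyTorch 2.1); the model and hyperparameters are placeholders for illustration, not the repo's actual config:

```python
import torch
from torch import nn

# Placeholder model; in GaLore this would be the LLM being trained.
model = nn.Linear(512, 512)

# One optimizer instance per parameter, so each update can run as soon
# as that parameter's gradient is ready, and the gradient tensor can be
# freed immediately afterwards.
optimizer_dict = {
    p: torch.optim.AdamW([p], lr=1e-3)
    for p in model.parameters() if p.requires_grad
}

def optimizer_hook(p):
    # Fires during backward, right after p.grad has been accumulated:
    # apply the update, then drop the gradient to reclaim its memory.
    optimizer_dict[p].step()
    optimizer_dict[p].zero_grad(set_to_none=True)

for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(optimizer_hook)

# The training loop then only calls loss.backward(); every parameter is
# updated inside its hook, so no global optimizer.step() is needed and
# the full set of gradients is never materialized at once.
x = torch.randn(8, 512)
loss = model(x).sum()
loss.backward()
```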