jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Why not reproject the internal Adam states during update_proj_gap? #54

Open · liuliu opened this issue 4 months ago

liuliu commented 4 months ago

Hi, great project. After reading the paper and the implementation, I am wondering whether you have considered reprojecting the Adam internal states (exp_avg, exp_avg_sq) from the previous subspace to the new subspace whenever the projector is refreshed?
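For concreteness, here is a minimal sketch of what such a reprojection might look like. It assumes GaLore's left-projection layout, where a gradient G of shape (m, n) is compressed to P^T G of shape (r, n) with an orthonormal P of shape (m, r); `reproject_adam_states` and its arguments are hypothetical names, not part of the GaLore API, and the second-moment handling is only a heuristic, since an elementwise statistic has no exact linear transport between bases:

```python
import torch

def reproject_adam_states(exp_avg, exp_avg_sq, P_old, P_new):
    """Hypothetical helper: map Adam moments from the old low-rank
    subspace to the new one when the projector is refreshed.

    Assumes left projection: grads G (m x n) are stored as P^T @ G
    (r x n), with P (m x r) having orthonormal columns from the SVD of G.
    """
    # Change of basis for the first moment: P_new^T @ P_old maps
    # coordinates in the old subspace into the new one.
    M = P_new.t() @ P_old                       # (r, r)
    exp_avg = M @ exp_avg                       # (r, n)

    # The second moment is an elementwise statistic, so no exact linear
    # map exists. A crude heuristic: rotate its square root and
    # re-square, which roughly preserves the scale. Resetting
    # exp_avg_sq to zero is a simpler alternative.
    exp_avg_sq = (M @ exp_avg_sq.sqrt()).square()
    return exp_avg, exp_avg_sq
```

The alternative is what the current code effectively does: keep the stale states, which mixes coordinates from two different subspaces for update_proj_gap steps after each projector refresh.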

liuliu commented 4 months ago

Reprojecting the momentum across projector updates is also mentioned in the FLoRA paper.

jiaweizzhao commented 4 months ago

Hi, thanks for the suggestion. We didn't include reprojection in the paper but will try to implement it in the repo.