jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Results vs FP32 #59

Open tsengalb99 opened 3 months ago

tsengalb99 commented 3 months ago

Hi, I was reading the GaLore paper and noticed that the "ground truth" baseline seems to be pure BF16 training with round-to-nearest. It is generally accepted that pure BF16 training with round-to-nearest does not converge to the same point as FP32 or BF16/FP32 mixed-precision training -- does GaLore only match pure BF16, or does it match FP32 training as well?
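To make the concern concrete, here is a minimal pure-Python sketch (not from the GaLore codebase; the `to_bf16` helper is a hypothetical illustration) of why round-to-nearest BF16 can diverge from FP32: with only 8 significand bits, an update much smaller than the spacing between representable values near the weight is rounded away entirely, whereas FP32 accumulates it.

```python
import struct

def to_bf16(x):
    """Round a Python float to bfloat16 precision (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) >> 16
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def to_fp32(x):
    """Round a Python float to float32 precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Apply 1000 tiny updates of 1e-4 to a weight of 1.0.
# bf16 spacing near 1.0 is 2**-7 ~= 0.0078, so 1.0 + 1e-4 rounds
# back to 1.0 and every update is silently lost; fp32 keeps them.
w_bf16 = w_fp32 = 1.0
for _ in range(1000):
    w_bf16 = to_bf16(w_bf16 + 1e-4)
    w_fp32 = to_fp32(w_fp32 + 1e-4)

print(w_bf16)  # 1.0 -- updates swallowed by rounding
print(w_fp32)  # ~1.1 -- updates accumulated
```

This is why mixed-precision setups keep an FP32 master copy of the weights: the forward/backward pass can run in BF16, but the tiny optimizer updates are applied at full precision.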