jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Questions about Figure 3 in the original paper #42

Open fy817 opened 2 months ago

fy817 commented 2 months ago

In the figure, the Rank = 1024 and Rank = 512 curves are very close to the baseline and even slightly better than it. I have two questions about this:

  1. Are Rank = 1024 and Rank = 512 consistently better than the baseline, or is this due to randomness? If they are consistently better, how can this be explained?
  2. Have you run experiments with a very small rank (e.g. n/8 or n/16)? If so, how much does it affect the results?

Looking forward to your reply. Your support has been invaluable to me.
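For context on question 2, here is a minimal sketch of what the rank controls. It assumes the SVD-based projection described in the GaLore paper: the gradient G is projected onto its top-r left singular vectors P, giving R = PᵀG, and mapped back as PR. The sketch handles only this left-side projection. It measures how much of the gradient's squared norm survives at ranks n/2 down to n/16 on a synthetic matrix with a hypothetical 1/i singular-value spectrum. The helper names `galore_project` and `retained_energy` are illustrative and are not the repository's API. The numbers it prints say nothing about the training curves in Figure 3.

```python
import numpy as np


def galore_project(grad: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: project an m x n gradient onto its top-`rank` left singular vectors.

    Returns the projector P (m x rank) and the low-rank gradient R = P^T G (rank x n).
    """
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]
    return p, p.T @ grad


def retained_energy(grad: np.ndarray, rank: int) -> float:
    """Hypothetical helper: fraction of ||G||_F^2 kept after projecting back as P R."""
    p, r = galore_project(grad, rank)
    approx = p @ r
    return float(np.linalg.norm(approx) ** 2 / np.linalg.norm(grad) ** 2)


rng = np.random.default_rng(0)
m = n = 256
# Synthetic "gradient" with singular values 1, 1/2, ..., 1/n (an assumption, not real data).
u, _ = np.linalg.qr(rng.standard_normal((m, m)))
v, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = 1.0 / np.arange(1, n + 1)
grad = u @ np.diag(spectrum) @ v.T

for rank in (n // 2, n // 4, n // 8, n // 16):
    print(f"rank={rank:4d}  retained energy={retained_energy(grad, rank):.4f}")
```

How much is lost at n/8 or n/16 depends entirely on how fast the real gradient's spectrum decays, so running `retained_energy` on gradients captured during actual training would be more informative than this toy matrix.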