jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Reproducing Perplexity evaluation #30

Open NitzanHod opened 3 months ago

NitzanHod commented 3 months ago

How exactly did you measure perplexity during pre-training with GaLore (e.g., when creating Figure 5 in your paper, https://arxiv.org/pdf/2403.03507.pdf)? Thanks.

Zeju1997 commented 3 months ago

Also very interested. Can you please provide more details about that?

jiaweizzhao commented 3 months ago

Perplexity is measured as exp(total_loss), where total_loss is the average evaluation loss computed by the evaluation function.
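For anyone reproducing this, a minimal sketch of the computation, assuming a Hugging Face-style causal LM and a tokenized eval dataloader (the helper name and loop below are illustrative, not the repo's actual evaluation function):

```python
import math
import torch

@torch.no_grad()
def evaluate_perplexity(model, eval_dataloader, device="cuda"):
    # Accumulate the mean cross-entropy loss over the evaluation set,
    # then exponentiate the average to obtain perplexity.
    model.eval()
    total_loss, num_batches = 0.0, 0
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        # HF causal-LM models return the mean token-level cross-entropy
        # when labels are provided.
        outputs = model(**batch, labels=batch["input_ids"])
        total_loss += outputs.loss.item()
        num_batches += 1
    avg_loss = total_loss / num_batches
    return math.exp(avg_loss)
```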