Open NitzanHod opened 3 months ago
How exactly did you measure Perplexity during pre-training with GaLore? (e.g. when creating Figure 5 in your paper https://arxiv.org/pdf/2403.03507.pdf ). Thanks.
Also very interested. Can you please provide more details about that?
Perplexity is measured as exp(total_loss), where total_loss is the loss computed by the evaluation function.
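For anyone who wants to reproduce this outside the repo, here is a minimal sketch of the exp(total_loss) computation. It assumes total_loss is the mean per-token cross-entropy in nats over the evaluation set; the thread does not say how the evaluation function aggregates batches, so the token-weighted average and the name `perplexity_from_batches` are my own assumptions, not the GaLore code.

```python
import math


def perplexity_from_batches(batch_losses, batch_token_counts):
    """Hypothetical helper: turn per-batch mean losses into a perplexity.

    batch_losses: mean cross-entropy (in nats) of each evaluation batch.
    batch_token_counts: number of tokens that each batch mean was taken over.
    """
    if len(batch_losses) != len(batch_token_counts):
        raise ValueError("need one token count per batch loss")
    total_tokens = sum(batch_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens were evaluated")

    # Assumption: weight each batch by its token count so that total_loss
    # is the mean loss per token over the whole evaluation set.
    total_loss = sum(
        loss * count for loss, count in zip(batch_losses, batch_token_counts)
    ) / total_tokens

    # Perplexity is the exponential of the mean per-token loss.
    return math.exp(total_loss)


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    ppl = perplexity_from_batches([3.2, 3.0, 3.1], [1024, 1024, 512])
    print(f"perplexity: {ppl:.2f}")
```

If every batch has the same number of tokens, the weighted mean reduces to a plain average of the batch losses. Note that exp only gives perplexity when the loss uses the natural logarithm, which is the default for PyTorch's cross-entropy.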