mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Gradient clipping not working for llama2_70b_lora benchmark #723

Open michal2409 opened 3 months ago

michal2409 commented 3 months ago

I’ve found that setting max_grad_norm has no effect, and we are not clipping gradients.

To verify, I ran a convergence test with max_grad_norm set to 1e-9 and saw no difference in eval loss. I also inspected unscale_and_clip_grads and found that self.clip_grad is 0 when I printed it here.
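For reference, a minimal sketch (not the benchmark or DeepSpeed code) of what an effective max_grad_norm should look like: after clipping with a tiny max_norm such as 1e-9, the post-clip gradient norm should be capped near that value. If the post-clip norm equals the pre-clip norm, clipping is not being applied. All names below are illustrative.

```python
# Minimal, self-contained check of gradient clipping behavior (illustrative only).
import torch

model = torch.nn.Linear(16, 1)
loss = model(torch.randn(4, 16)).pow(2).sum()
loss.backward()

# clip_grad_norm_ returns the total gradient norm *before* clipping.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1e-9)

# Recompute the total gradient norm after clipping.
norm_after = torch.linalg.vector_norm(
    torch.stack([p.grad.detach().norm() for p in model.parameters()])
)

# With clipping active, norm_after should be ~1e-9; if it matches norm_before,
# the max_grad_norm setting is having no effect.
print(f"grad norm before clip: {norm_before:.3e}, after clip: {norm_after:.3e}")
```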

nv-rborkar commented 3 months ago

Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variation in RCPs.