microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2
1.9k stars 345 forks

collect grad_norm for non pipeline path #370

Open: inkcherry opened 8 months ago