microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
collect grad_norm for non pipeline path #370
Open · inkcherry opened 8 months ago