microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
improve repeat_kv GQA perf #419
Closed
polisettyvarma closed 4 months ago