microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

llama3 and llama3.1 support #443

Open fmiao2372 opened 2 months ago

fmiao2372 commented 2 months ago

When will Megatron-DeepSpeed support llama3/llama3.1 pretraining?

busishengui commented 2 weeks ago

llama3 and llama3.1 are similar to llama2, so you don't need to change your code.
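
To make "similar to llama2" concrete, here is a minimal sketch that diffs the published hyperparameters of Llama 2 7B and Llama 3 8B. The dictionary keys are illustrative names chosen for this example, not confirmed Megatron-DeepSpeed argument names, and whether a given checkout exposes every one of these settings is an assumption to verify against your own launch script. The tokenizer also differs (SentencePiece for Llama 2, a tiktoken-style BPE for Llama 3), which this sketch does not cover.

```python
# Hyperparameters taken from the publicly released model configs.
# Key names are hypothetical and only serve this comparison.
LLAMA2_7B = {
    "num_layers": 32,
    "hidden_size": 4096,
    "ffn_hidden_size": 11008,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,   # plain multi-head attention
    "vocab_size": 32000,
    "rope_theta": 10000.0,
    "max_position_embeddings": 4096,
}

LLAMA3_8B = {
    "num_layers": 32,
    "hidden_size": 4096,
    "ffn_hidden_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,    # grouped-query attention
    "vocab_size": 128256,
    "rope_theta": 500000.0,
    "max_position_embeddings": 8192,
}


def config_diff(old, new):
    """Return {key: (old_value, new_value)} for every key whose value changed."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


if __name__ == "__main__":
    for key, (old, new) in config_diff(LLAMA2_7B, LLAMA3_8B).items():
        print(f"{key}: {old} -> {new}")
```

The layer structure is the same in both models, so the differences are values to set at launch time rather than new model code, provided the training script already supports grouped-query attention and a configurable RoPE base.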