microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

llama3 and llama3.1 support #443

Open fmiao2372 opened 2 months ago

fmiao2372 commented 2 months ago

When will Megatron-DeepSpeed support llama3/llama3.1 pretraining?

busishengui commented 4 days ago

llama3.1 and llama3 are similar to llama2, so you don't need to change your code.
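
For anyone checking what "similar" means in practice, here is a rough sketch comparing the published hyperparameters of Llama 2 7B and Llama 3 8B. The dictionary keys and the `config_diff` helper are made up for illustration and are not Megatron-DeepSpeed argument names; how each value maps onto launch-script flags is something to verify against your own checkout.

```python
# Published architecture values for the two models.
# Key names are illustrative only, NOT Megatron-DeepSpeed flags.
LLAMA2_7B = {
    "hidden_size": 4096,
    "num_layers": 32,
    "num_attention_heads": 32,
    "num_kv_heads": 32,          # plain multi-head attention
    "ffn_hidden_size": 11008,
    "vocab_size": 32000,
    "max_seq_len": 4096,
    "rope_theta": 10000.0,
    "tokenizer": "sentencepiece",
}

LLAMA3_8B = {
    "hidden_size": 4096,
    "num_layers": 32,
    "num_attention_heads": 32,
    "num_kv_heads": 8,           # grouped-query attention
    "ffn_hidden_size": 14336,
    "vocab_size": 128256,
    "max_seq_len": 8192,
    "rope_theta": 500000.0,
    "tokenizer": "tiktoken-bpe",
}


def config_diff(old, new):
    """Return {key: (old_value, new_value)} for every key whose value changed."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


if __name__ == "__main__":
    for key, (before, after) in config_diff(LLAMA2_7B, LLAMA3_8B).items():
        print(f"{key}: {before} -> {after}")
```

The layer structure is shared between the two models, so the differences are in configuration values and the tokenizer rather than in the model code itself.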