microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Why does pretrain_llama_distributed.sh use pretrain_gpt.py? #437

Open BrucePeng92 opened 3 months ago

BrucePeng92 commented 3 months ago

Why is the GPT model code used when training LLaMA 2? How can I tell whether the model trained by pretrain_llama_distributed.sh is LLaMA or GPT?
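(Context, not an official answer: LLaMA is a decoder-only transformer like GPT, so Megatron-style codebases typically reuse the GPT training entry point and turn it into a LLaMA-style model through architectural flags. The exact flag set below is a hedged sketch of what such a launcher commonly passes, not a verbatim quote of this repository's script; check your local `pretrain_llama_distributed.sh` for the authoritative list.)

```shell
# Sketch of the kind of LLaMA-specific options a Megatron-DeepSpeed
# launcher typically adds on top of the generic GPT entry point.
# Flag names are assumptions; verify against your script version.
python pretrain_gpt.py \
    --use-rotary-position-embeddings \          # RoPE instead of learned positions
    --swiglu \                                  # SwiGLU MLP instead of GeLU
    --normalization rmsnorm \                   # RMSNorm instead of LayerNorm
    --untie-embeddings-and-output-weights \     # separate input/output embeddings
    --disable-bias-linear                       # no bias terms in linear layers
```

If flags like these appear in the script, the trained checkpoint has LLaMA's architecture even though the file is named `pretrain_gpt.py`; grepping the script for terms such as `rotary`, `swiglu`, or `rmsnorm` is one way to confirm which variant you are actually training.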