microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Convert to iteration based training supported by pretraining scripts #393

Closed zainsarwar865 closed 5 months ago

zainsarwar865 commented 6 months ago

Some scripts in the Megatron-DeepSpeed/examples_deepspeed/MoE folder are not runnable by default due to misconfigurations in the bash scripts. For instance, ds_pretrain_gpt_125M_dense_cl.sh is configured for sample-based training, but pretrain_gpt.py only supports iteration-based training. I changed the script's configuration to an iteration-based training schedule so the two are compatible (see the sketch below).
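For reference, a minimal sketch of the kind of change involved. The sample counts and global batch size below are illustrative placeholders, not the script's actual values, and the conversion (dividing sample counts by the global batch size) is my assumption about the intended schedule; the flag names themselves are standard Megatron arguments:

```bash
#!/bin/bash
# Illustrative values; the real script sets these elsewhere.
global_batch_size=256
train_samples=300000000
lr_decay_samples=260000000
lr_warmup_samples=3000000

# Derive iteration counts from the sample-based schedule, since
# pretrain_gpt.py only accepts an iteration-based schedule here.
train_iters=$(( train_samples / global_batch_size ))
lr_decay_iters=$(( lr_decay_samples / global_batch_size ))
lr_warmup_iters=$(( lr_warmup_samples / global_batch_size ))

# Replace --train-samples / --lr-decay-samples / --lr-warmup-samples
# with their iteration-based counterparts:
megatron_options=" \
    --train-iters ${train_iters} \
    --lr-decay-iters ${lr_decay_iters} \
    --lr-warmup-iters ${lr_warmup_iters}"
echo "${megatron_options}"
```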

There are a number of other issues with the examples, and I've noticed mismatches between the tutorials and the actual code they link to. Since I'm working with Megatron-DeepSpeed these days, I'd like to keep a log of these and hopefully propose changes to improve the usability of this amazing framework.