Some scripts in the Megatron-DeepSpeed/examples_deepspeed/MoE folder are not runnable out of the box due to misconfigurations in the bash scripts. For instance, the ds_pretrain_gpt_125M_dense_cl.sh script is set up for sample-based training, but pretrain_gpt.py only supports iteration-based training. So I changed the script's config to make it compatible by converting it to an iteration-based training schedule.
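The kind of change involved looks roughly like the sketch below. The flag names (`--train-samples`, `--train-iters`, and their LR-schedule counterparts) come from Megatron-LM's argument parser; the values shown are illustrative placeholders, not the ones actually used in the script:

```shell
# Sketch of the sample-based -> iteration-based conversion (values are illustrative).

# Before: sample-based schedule, which pretrain_gpt.py rejects.
#   --train-samples 300000000 \
#   --lr-decay-samples 260000000 \
#   --lr-warmup-samples 300000 \

# After: equivalent iteration-based schedule. With a global batch size of
# GLOBAL_BATCH samples, iterations ~= samples / GLOBAL_BATCH.
megatron_options=" \
    --train-iters 500000 \
    --lr-decay-iters 430000 \
    --lr-warmup-iters 2000"
```

In Megatron, the sample-based and iteration-based flags are mutually exclusive, so the sample-based flags must be removed entirely rather than left alongside their iteration-based equivalents.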
There are a number of other issues with the examples, and I've noticed a bit of a mismatch between the tutorials and the actual code they link to. Since I'm working on Megatron-DeepSpeed these days, I'd like to keep a log and hopefully propose some changes to improve the usability of this amazing framework.