microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

[BUG] Problems with Mixture-of-Experts (MoE) #367

Open nikit-srivastava opened 4 months ago

nikit-srivastava commented 4 months ago

Hello,

Thank you for the nice work on this training framework. However, I have noticed problems with inference, conversion, and fine-tuning of MoE-based GPT models. The following issues point to the same problems but have not yet been addressed:

In general, the inference example (generate_text.sh) does not work when --num-experts is set to a value greater than 1, and the conversion scripts (convert_checkpoint) are not equipped to handle MoE models. A rough repro sketch follows.
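To make the failure mode concrete, here is a minimal sketch of the kind of invocation that fails for me. The checkpoint path, vocabulary files, and model sizes are placeholders for our setup, and apart from --num-experts the launcher, script path (tools/generate_samples_gpt.py), and flags are my reading of the stock generate_text.sh example, so please treat the exact argument list as an assumption rather than a verified command:

```bash
#!/bin/bash
# Rough repro sketch: paths, vocab files, and model sizes are placeholders
# for our setup; the launcher, script path, and flags other than
# --num-experts follow the stock example as far as I can tell.
CHECKPOINT_PATH=checkpoints/gpt_moe   # placeholder MoE GPT checkpoint
VOCAB_FILE=gpt2-vocab.json            # placeholder
MERGE_FILE=gpt2-merges.txt            # placeholder

# With --num-experts 1 generation runs fine; any value > 1 fails.
deepspeed tools/generate_samples_gpt.py \
    --tensor-model-parallel-size 1 \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --micro-batch-size 1 \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --load $CHECKPOINT_PATH \
    --fp16 \
    --num-experts 8 \
    --out-seq-length 256
```

The same checkpoint also cannot be processed by the convert_checkpoint tooling, as noted above.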

I would like to request the attention of the repository maintainers to this issue. This is a major roadblock in our research and prevents us from analyzing or publishing our findings. We would be really grateful if it could be resolved soon.

If you need any other information, or access to model weights for testing, please feel free to ask. I can also offer to fix or implement the missing features myself if you point me in the right direction.

suu990901 commented 2 months ago

Have you solved this problem?