microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Pretrained MoE model #179

Open Luodian opened 2 years ago

Luodian commented 2 years ago

Hi, thanks for providing such wonderful work. However, I am curious: will you consider providing pretrained MoE models (e.g., ViT on ImageNet or machine translation tasks)?

ghostplant commented 2 years ago

I think [Swin Transformer](https://github.com/microsoft/Swin-Transformer) provides such pretrained MoE models based on Tutel.

For other language models over Fairseq, this repo currently only provides scripts that train models from scratch, since we haven't yet received concrete requests from users specifying which pretrained models they need, at what scale, and on which datasets. But thanks for your suggestion; this is meaningful and we'll discuss it further.
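In the meantime, if the goal is just to drop a Tutel MoE layer into your own PyTorch model and train from scratch, a minimal sketch looks roughly like the example below. The argument names follow the README-style `tutel_moe.moe_layer` usage, but the exact argument set can differ between Tutel versions, so treat the values here as placeholders and check the README of the version you install.

```python
import torch
from tutel import moe as tutel_moe

model_dim = 512

# A single top-2 MoE layer with 2 local FFN experts; exact arguments are
# illustrative and may need adjusting for your installed Tutel version.
moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=model_dim,
    experts={
        'type': 'ffn',
        'count_per_node': 2,
        'hidden_size_per_expert': 2048,
        'activation_fn': lambda x: torch.nn.functional.relu(x),
    },
)

# Tokens are routed to experts and combined back into the input shape.
x = torch.randn(4, 1024, model_dim)
y = moe_layer(x)
print(y.shape)  # expected: torch.Size([4, 1024, 512])
```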

zeliu98 commented 2 years ago

Hi @Luodian, the pretrained models can be found here: https://github.com/microsoft/Swin-Transformer/blob/main/MODELHUB.md#imagenet-22k-pretrained-swin-moe-models. We have also provided instructions on how to run Swin-MoE: https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#mixture-of-experts-support
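If you just want to inspect a downloaded checkpoint before wiring it into the Swin-MoE config, a minimal PyTorch sketch is below. The file name is a placeholder and the top-level key layout may differ between releases, so adapt it to whatever MODELHUB.md actually serves.

```python
import torch

# Placeholder path: substitute the actual file downloaded from MODELHUB.md.
ckpt_path = "swin_moe_checkpoint.pth"

# Load onto CPU so no GPU is needed just to look inside the file.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are typically a dict; print the top-level keys and a few
# parameter names/shapes to confirm what the file contains.
print("top-level keys:", list(checkpoint.keys()))
state_dict = checkpoint.get("model", checkpoint)
for name, tensor in list(state_dict.items())[:10]:
    print(f"{name}: {tuple(tensor.shape)}")
```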