microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

What is the difference between this and deepspeed-moe? #213

Closed: Hap-Zhang closed this issue 12 months ago

Hap-Zhang commented 1 year ago

Hi, I found that Microsoft has another project, deepspeed-moe (https://www.deepspeed.ai/tutorials/mixture-of-experts/), that also supports MoE. Is there any difference in the focus of these two projects?

ghostplant commented 12 months ago

This project is not bound to DeepSpeed, so it is also compatible with other language models and frameworks that do not depend on DeepSpeed (e.g. SWIN, Fairseq, etc.).
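For example, here is a minimal sketch of dropping a Tutel MoE layer into a plain PyTorch module with no DeepSpeed dependency. It roughly follows the example in Tutel's README; exact argument names may differ slightly between versions, and the sizes below are illustrative only:

```python
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe

class MoEBlock(torch.nn.Module):
    # Illustrative sizes only; pick values to match your model.
    def __init__(self, model_dim=1024, num_local_experts=2, hidden_size=4096):
        super().__init__()
        # Top-2 gating with simple FFN experts; see Tutel's README for the
        # full set of documented gate_type / experts options.
        self.moe = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': 2},
            model_dim=model_dim,
            experts={
                'type': 'ffn',
                'count_per_node': num_local_experts,
                'hidden_size_per_expert': hidden_size,
                'activation_fn': lambda x: F.relu(x),
            },
        )

    def forward(self, x):
        # x: [batch, seq_len, model_dim]
        return self.moe(x)
```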

Meanwhile, DeepSpeed's Top-1 gating can also be accelerated if you have the Tutel project installed in your environment: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/sharded_moe.py#L46

However, that path only benefits from part of Tutel's kernel optimizations; the new features introduced since Tutel >= 0.2.x would not be leveraged.
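For reference, the linked sharded_moe.py only enables the accelerated path when Tutel can actually be imported. A simplified sketch of that optional-dependency pattern (not the exact DeepSpeed code) looks like this:

```python
# If the import succeeds, DeepSpeed's Top-1 gating can take the
# Tutel-accelerated path; otherwise it falls back to pure PyTorch.
try:
    from tutel import moe as tutel_moe
    TUTEL_INSTALLED = True
except ImportError:
    TUTEL_INSTALLED = False

def use_tutel_fast_path(user_opt_in: bool = True) -> bool:
    # Hypothetical helper for illustration: the fast path is used only
    # when the caller opts in AND Tutel is installed.
    return user_opt_in and TUTEL_INSTALLED
```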

Hap-Zhang commented 12 months ago

Ok, I see. Thank you very much for your reply.