microsoft / Tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

What is the difference between this and deepspeed-moe? #213

Closed · Hap-Zhang closed this issue 1 year ago

Hap-Zhang commented 1 year ago

Hi, I found that Microsoft has another project named deepspeed-moe (https://www.deepspeed.ai/tutorials/mixture-of-experts/) that also supports MoE. Is there any difference in the focus of these two projects?

ghostplant commented 1 year ago

This project is not bound to DeepSpeed, so it is also compatible with other language models and frameworks that do not depend on DeepSpeed (e.g., Swin Transformer, Fairseq).

Meanwhile, DeepSpeed's top-1 gating can also be accelerated if the Tutel package is installed in your environment: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/sharded_moe.py#L46
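
For context, the detection on the DeepSpeed side is just an optional import: if the tutel package can be imported, the top-1 gating path switches to Tutel's faster dispatch kernels. A hedged paraphrase of that guard (not the verbatim DeepSpeed source):

```python
# Hedged paraphrase of DeepSpeed's optional Tutel detection (not verbatim source):
# when the tutel package is importable, the faster gating/dispatch path is enabled.
try:
    from tutel import moe as tutel_moe
    TUTEL_INSTALLED = True
except ImportError:
    TUTEL_INSTALLED = False
```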

However, that path only benefits from part of Tutel's kernel optimizations; the new features introduced since Tutel >= 0.2.x would not be leveraged.
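
For reference, Tutel's MoE layer can be used directly in a plain PyTorch model without DeepSpeed. Below is a minimal sketch based on Tutel's documented moe_layer interface; the sizes, the top-2 gating choice, and the single-process setup are illustrative assumptions rather than values from this thread, and depending on the Tutel version a torch.distributed process group may need to be initialized first.

```python
# Minimal standalone sketch: Tutel's MoE layer in plain PyTorch, without DeepSpeed.
# Sizes and the top-2 gating choice are illustrative assumptions. Depending on the
# Tutel version, a torch.distributed process group may need to be initialized
# before constructing the layer.
import torch
from tutel import moe as tutel_moe

model_dim, hidden_size, num_local_experts = 512, 1024, 2

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},   # top-2 gating, chosen for illustration
    model_dim=model_dim,
    experts={
        'type': 'ffn',
        'count_per_node': num_local_experts,
        'hidden_size_per_expert': hidden_size,
        'activation_fn': lambda x: torch.nn.functional.relu(x),
    },
)

x = torch.randn(4, 64, model_dim)        # (batch, sequence, model_dim)
y = moe(x)                               # tokens are routed to the local experts
print(y.shape)                           # output keeps the input shape
```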

Hap-Zhang commented 1 year ago

Ok, I see. Thank you very much for your reply.