This project is not bound to DeepSpeed, so it is also compatible with other language models and frameworks that do not depend on DeepSpeed (e.g. SWIN, Fairseq, etc.).
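For reference, here is a minimal standalone sketch (no DeepSpeed involved). The gate/expert dictionary keys and shapes follow Tutel's public helloworld example for the 0.2.x API, so please check them against your installed version; a single CUDA device is assumed, and multi-GPU use would additionally need a torch.distributed setup:

```python
# Minimal standalone Tutel usage sketch (no DeepSpeed). Parameter names follow
# Tutel's 0.2.x examples; verify against your installed version.
import torch
from tutel import moe as tutel_moe

model_dim, hidden_size, num_local_experts = 1024, 4096, 2
device = torch.device('cuda')  # Tutel's custom kernels expect a GPU

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 1},          # top-1 gating
    model_dim=model_dim,
    experts={
        'type': 'ffn',
        'count_per_node': num_local_experts,
        'hidden_size_per_expert': hidden_size,
    },
).to(device)

x = torch.randn(4, 512, model_dim, device=device)  # (batch, seq, model_dim)
y = moe_layer(x)                                    # output keeps the input shape
```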
In the meantime, DeepSpeed's Top-1 gating can also be boosted if you have the Tutel project installed in your environment: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/sharded_moe.py#L46
However, that would only benefit from part of Tutel's kernel optimizations, since the new features introduced in Tutel >= 0.2.x would not be leveraged.
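If you do go through DeepSpeed, the integration point looks roughly like the sketch below. As far as I can tell, recent DeepSpeed releases expose a `use_tutel` flag on `deepspeed.moe.layer.MoE` that routes top-1 gating through Tutel's dispatch kernels when `import tutel` succeeds, but treat the exact argument name as an assumption and check it against your DeepSpeed version:

```python
# Hedged sketch of enabling the Tutel fast path inside DeepSpeed's MoE layer.
# `use_tutel` is assumed from recent DeepSpeed releases; sharded_moe.py only
# takes the Tutel branch when the `tutel` package can be imported.
import torch
from deepspeed.moe.layer import MoE

hidden_size = 1024
expert = torch.nn.Sequential(                    # plain FFN used as the expert
    torch.nn.Linear(hidden_size, 4 * hidden_size),
    torch.nn.ReLU(),
    torch.nn.Linear(4 * hidden_size, hidden_size),
)

moe = MoE(
    hidden_size=hidden_size,
    expert=expert,
    num_experts=8,
    k=1,                 # top-1 gating, the only path Tutel accelerates here
    use_tutel=True,      # assumed flag name; falls back if Tutel is missing
)
# In DeepSpeed's MoE API, forward() returns (output, l_aux, exp_counts).
```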
Ok, I see. Thank you very much for your reply.
Hi, I found that Microsoft has another project, DeepSpeed-MoE (https://www.deepspeed.ai/tutorials/mixture-of-experts/), that also supports MoE. Is there any difference in the focus of these two projects?