microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

[FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII #457

Open freQuensy23-coder opened 3 months ago

freQuensy23-coder commented 3 months ago

Qwen1.5-MoE Support

With the increasing attention on mixture-of-experts (MoE) models, especially following the advances demonstrated by Mixtral, I propose adding support for the Qwen1.5-MoE architecture, in particular its A2.7B variant, to the DeepSpeed-MII framework. The model offers an efficient approach to deploying MoE mechanisms in large-scale language models and a promising way to improve model performance while keeping resource usage down.

The Qwen1.5-MoE-A2.7B model matches the capabilities of leading 7B models while activating only about 2.7 billion parameters. This efficiency not only reduces the effective model size but also brings substantial savings in training cost and faster inference, making it a strong candidate for integration into DeepSpeed-MII, where these advantages can be leveraged at scale.

This suggestion seeks to open a discussion on the feasibility, potential benefits, and implementation considerations for supporting the Qwen1.5-MoE architecture within DeepSpeed-MII, aiming to further the toolkit's capabilities in handling advanced MoE models efficiently.

Blog post: https://qwenlm.github.io/blog/qwen-moe/
Model: https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B

Code:

import mii

# Reproduction: loading Qwen1.5-MoE with an MII pipeline currently fails
# because the architecture is not yet supported.
pipe = mii.pipeline("Qwen/Qwen1.5-MoE-A2.7B")
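
For reference, once the architecture is supported, running generation should look roughly like the sketch below, which follows MII's standard pipeline call; the prompts and the max_new_tokens value are illustrative, not taken from the original report.

import mii

# Minimal sketch, assuming the standard MII pipeline API; prompts and
# max_new_tokens are illustrative placeholders.
pipe = mii.pipeline("Qwen/Qwen1.5-MoE-A2.7B")
response = pipe(["DeepSpeed is", "Qwen1.5-MoE is"], max_new_tokens=128)
print(response)
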
ZonePG commented 2 months ago

Hi @freQuensy23-coder, I submitted a PR to DeepSpeed to support Qwen1.5-MoE; it is now waiting to be merged into the DeepSpeed repo. Until then, you can build DeepSpeed manually from my source branch.

https://github.com/microsoft/DeepSpeed/pull/5403