Hi @freQuensy23-coder, I submitted a PR to DeepSpeed adding Qwen1.5-MoE support; it is currently waiting to be merged into the DeepSpeed repo. Until it lands, you can build DeepSpeed manually from my source branch.
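In case it is useful while the PR is pending, here is a minimal sketch of how one could install a patched DeepSpeed build and smoke-test the model through MII. The fork URL and branch name are placeholders (the actual PR link isn't included above), and the snippet assumes the standard `mii.pipeline` API; treat it as an illustration rather than verified instructions.

```python
# Install the patched DeepSpeed build from source (placeholder fork/branch;
# substitute the actual repository and branch referenced in the PR):
#   git clone -b <pr-branch> https://github.com/<fork>/DeepSpeed.git
#   cd DeepSpeed && pip install .
#   pip install deepspeed-mii

import mii

# Non-persistent pipeline: loads Qwen1.5-MoE-A2.7B with the locally built
# DeepSpeed and runs a short generation as a smoke test.
pipe = mii.pipeline("Qwen/Qwen1.5-MoE-A2.7B")
responses = pipe(["Mixture-of-experts models are efficient because"],
                 max_new_tokens=64)
print(responses)
```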
Qwen1.5-MoE Support
With the increasing attention on mixture-of-experts (MoE) models, especially following Mixtral, I propose adding support for the Qwen1.5-MoE architecture, particularly its A2.7B variant, to the DeepSpeed-MII framework. The model offers an efficient way to deploy MoE mechanisms in large language models, improving quality while keeping resource usage low.
The Qwen1.5-MoE-A2.7B model matches the capabilities of leading 7B models while activating significantly fewer parameters (about 2.7 billion per forward pass). Beyond the smaller active footprint, this translates into substantially lower training cost and faster inference, which makes it a strong candidate for integration into DeepSpeed-MII at scale.
This issue is meant to open a discussion on the feasibility, potential benefits, and implementation considerations of supporting the Qwen1.5-MoE architecture in DeepSpeed-MII, extending the toolkit's ability to serve advanced MoE models efficiently.
Blog post - https://qwenlm.github.io/blog/qwen-moe/
Model - https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B
Code
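To make the request concrete, here is a rough sketch of what a persistent DeepSpeed-MII deployment of this model could look like once the architecture is supported. It assumes the existing `mii.serve` / `client.generate` API; the deployment name and the two-way tensor parallelism are illustrative choices, not confirmed settings for this model.

```python
import mii

# Hypothetical persistent deployment of Qwen1.5-MoE-A2.7B once the
# architecture is supported; tensor_parallel=2 is only an example setting.
client = mii.serve(
    "Qwen/Qwen1.5-MoE-A2.7B",
    deployment_name="qwen15-moe-a27b",
    tensor_parallel=2,
)

# Query the running deployment, then shut it down.
responses = client.generate(
    ["Explain mixture-of-experts routing in one paragraph."],
    max_new_tokens=128,
)
print(responses)
client.terminate_server()
```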