ShayDuane opened this issue 8 months ago
@ShayDuane we do not support Mixtral with the old inference interface. Please use DeepSpeed-MII to get support for the Mixtral model.
First install the latest DeepSpeed and MII:
pip install deepspeed==0.12.6 deepspeed-mii==0.1.3
Then launch the following with deepspeed --num_gpus 4 mixtral.py:
import mii

# Non-persistent pipeline; MII shards the model across the launched ranks
pipe = mii.pipeline("/workspace/shuaiqi/Model/Mixtral")
responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
# Only rank 0 holds the gathered responses
if pipe.is_rank_0:
    print(responses[0])
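For a long-lived service rather than a one-shot script, MII also offers a persistent deployment. The sketch below is a hypothetical helper, not from this thread; it assumes deepspeed-mii 0.1.3's `mii.serve` API and the same local model path, and keeps the import inside the function so the sketch reads without MII installed:

```python
def serve_mixtral_persistent(model_path: str, tp: int = 4):
    """Hypothetical helper: persistent MII deployment as an alternative
    to the non-persistent pipeline above (assumes deepspeed-mii==0.1.3)."""
    import mii  # imported lazily; requires a GPU environment to actually run

    # Starts a background server that shards the model across `tp` GPUs
    client = mii.serve(model_path, tensor_parallel=tp)
    responses = client.generate("DeepSpeed is", max_new_tokens=128)
    print(responses[0])
    # Shut the server down when finished
    client.terminate_server()
```

The pipeline form is simpler for benchmarking; the served form keeps the sharded model resident between requests.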
@mrwyattii, will you add support for Mixtral with the old inference interface in the future? It's quite a popular model.
Hi, does DeepSpeed now support Mixtral (or other MoE models) with the old inference interface? I tried to run inference on MoE models (Mixtral and Qwen1.5-MoE-A2.7B) with DeepSpeed across multiple nodes, but it failed. Can anyone help me?
Describe the bug
I'm not sure whether DeepSpeed needs to be adapted for Mixtral. When I tried using DeepSpeed inference, it did not apply model parallelism: instead of sharding the weights, it attempted to load the complete model parameters on every GPU, which ultimately led to out-of-memory (OOM) errors. When I run Llama-2 the same way, model parallelism works as expected, so I wonder whether Mixtral requires official adaptation.

Additionally, when I deploy Mixtral with the MII library, model parallelism also works, with the parameters correctly split across the GPUs. But when I use DeepSpeed inference directly, it fails. I'm not sure whether this needs official adaptation or whether there's a problem with how I'm using it. Can anyone provide some guidance?
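For reference, the "old" inference path described above would typically look like the sketch below. This is a hypothetical reconstruction, not the reporter's actual script; it assumes a `deepspeed --num_gpus 4` launch, a Hugging Face checkpoint loadable via `AutoModelForCausalLM`, and `deepspeed.init_inference` as shipped in DeepSpeed 0.12.x. Imports are kept inside the function so the sketch reads without DeepSpeed installed:

```python
def run_old_interface_inference(model_path: str, tp_size: int = 4):
    """Hypothetical sketch of the legacy DeepSpeed inference path
    (launch with `deepspeed --num_gpus 4 script.py`)."""
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16
    )

    # For supported architectures like Llama-2 this shards the weights
    # across the tensor-parallel ranks; for Mixtral's MoE layers the old
    # interface has no injection policy, so each rank loads a full
    # replica (the per-GPU OOM the report describes).
    engine = deepspeed.init_inference(
        model,
        tensor_parallel={"tp_size": tp_size},
        dtype=torch.float16,
        replace_with_kernel_inject=False,
    )

    inputs = tokenizer("DeepSpeed is", return_tensors="pt")
    outputs = engine.module.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0])
```

If this path replicates the full model per GPU for Mixtral while MII shards it correctly, that matches the maintainer's answer: MoE support lives in the newer MII/DeepSpeed-FastGen stack, not the legacy interface.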
To Reproduce
The launch command is
System info:
- transformers 4.36.2
- CUDA 12.3
- PyTorch 2.1.2
- DeepSpeed 0.12.6