vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[New Model]: launch error of Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 #4331

Open eigen2017 opened 4 months ago

eigen2017 commented 4 months ago

The model to consider.

https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4/

The closest model vllm already supports.

https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/

What's your difficulty of supporting the model you want?

launch error: `AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'`
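For context, a launch command along these lines triggers the error. This is a sketch, not taken from the report: the model ID and flags are assumed from the standard vLLM OpenAI-compatible server CLI, and the exact failure point may vary by vLLM version.

```shell
# Serve the GPTQ-Int4 MoE checkpoint with vLLM's OpenAI-compatible server.
# On vLLM versions without quantized Qwen-MoE support, startup fails while
# loading weights with:
#   AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 \
    --quantization gptq
```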

jeejeelee commented 4 months ago

Currently, vLLM does not support quantized Qwen-MoE models.

wellcasa commented 4 months ago

Earnestly requesting support for Qwen MoE.

li904775857 commented 3 months ago

Earnestly requesting support for Qwen MoE.