xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Starting the qwen2.5 14B vLLM int8 build raises ValueError: [address=0.0.0.0:37961, pid=352437] Marlin does not support weight_bits = uint8b128. Only types = [] are supported (for group_size = 128, min_capability = 70, zp = False) #2350

Closed dingidng closed 1 month ago

dingidng commented 1 month ago

System Info

Running Xinference with Docker?

Version info

The command used to start Xinference

(screenshot attached)

Starting the qwen2.5 14B vLLM int8 build raises: ValueError: [address=0.0.0.0:37961, pid=352437] Marlin does not support weight_bits = uint8b128. Only types = [] are supported (for group_size = 128, min_capability = 70, zp = False)

The hardware is an NVIDIA V100.
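The error is consistent with the hardware: vLLM's Marlin GPTQ kernels require a GPU with CUDA compute capability 8.0 (Ampere) or newer, while the V100 is sm_70. A minimal sketch of that check (the 8.0 threshold comes from vLLM's Marlin kernel requirements, not from this thread):

```python
# Marlin GPTQ kernels need CUDA compute capability >= 8.0 (Ampere or
# newer); the V100 reports (7, 0), so Marlin cannot run on it.
MARLIN_MIN_CAPABILITY = (8, 0)  # assumption based on vLLM's Marlin kernels

def supports_marlin(capability: tuple[int, int]) -> bool:
    """Return True if a GPU with this (major, minor) capability can run Marlin."""
    return capability >= MARLIN_MIN_CAPABILITY

print(supports_marlin((7, 0)))  # V100 -> False
print(supports_marlin((8, 0)))  # A100 -> True
```

On a real machine the `(major, minor)` pair can be read with `torch.cuda.get_device_capability()`.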

Reproduction

(screenshot attached)

Expected behavior

Fix the bug.

qinxuye commented 1 month ago

Try adding the extra option quantization with the value GPTQ.
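For reference, a hedged sketch of what that extra option maps to at the vLLM layer (the model id below is an assumption for illustration, not taken from this thread): explicitly setting quantization="gptq" keeps vLLM on the plain GPTQ kernel instead of auto-selecting Marlin, which is what fails on a V100.

```python
# Hedged sketch, not a confirmed fix: force vLLM's plain GPTQ kernel.
# Requires a GPU and the model weights, so this is a launch-config
# fragment rather than a runnable test.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8",  # assumed model id
    quantization="gptq",  # skip automatic Marlin selection (needs sm_80+)
)
```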

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.