Launching the qwen2.5 14B int8 version with vLLM fails with ValueError: [address=0.0.0.0:37961, pid=352437] Marlin does not support weight_bits = uint8b128. Only types = [] are supported (for group_size = 128, min_capability = 70, zp = False)

dingidng commented 1 month ago

System Info

xinference 0.15.2
torch 2.4.0 torch-complex 0.4.4 torchaudio 2.4.0 torchmetrics 1.4.1 torchvision 0.19.0
python 3.11.9
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Running Xinference with Docker?

[ ] docker
[X] pip install
[ ] installation from source

Version info

xinference 0.15.2

The command used to start Xinference

Launching the qwen2.5 14B int8 version with vLLM fails with ValueError: [address=0.0.0.0:37961, pid=352437] Marlin does not support weight_bits = uint8b128. Only types = [] are supported (for group_size = 128, min_capability = 70, zp = False)

The hardware is an NVIDIA V100.

Reproduction

Expected behavior

Fix the bug.

qinxuye commented 1 month ago

Try adding an extra option when launching the model: quantization, with the value GPTQ.
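A rough sketch of the reasoning behind this suggestion, in case it helps others on older GPUs. It assumes that vLLM's Marlin kernels need compute capability 8.0 or higher (the V100 is 7.0), and that the plain GPTQ kernel can be requested by setting vLLM's `quantization` engine argument to `gptq` instead of `gptq_marlin`. The helper names `choose_gptq_backend` and `build_engine_kwargs` are hypothetical and are not part of Xinference or vLLM; how the resulting option is passed through Xinference (for example as an extra launch parameter) is not shown here.

```python
# Hypothetical helpers for illustration only; not part of Xinference or vLLM.

# Assumption: Marlin kernels require compute capability 8.0+ (Ampere or newer).
MARLIN_MIN_CAPABILITY = 80


def choose_gptq_backend(compute_capability: int) -> str:
    """Pick the vLLM quantization method name for a GPTQ checkpoint.

    compute_capability is major * 10 + minor, e.g. 70 for a V100.
    """
    if compute_capability >= MARLIN_MIN_CAPABILITY:
        return "gptq_marlin"
    # Older GPUs fall back to the plain GPTQ kernel.
    return "gptq"


def build_engine_kwargs(compute_capability: int) -> dict:
    """Build the extra option suggested above as a plain dict."""
    return {"quantization": choose_gptq_backend(compute_capability)}


if __name__ == "__main__":
    # V100 reports compute capability 7.0.
    print(build_engine_kwargs(70))
```

On a machine with PyTorch installed, the capability value could be derived from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.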

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.
