xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG Unable to load model #921

Closed auxpd closed 1 month ago

auxpd commented 8 months ago

Describe the bug

I was unable to load the model with the following parameters. I'm on the latest version of Xinference, 0.8.1.

I started qwen-chat from the UI page with:

model format: gptq
model size: 14
quantization: int4
n-gpu: 2

console error log:

  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/models/qwen.py", line 231, in __init__
    self.transformer = QWenModel(config, linear_method)
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/models/qwen.py", line 193, in __init__
    self.h = nn.ModuleList([
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/models/qwen.py", line 194, in <listcomp>
    QWenBlock(config, linear_method)
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/models/qwen.py", line 147, in __init__
    self.mlp = QWenMLP(config.hidden_size,
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/models/qwen.py", line 49, in __init__
    self.c_proj = RowParallelLinear(intermediate_size,
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 495, in __init__
    self.linear_weights = self.linear_method.create_weights(
  File "/home/auxpd/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/gptq.py", line 100, in create_weights
    raise ValueError(
ValueError: [address=0.0.0.0:39287, pid=33034] The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
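For context, the error comes from a divisibility check in vLLM's GPTQ weight loader: a `RowParallelLinear` layer splits its input dimension across tensor-parallel ranks, and GPTQ packs quantized weights in fixed-size groups, so each shard's input size must divide evenly by the group size. The sketch below is a simplified, hypothetical reimplementation of that check (the function name is mine, not vLLM's), using 13696 as an illustrative layer input size and assuming the common GPTQ group size of 128:

```python
def check_gptq_alignment(input_size: int, tp_size: int, group_size: int = 128) -> bool:
    """Return True if a GPTQ layer of this input size can be sharded
    across tp_size tensor-parallel ranks.

    RowParallelLinear gives each rank input_size // tp_size input
    features; GPTQ stores quantization parameters per group of
    `group_size` input features, so the per-rank size must be a
    multiple of the group size or the packed weights cannot be split.
    """
    input_size_per_partition = input_size // tp_size
    return input_size_per_partition % group_size == 0


# Illustrative numbers: 13696 is divisible by 128, but its half is not,
# so a layer of this size would load on 1 GPU and fail with n-gpu: 2.
print(check_gptq_alignment(13696, tp_size=1))  # True
print(check_gptq_alignment(13696, tp_size=2))  # False -> the ValueError above
```

This matches the hint in the message itself ("too large tensor parallel size"): with `n-gpu: 2` the shard size is no longer group-aligned, so running on a single GPU or a tensor-parallel size that keeps the shards aligned avoids the error.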

qinxuye commented 8 months ago

The image seems lost.

auxpd commented 8 months ago

I've turned it into text, lol.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.