xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
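As an illustration of the "single line of code" claim: Xinference exposes an OpenAI-compatible API, so an app built on the openai Python client can switch to a locally served model by changing the base URL. A minimal sketch, assuming a model named "qwen-chat" has already been launched on the default local endpoint used later in this issue:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local Xinference server instead of
# api.openai.com; the port matches the xinference-local command shown below.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen-chat",  # assumes this model is already running on the server
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```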
https://inference.readthedocs.io
Apache License 2.0

Error when installing the qwen-chat model: cannot unpack non-iterable NoneType object #2327

Closed. cyflhn closed this issue 1 month ago.

cyflhn commented 1 month ago

System Info / 系統信息

Python version: 3.10; OS: CentOS 7.9; no CUDA.

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

Version info / 版本信息

0.15.1

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local --host 0.0.0.0 --port 9997

Reproduction / 复现过程

1. In the web UI, select the qwen-chat model and install it.
2. After the model downloads successfully, the following error is reported:

xinference.core.worker 15493 ERROR [request 184fd182-75a1-11ef-8560-080027b9b12b] Leave launch_builtin_model, error: [address=0.0.0.0:36481, pid=15670] cannot unpack non-iterable NoneType object, elapsed time: 13 s
Traceback (most recent call last):
  File "/opt/python3/lib/python3.10/site-packages/xinference/core/utils.py", line 69, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/python3/lib/python3.10/site-packages/xinference/core/worker.py", line 893, in launch_builtin_model
    await model_ref.load()
  File "/opt/python3/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/opt/python3/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/python3/lib/python3.10/site-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/python3/lib/python3.10/site-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/opt/python3/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/python3/lib/python3.10/site-packages/xinference/core/model.py", line 309, in load
    self._model.load()
  File "/opt/python3/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 248, in load
    self._engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/opt/python3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 726, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/opt/python3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 844, in create_engine_config
    model_config = self.create_model_config()
  File "/opt/python3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 782, in create_model_config
    return ModelConfig(
  File "/opt/python3/lib/python3.10/site-packages/vllm/config.py", line 243, in __init__
    self._verify_quantization()
  File "/opt/python3/lib/python3.10/site-packages/vllm/config.py", line 302, in _verify_quantization
    quantization_override = method.override_quantization_method(
  File "/opt/python3/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 100, in override_quantization_method
    can_convert = cls.is_gptq_marlin_compatible(hf_quant_cfg)
  File "/opt/python3/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 151, in is_gptq_marlin_compatible
    return check_marlin_supported(quant_type=cls.TYPE_MAP[(num_bits, sym)],
  File "/opt/python3/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 78, in check_marlin_supported
    cond, _ = _check_marlin_supported(quant_type, group_size, has_zp,
  File "/opt/python3/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 55, in _check_marlin_supported
    major, minor = current_platform.get_device_capability()
TypeError: [address=0.0.0.0:36481, pid=15670] cannot unpack non-iterable NoneType object
(Does vLLM have to be deployed on a machine with CUDA support? Is this error related to the machine having no CUDA?)
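Reading the bottom frame of the traceback: _check_marlin_supported unpacks the return value of current_platform.get_device_capability(), which presumably returns None on a host with no CUDA device, so the two-element unpack fails. A minimal sketch of that failure mode (the function below is a hypothetical stand-in, not vLLM's actual implementation):

```python
# Hypothetical stand-in for vllm's current_platform.get_device_capability():
# on CUDA hardware it would return a (major, minor) compute-capability tuple;
# with no CUDA device present it is assumed here to return None.
def get_device_capability():
    return None  # no CUDA device detected

# Unpacking None reproduces the reported error:
# TypeError: cannot unpack non-iterable NoneType object
major, minor = get_device_capability()
```

If that reading is correct, the answer to the question above would be yes: the error comes from launching the vLLM backend on a CUDA-less machine, and choosing a CPU-capable engine instead (for example, xinference launch --model-name qwen-chat --model-engine transformers, assuming this Xinference version supports the --model-engine flag) should sidestep it.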

Expected behavior / 期待表现

The model should install correctly.

948024326 commented 1 month ago

Hello, is there any log output while you are downloading? When I download the model, I only ever see this one line. [image attachment]

cyflhn commented 1 month ago

> Hello, is there any log output while you are downloading? When I download the model, I only ever see this one line. [image attachment]

Yes, there is log output.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.