xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
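
For context on the "single line of code": Xinference exposes an OpenAI-compatible API, so an existing OpenAI client only needs its base URL changed. A minimal sketch, assuming a local server on the default port 9997 and a model already launched under the UID qwen2-instruct (both assumptions, not taken from this issue):

from openai import OpenAI

# The single-line change: point the client at Xinference instead of OpenAI.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen2-instruct",  # assumed model UID assigned at launch
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)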

KeyError when launching qwen2-instruct on Windows #1739

Closed BarryCui closed 2 months ago

BarryCui commented 3 months ago

Launching qwen2-instruct on Windows raises a KeyError. Environment: Win10, Python 3.11.9. qwen2-instruct launch parameters: Transformers engine + PyTorch format + model size 72 + quantization 8-bit.
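
For reference, the same launch configuration expressed through the Xinference Python client; a minimal sketch assuming a local supervisor on the default endpoint (the URL is an assumption):

from xinference.client import Client

# Connect to the local Xinference supervisor (default port 9997).
client = Client("http://localhost:9997")

# Mirrors the settings above: Transformers engine, PyTorch format,
# 72B parameters, 8-bit quantization.
model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_engine="transformers",
    model_format="pytorch",
    model_size_in_billions=72,
    quantization="8-bit",
)
print(model_uid)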

The detailed error output is as follows:

2024-06-28 15:39:55,950 xinference.api.restful_api 17344 ERROR [address=10.0.40.107:56307, pid=19364] 'model.embed_tokens.weight'
Traceback (most recent call last):
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\api\restful_api.py", line 771, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\context.py", line 227, in send
    return self._process_result_message(result)
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\pool.py", line 370, in _run_coro
    return await coro
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar\core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar\core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar\core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar\core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\supervisor.py", line 837, in launch_builtin_model
    await _launch_model()
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\supervisor.py", line 801, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\supervisor.py", line 782, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar\core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\worker.py", line 665, in launch_builtin_model
    await model_ref.load()
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\context.py", line 227, in send
    return self._process_result_message(result)
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\backends\pool.py", line 370, in _run_coro
    return await coro
  File "C:\AI\xinference\venv\Lib\site-packages\xoscar\api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar\core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar\core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar\core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar\core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\core\model.py", line 278, in load
    self._model.load()
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\model\llm\pytorch\core.py", line 617, in load
    super().load()
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\model\llm\pytorch\core.py", line 249, in load
    self._model, self._tokenizer = load_compress_model(
  File "C:\AI\xinference\venv\Lib\site-packages\xinference\model\llm\pytorch\compression.py", line 163, in load_compress_model
    model, name, device, value=compressed_state_dict[name]
KeyError: [address=10.0.40.107:56307, pid=19364] 'model.embed_tokens.weight'
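
The trace shows the failure inside load_compress_model (compression.py), which collects tensors from the downloaded checkpoint shards into compressed_state_dict and then looks each one up by name, so the KeyError means model.embed_tokens.weight never made it into that dict; an incomplete or corrupted download is one possible cause. A quick sanity check that the tensor exists in the local checkpoint, as a minimal sketch assuming safetensors shards (the directory path is a placeholder):

from pathlib import Path
from safetensors import safe_open

# Placeholder: point this at the model's local download directory.
model_dir = Path(r"C:\path\to\qwen2-instruct-pytorch-72b")

# Collect every tensor name across all shards.
keys = set()
for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())

# False here would mean the checkpoint on disk is missing the tensor
# the loader tried to read.
print("model.embed_tokens.weight" in keys)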

okwinds commented 3 months ago

Try Python 3.10.

geniusatm4 commented 2 months ago

Was this ever resolved? I'm running into the same thing. Python 3.10.14, CUDA 11.7, PyTorch 2.0.1+cu117.

fengyunzaidushi commented 2 months ago

Same problem. Env: conda, Python 3.10.

Zxy1414 commented 2 months ago

The error occurs when loading the quantized model. Looking for a fix.

eavin7456 commented 1 month ago

I'm getting the same error. Did you solve it?

xumeivs commented 1 month ago

I'm getting the same error. Did you solve it?

Just switch to Ollama.

qinxuye commented 1 month ago

What version of Transformers are you using?
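
A quick way to report those versions (assuming the packages import cleanly in the environment that shows the error):

import torch
import transformers
import bitsandbytes

# Print the versions relevant to 8-bit loading.
print("transformers ", transformers.__version__)
print("bitsandbytes ", bitsandbytes.__version__)
print("torch        ", torch.__version__)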

qinxuye commented 1 month ago

I tested it:

xinference launch --model-engine transformers --model-name qwen2-instruct --size-in-billions 72 --model-format pytorch --quantization 8-bit

I didn't run into the issue. Relevant versions:

transformers                      4.43.4
transformers-stream-generator     0.0.5
bitsandbytes                      0.43.3