xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG Failed to load models #1031

Closed · JTed1997 closed this 8 months ago

JTed1997 commented 8 months ago

Describe the bug

When I try to download and run qwen-chat-1-0, it reports "ERROR Failed to load model qwen-chat-1-0". I have tried three models; all report the same error.

To Reproduce
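
(The exact launch invocation isn't included in the report. A minimal sketch that would exercise the same code path through the Python client, assuming a local supervisor at the default port, is shown below; the size string and the quantization value are assumptions, not taken from the report.)

```python
from xinference.client import RESTfulClient

# Assumed default local endpoint; adjust to your deployment.
client = RESTfulClient("http://127.0.0.1:9997")

# Launch qwen-chat 1.8B. On an M-series Mac, a non-"none" quantization
# routes loading through load_compress_model, the function that fails
# in the traceback below.
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="pytorch",
    model_size_in_billions="1_8",  # assumed string form for the 1.8B variant
    quantization="4-bit",          # assumption: any value other than "none"
)
```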

```
2024-02-22 23:48:31,901 xinference.model.llm.llm_family 19432 INFO Caching from Modelscope: qwen/Qwen-1_8B-Chat
2024-02-22 23:48:32,306 - modelscope - INFO - Use user-specified model revision: v1.0.0
Downloading: 100%|███████████████████████████████████████████████████████| 8.21k/8.21k [00:00<00:00, 11.1MB/s]
[... further tokenizer/config download progress lines omitted; all reached 100% ...]
Downloading: 100%|██████████████████████████████████████████████████████▉| 1.90G/1.90G [00:50<00:00, 40.2MB/s]
Downloading: 100%|██████████████████████████████████████████████████████▉| 1.52G/1.52G [00:40<00:00, 39.9MB/s]
[... remaining download progress lines omitted; all reached 100% ...]
0it [00:00, ?it/s]
2024-02-22 23:50:33,698 xinference.core.worker 19432 ERROR Failed to load model qwen-chat-1-0
Traceback (most recent call last):
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/worker.py", line 549, in launch_builtin_model
    await model_ref.load()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/model/llm/pytorch/core.py", line 188, in load
    self._model, self._tokenizer = load_compress_model(
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/model/llm/pytorch/compression.py", line 163, in load_compress_model
    model, name, device, value=compressed_state_dict[name]
KeyError: [address=127.0.0.1:51605, pid=19656] 'transformer.wte.weight'
2024-02-22 23:50:33,714 xinference.api.restful_api 19427 ERROR [address=127.0.0.1:51605, pid=19656] 'transformer.wte.weight'
Traceback (most recent call last):
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/api/restful_api.py", line 678, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/supervisor.py", line 797, in launch_builtin_model
    await _launch_model()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/supervisor.py", line 761, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/supervisor.py", line 745, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/worker.py", line 549, in launch_builtin_model
    await model_ref.load()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/model/llm/pytorch/core.py", line 188, in load
    self._model, self._tokenizer = load_compress_model(
  File "/Users/apple/anaconda3/envs/llm_test/lib/python3.11/site-packages/xinference/model/llm/pytorch/compression.py", line 163, in load_compress_model
    model, name, device, value=compressed_state_dict[name]
KeyError: [address=127.0.0.1:51605, pid=19656] 'transformer.wte.weight'
```
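
Both stack traces end in the same place: load_compress_model in xinference/model/llm/pytorch/compression.py indexes compressed_state_dict with a parameter name the dictionary does not contain. A stripped-down, hypothetical sketch of that failure mode (not xinference's actual implementation) looks like this:

```python
import torch

# Keys as they happen to exist in a (hypothetical) compressed checkpoint.
compressed_state_dict = {"wte.weight": torch.zeros(10, 4)}

# Parameter names reported by the live module; for Qwen the token
# embedding is registered as "transformer.wte.weight".
parameter_names = ["transformer.wte.weight"]

try:
    for name in parameter_names:
        # If the checkpoint's key layout disagrees with the module's
        # parameter names, or a weight was skipped during compression,
        # this lookup fails exactly like the traceback above.
        value = compressed_state_dict[name]
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'transformer.wte.weight'
```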

  1. Python version: 3.11.7
  2. Xinference version: v0.9.0
  3. Device: MacBook Air M2

Expected behavior

The model should load successfully. Please help me solve this.

ChengjieLi28 commented 8 months ago

Hi @JTed1997. I tried qwen-chat 1.8B on my local Apple M1 laptop with these configs:

transformers: 4.32.1
qwen-chat 1.8B
quantization: none
n_gpu: auto

It works normally.
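
For comparison, launching with that working configuration through the Python client would presumably look like this (the endpoint and the "1_8"/"none" strings are assumptions based on the documented launch_model parameters):

```python
from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")  # assumed default endpoint

# quantization="none" avoids the compression loader that raised the
# KeyError above, which may be why this configuration loads cleanly.
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="pytorch",
    model_size_in_billions="1_8",
    quantization="none",
)
print(client.get_model(model_uid).chat("Hello"))
```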

qqlww1987 commented 5 months ago

How do I fix this?