xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

qwen1.5-moe-chat model fails to load #1906

Open li1553770945 opened 3 months ago

li1553770945 commented 3 months ago

System Info

Python: Python 3.10.14

os:

DISTRIB_ID=Kylin
DISTRIB_RELEASE=V10
DISTRIB_CODENAME=kylin
DISTRIB_DESCRIPTION="Kylin V10 SP1"
DISTRIB_KYLIN_RELEASE=V10
DISTRIB_VERSION_TYPE=enterprise
DISTRIB_VERSION_MODE=normal

Running Xinference with Docker?

Version info

xinference, version 0.13.2

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

Run the command xinference launch --model-engine Transformers --model-name qwen1.5-moe-chat --size-in-billions 2_7 --model-format pytorch --quantization 8-bit, or deploy the qwen1.5-moe-chat model from the web UI.

Expected behavior

The model should load normally.

The full error traceback is below:

2024-07-20 14:53:19,651 xinference.api.restful_api 738440 ERROR    [address=0.0.0.0:38955, pid=739317] 'model.embed_tokens.weight'
Traceback (most recent call last):
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 847, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 988, in launch_builtin_model
    await _launch_model()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 952, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 932, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 841, in launch_builtin_model
    await model_ref.load()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 295, in load
    self._model.load()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 768, in load
    super().load()
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 309, in load
    ) = load_compress_model(
  File "/home/llm/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/pytorch/compression.py", line 163, in load_compress_model
    model, name, device, value=compressed_state_dict[name]
KeyError: [address=0.0.0.0:38955, pid=739317] 'model.embed_tokens.weight'
li1553770945 commented 3 months ago

At llm/pytorch/compression.py:133 there is the following code:

if os.path.exists(model_path):
    # `model_path` is a local folder
    base_pattern = os.path.join(model_path, "pytorch_model*.bin")
else:
    # `model_path` is a cached Hugging Face repo
    model_path = snapshot_download(model_path, revision=revision)
    base_pattern = os.path.join(model_path, "pytorch_model*.bin")

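This shows that the weight files to be loaded must be named pytorch_model*.bin. Looking at the actual contents of model_path, however, the qwen1.5-moe-chat model is saved in safetensors format, so the pattern matches nothing and the model fails to load. [image]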
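
The mismatch described above can be illustrated with a minimal sketch. The helper name find_weight_files and the fallback pattern *.safetensors are assumptions made for illustration; this is not code from xinference, only one possible way to detect which weight format a local model directory contains before choosing a loader.

```python
import glob
import os


def find_weight_files(model_path):
    """Return (format, sorted file list) for a local model directory.

    Hypothetical helper, not part of xinference: it tries the
    pytorch_model*.bin pattern first and falls back to *.safetensors.
    """
    candidates = (
        ("bin", "pytorch_model*.bin"),
        ("safetensors", "*.safetensors"),  # assumed fallback pattern
    )
    for fmt, pattern in candidates:
        files = sorted(glob.glob(os.path.join(model_path, pattern)))
        if files:
            return fmt, files
    raise FileNotFoundError(
        f"no .bin or .safetensors weight files found in {model_path}"
    )
```

With a directory that holds only safetensors shards, the first pattern matches nothing and the helper reports the "safetensors" format. Those files would then have to be read with a safetensors loader (for example safetensors.torch.load_file) rather than torch.load.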

qinxuye commented 3 months ago

Could you try to fix it?

Leiyanzangxiangsi commented 2 months ago

I ran into the same problem. Is there no one who can solve it?

qinxuye commented 2 months ago
[image]

Everything works fine when I run it.

Name: transformers
Version: 4.43.2
Name: bitsandbytes
Version: 0.43.0

Check whether your versions of these two libraries match.
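
A quick way to compare against the versions listed above is to print what is installed in the environment that runs Xinference. This is a minimal sketch using only the standard library; the function name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "bitsandbytes")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```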

li1553770945 commented 2 months ago

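What do your downloaded model files look like? Everything I downloaded is in safetensors format, but the version of xinference I am using can only load .bin files when loading the model, which is why this problem occurs.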

Leiyanzangxiangsi commented 2 months ago

All the PyTorch-format models I downloaded are in safetensors format, and none of them load successfully.

Leiyanzangxiangsi commented 2 months ago

This is probably caused by quantization; setting quantization to none should avoid the problem.