xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

With xinference 1.0.0 I can launch and run qwen2-vl-7b-instruct, but calling it through the OpenAI API fails: model qwen2-vl-7b-instruct-0 not found #2595

Open yuxi9264 opened 5 days ago

yuxi9264 commented 5 days ago

System Info

CUDA version: 12.1, PyTorch version: 2.5.1, OS: Windows 10, Python version: 3.9.11, transformers: 4.46.3

Running Xinference with Docker? No, running locally (see the startup command below).

Version info

xinference 1.0.0

The command used to start Xinference

xinference-local --host 127.0.0.1 --port 9997

Reproduction

  1. Launch qwen2-vl-7b-instruct from the Xinference web UI.

  2. Import the OpenAI client: `from openai import OpenAI`

  3. Create the client:

     ```python
     openai_base_url = "http://127.0.0.1:9997/v1"
     client = OpenAI(api_key="EMPTY", base_url=openai_base_url)
     ```

  4. Set the image path (as a raw string, so that backslash sequences such as `\0` in `output\0.png` are not interpreted as escapes):

     ```python
     page_image_path = r"D:\PythonDemo\pdf_to_md_demo\pdf2markdown\output\0.png"
     ```

  5. Build the message and call the chat completions endpoint:

     ```python
     messages2 = [{
         "role": "user",
         "content": [
             {
                 "type": "image_url",
                 "image_url": {
                     # encode the local image file as base64
                     "url": f"data:image/png;base64,{encode_base64_content_from_local(page_image_path)}",
                 },
             },
             {"type": "text", "text": "What is in this image?"},
         ],
     }]
     resp = client.chat.completions.create(
         messages=messages2,
         model="qwen2-vl-7b-instruct",
         temperature=0.2,
     )
     print(resp)
     ```

  6. Result:

     ```
     File "D:\PythonDemo\pdf_to_md_demo\pdf2markdown\pdf_to_markdown.py", line 264, in openai_test
       resp = client.chat.completions.create(
     File "D:\PythonDemo\pdf_to_md_demo\venv\Lib\site-packages\openai\_utils\_utils.py", line 275, in wrapper
       return func(*args, **kwargs)
     File "D:\PythonDemo\pdf_to_md_demo\venv\Lib\site-packages\openai\resources\chat\completions.py", line 829, in create
       return self._post(
     File "D:\PythonDemo\pdf_to_md_demo\venv\Lib\site-packages\openai\_base_client.py", line 1278, in post
       return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
     File "D:\PythonDemo\pdf_to_md_demo\venv\Lib\site-packages\openai\_base_client.py", line 955, in request
       return self._request(
     File "D:\PythonDemo\pdf_to_md_demo\venv\Lib\site-packages\openai\_base_client.py", line 1059, in _request
       raise self._make_status_error_from_response(err.response) from None

     openai.BadRequestError: Error code: 400 - {'detail': '[address=10.27.164.119:61057, pid=36672] Model not found, uid: qwen2-vl-7b-instruct-0'}
     ```

  1. I don't understand why the model I pass in is qwen2-vl-7b-instruct, yet the server looks for qwen2-vl-7b-instruct-0.
  2. But if I change the model uid in Xinference to qwen2-vl-7b-instruct-0, it then reports that qwen2-vl-7b-instruct cannot be found.
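`encode_base64_content_from_local` is referenced in the repro code but not shown in the issue; a minimal sketch of what such a helper might look like (the name is taken from the issue, the implementation is an assumption):

```python
import base64


def encode_base64_content_from_local(path: str) -> str:
    """Read a local file and return its contents as a base64-encoded string.

    Hypothetical helper: the issue calls this function but does not show
    its implementation.
    """
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```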

Expected behavior

I hope the error above can be explained, so that calling Xinference through the OpenAI API works normally.

Valdanitooooo commented 5 days ago

Check the model list: http://127.0.0.1:9997/v1/models
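One way to act on this suggestion is to compare the `model` name passed to `chat.completions.create` against the `id` fields returned by `/v1/models`. A minimal sketch, assuming a payload shaped like the `SyncPage[Model]` output quoted later in this thread (fetching the JSON is left to your HTTP client of choice):

```python
def available_model_ids(models_payload: dict) -> list:
    """Extract the usable model ids from a /v1/models response payload."""
    return [m["id"] for m in models_payload.get("data", [])]


# Example payload shaped like the response quoted in this thread
payload = {
    "object": "list",
    "data": [{"id": "qwen2-vl-7b-instruct", "object": "model"}],
}
print(available_model_ids(payload))  # ['qwen2-vl-7b-instruct']
```

If the id you see here differs from the name in your request, use the listed id verbatim.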

yuxi9264 commented 5 days ago

The model list at http://127.0.0.1:9997/v1/models returns:

```
SyncPage[Model](data=[Model(id='qwen2-vl-7b-instruct', created=0, object='model', owned_by='xinference', model_type='LLM', address='10.27.164.119:62479', accelerators=['0'], model_name='qwen2-vl-7b-instruct', model_lang=['en', 'zh'], model_ability=['generate', 'chat', 'vision'], model_description='This is a qwen-vl-7b-instruct model', model_format='pytorch', model_size_in_billions=7, model_family='qwen2-vl-instruct', quantization='none', model_hub='huggingface', revision=None, context_length=8192, replica=1)], object='list')
```

10.27.164.119 is an internal network address.

Valdanitooooo commented 5 days ago

I haven't used xinference 1.0.0 yet. Deploying with vLLM works fine for me: https://github.com/Valdanitooooo/chat_with_qwen2_vl_test/blob/main/deploy/docker-compose.yml

qinxuye commented 5 days ago

Did the model process crash? You don't need to append the 0 — that is the replica id.
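To illustrate the naming scheme described above (a sketch inferred from the error message and the `replica=1` field in the model list, not Xinference's actual internals): the server derives per-replica ids by appending an index to the model uid, while the OpenAI-compatible endpoint expects the bare model uid.

```python
# Assumed naming scheme: replica ids are "<model_uid>-<index>"
model_uid = "qwen2-vl-7b-instruct"
replica = 1
replica_ids = [f"{model_uid}-{i}" for i in range(replica)]
print(replica_ids)  # ['qwen2-vl-7b-instruct-0']

# Requests should use model_uid, not a replica id:
request_model = model_uid
```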

yuxi9264 commented 4 days ago

> Did the model process crash? You don't need to append the 0 — that is the replica id.

Quite possibly. My GPU has only 16 GB, and I'm running the model at full precision on Windows. Through the OpenAI API, the first call fails with "the remote server refused the connection", and the second call reports that the model "qwen2-vl-7b-instruct-0" cannot be found — yet using it from the Xinference web UI works fine. Loading deepseek-vl-7b-chat in the web UI and calling it through the OpenAI API also works without issue, which is confusing.

goodsxx commented 3 days ago

I'm running into the same problem.

yuxi9264 commented 6 hours ago

Could you post your GPU and version info so we can compare?
