xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Continuous batching does not support video inputs for this model: MiniCPM-V-2.6 #2478

jiaolongxue opened this issue 1 month ago

jiaolongxue commented 1 month ago

System Info

CUDA 12.1

Running Xinference with Docker?

Yes (see the startup command below).

Version info

v0.16.0

The command used to start Xinference

docker run --name xinference-local -d -e XINFERENCE_MODEL_SRC=modelscope -e MODELSCOPE_CACHE=/data/modelscope/hub -e XINFERENCE_HOME=/data/inference/home/ -e VLLM_USE_MODELSCOPE=False -v /data:/data -p 9997:9997 --gpus all xinference:v0.16.3-cuda121 xinference-local -H 0.0.0.0 --log-level debug
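
The report does not show how the model itself was launched. For reference, a plausible launch command via the Xinference CLI would look like the following; the engine and format flags are assumptions, since they are not given in the report:

    xinference launch -e http://localhost:9997 \
      --model-name MiniCPM-V-2.6 \
      --model-type LLM \
      --model-engine transformers \
      --model-format pytorch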

Reproduction

After uploading a video and asking a question about its content, the following error is raised:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1350, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 742, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/chat_interface.py", line 221, in predict
    response = model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 523, in chat
    raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:46675, pid=77] Continuous batching does not support video inputs for this model: MiniCPM-V-2.6
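
The failure can be reproduced without the Gradio UI by calling the model through the Python client directly. A minimal sketch, assuming the model UID is MiniCPM-V-2.6 and that video is passed in the OpenAI-style video_url content shape (both assumptions; /path/to/video.mp4 is a placeholder):

    from xinference.client import Client

    client = Client("http://localhost:9997")
    # get_model expects the model UID assigned at launch time (assumed here)
    model = client.get_model("MiniCPM-V-2.6")

    # Raises RuntimeError: Continuous batching does not support video inputs ...
    response = model.chat(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this video."},
                    {
                        "type": "video_url",
                        "video_url": {"url": "file:///path/to/video.mp4"},
                    },
                ],
            }
        ],
    )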

Expected behavior

The model should correctly understand the video content and answer questions about it.
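
Until the continuous-batching scheduler accepts video inputs, one possible workaround is to disable continuous batching for the transformers backend when starting the container. This is a sketch under the assumption that the XINFERENCE_TRANSFORMERS_ENABLE_BATCHING environment variable is honored in v0.16; it is not a confirmed fix:

    docker run --name xinference-local -d \
      -e XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=0 \
      ... # remaining flags as in the original command

Disabling batching would trade throughput for compatibility, since requests are then handled sequentially rather than batched.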

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 5 days since being marked as stale.

948024326 commented 2 weeks ago

I'm hitting the same error. Has this been resolved?

qinxuye commented 2 weeks ago

We'll take a look.

likenamehaojie commented 1 week ago

Same problem here: question answering on videos does not work.

likenamehaojie commented 1 week ago

@qinxuye