xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

MiniCPM-V-2.6 multi-GPU deployment actually runs on only one GPU, causing out-of-memory #2321

Open zt449569708 opened 1 week ago

zt449569708 commented 1 week ago

System Info

transformers 4.44.2, Python 3.11, Ubuntu 18.04 x64

Running Xinference with Docker?

Version info

0.15.1

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997
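When only one GPU is picked up, it can help to check which devices are exposed to the process and to set the GPU count explicitly at launch time. A minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` convention and the `xinference launch` subcommand with its `--n-gpu` flag (verify the exact flags against your Xinference version's docs):

```shell
# Make both GPUs visible to the Xinference process explicitly
export CUDA_VISIBLE_DEVICES=0,1

# Start the local server (same as above)
xinference-local --host 0.0.0.0 --port 9997

# In another shell, launch the model across both GPUs.
# --n-gpu accepts an integer or "auto"; here it is pinned to 2.
xinference launch --model-name MiniCPM-V-2.6 --model-type multimodal --n-gpu 2
```

If the model still lands on a single card, the chosen inference backend may not shard this particular model across devices, in which case pinning `--n-gpu` will not help on its own.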

Reproduction

As shown in the attached screenshots, the configuration uses 2 GPUs (16 GB of VRAM each), but at runtime an out-of-memory error is reported. GPU monitoring shows the model running on only one GPU.

Expected behavior

Xinference should support multi-GPU inference for MiniCPM-V-2.6.

948024326 commented 1 week ago

Have you tried setting n-gpu to auto?

zt449569708 commented 1 week ago

> Have you tried setting n-gpu to auto?

Tried that; it still only runs on one GPU.

dlluckboy commented 1 week ago

> Have you tried setting n-gpu to auto?
>
> Tried that; it still only runs on one GPU.

Folks, may I ask how you upgrade xinference? I deployed it with Docker. Thanks!
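For the Docker upgrade question, the usual workflow is to pull a newer image and recreate the container. This is only a sketch of standard Docker practice, not an official recipe: the image name `xprobe/xinference`, the container name, and the volume/port mappings are assumptions you should adjust to your own deployment:

```shell
# Pull a newer image (tag shown is an example; pick the release you want)
docker pull xprobe/xinference:latest

# Stop and remove the old container, then recreate it from the new image,
# reusing the same GPU access, port mapping, and model-cache volume so
# downloaded models survive the upgrade
docker stop xinference && docker rm xinference
docker run -d --name xinference --gpus all -p 9997:9997 \
  -v ~/.xinference:/root/.xinference \
  xprobe/xinference:latest \
  xinference-local --host 0.0.0.0 --port 9997
```

Keeping the model cache on a host volume is the key design choice here; without it, every upgrade re-downloads the model weights.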

SDAIer commented 1 week ago

> Have you tried setting n-gpu to auto?
>
> Tried that; it still only runs on one GPU.

Same problem here.

github-actions[bot] commented 1 day ago

This issue is stale because it has been open for 7 days with no activity.