xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

MiniCPM-V-2.6 multi-GPU deployment actually runs on only one GPU, causing out-of-memory #2321

Open zt449569708 opened 1 week ago

zt449569708 commented 1 week ago

System Info

transformers 4.44.2, Python 3.11, Ubuntu 18.04 x64

Running Xinference with Docker?

Version info

0.15.1

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997
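When only one GPU is picked up, it can help to check which devices are exposed to the process and to set the GPU count explicitly at launch time. A minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` convention and the `xinference launch` subcommand with its `--n-gpu` flag (verify the exact flags against your Xinference version's docs):

```shell
# Make both GPUs visible to the Xinference process explicitly
export CUDA_VISIBLE_DEVICES=0,1

# Start the local server (same as above)
xinference-local --host 0.0.0.0 --port 9997

# In another shell, launch the model across both GPUs.
# --n-gpu accepts an integer or "auto"; here it is pinned to 2.
xinference launch --model-name MiniCPM-V-2.6 --model-type multimodal --n-gpu 2
```

If the model still lands on a single card, the chosen inference backend may not shard this particular model across devices, in which case pinning `--n-gpu` will not help on its own.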

Reproduction

As shown in the attached screenshots, the configuration uses 2 GPUs (16 GB of VRAM each), but at runtime an out-of-memory error is reported. GPU monitoring shows the model running on only one GPU.

Expected behavior

Xinference should support multi-GPU inference for MiniCPM-V-2.6.

948024326 commented 1 week ago

Have you tried setting n-gpu to auto?

zt449569708 commented 1 week ago

> Have you tried setting n-gpu to auto?

Tried that; it still only runs on one GPU.

dlluckboy commented 1 week ago

> Have you tried setting n-gpu to auto?
>
> Tried that; it still only runs on one GPU.

Folks, may I ask how you upgrade xinference? I deployed it with Docker. Thanks!
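For the Docker upgrade question, the usual workflow is to pull a newer image and recreate the container. This is only a sketch of standard Docker practice, not an official recipe: the image name `xprobe/xinference`, the container name, and the volume/port mappings are assumptions you should adjust to your own deployment:

```shell
# Pull a newer image (tag shown is an example; pick the release you want)
docker pull xprobe/xinference:latest

# Stop and remove the old container, then recreate it from the new image,
# reusing the same GPU access, port mapping, and model-cache volume so
# downloaded models survive the upgrade
docker stop xinference && docker rm xinference
docker run -d --name xinference --gpus all -p 9997:9997 \
  -v ~/.xinference:/root/.xinference \
  xprobe/xinference:latest \
  xinference-local --host 0.0.0.0 --port 9997
```

Keeping the model cache on a host volume is the key design choice here; without it, every upgrade re-downloads the model weights.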

SDAIer commented 1 week ago

> Have you tried setting n-gpu to auto?
>
> Tried that; it still only runs on one GPU.

Same problem here.

github-actions[bot] commented 1 day ago

This issue is stale because it has been open for 7 days with no activity.