xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Does minicpm-llama3-v-2_5 (int4) support concurrent API calls? Requests fail with two or more concurrent calls #1672

Open geminizyz opened 5 months ago

geminizyz commented 5 months ago

Does running minicpm-llama3-v-2_5 (int4) support concurrent API calls? With two or more concurrent calls the requests fail, while a single request works fine. (Two error screenshots attached.)

What would it take to support concurrency? The setup is a single physical machine with a 24 GB GPU. Resources are limited, but I'd like to handle at least two concurrent requests.
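
For reference, a minimal client-side sketch of the scenario being described is below: it fires two chat-completion requests at Xinference's OpenAI-compatible endpoint at the same time. The endpoint URL and model UID are assumptions; substitute the values from your own deployment.

```python
# Minimal repro sketch (assumed endpoint and model UID): send two
# chat-completion requests to the Xinference OpenAI-compatible API at once.
from concurrent.futures import ThreadPoolExecutor

import openai

client = openai.OpenAI(
    base_url="http://localhost:9997/v1",  # assumed Xinference endpoint
    api_key="not-used",                   # ignored unless auth is enabled
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="minicpm-llama3-v-2_5",  # assumed model UID from launch
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Per the report: a single request succeeds, but with two concurrent
# requests at least one of them errors out.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(ask, f"Test prompt {i}") for i in range(2)]
    for fut in futures:
        try:
            print(fut.result()[:80])
        except Exception as exc:
            print("request failed:", exc)
```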

qinxuye commented 5 months ago

This looks like a problem caused by the model quantization.
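
One way to check that hypothesis is sketched below: relaunch the model without int4 quantization via the Xinference Python client and retry the concurrent calls. The endpoint address, model name, and quantization value are assumptions and may need adjusting for your Xinference version (newer releases may also require a model engine argument).

```python
# Hedged diagnostic sketch: relaunch the model without int4 quantization and
# see whether concurrent calls still fail. All literal values are assumptions.
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed supervisor address

model_uid = client.launch_model(
    model_name="MiniCPM-Llama3-V-2_5",  # assumed builtin model name
    model_format="pytorch",             # full-precision weights instead of int4
    quantization="none",
)
print("launched model uid:", model_uid)
```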

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 7 days with no activity.

bao21987 commented 3 months ago

I ran into a similar problem with minicpm-2b-sft-bf16 inference: single-threaded requests work fine, but concurrent requests fail and the inference service becomes unusable afterwards.
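
Until the server handles concurrent requests for these models, one stopgap (a client-side sketch, not an Xinference feature) is to serialize calls behind a lock so only one request is in flight at a time: callers stay concurrent, but the model only ever sees sequential traffic. The endpoint and model UID below are assumptions.

```python
# Client-side workaround sketch: serialize requests with a lock so that only
# one inference call reaches the server at a time. Assumed endpoint/model UID.
import threading
from concurrent.futures import ThreadPoolExecutor

import openai

client = openai.OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")
_inference_lock = threading.Lock()

def ask_serialized(prompt: str) -> str:
    # Many threads may call this, but requests hit the model one by one.
    with _inference_lock:
        resp = client.chat.completions.create(
            model="minicpm-2b-sft-bf16",  # assumed model UID
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask_serialized, [f"question {i}" for i in range(4)]):
        print(answer[:80])
```

This trades throughput for stability, so it is only a bridge until concurrent inference works for the affected models.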