xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

xinference concurrency handling issue #1502

Closed. ccly1996 closed this issue 2 months ago.

ccly1996 commented 5 months ago

Describe the bug

I connected fastGPT to a qwen32b model deployed via xinference's vLLM engine. When load-testing, xinference's backend throws an error once 4 concurrent requests are running; after that the model no longer shows up in the UI, while the GPU stays at 100% utilization.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Python version: 3.10
  2. xinference version: 0.11.0
  3. Versions of crucial packages.
  4. Full stack trace of the error.
  5. Minimized code to reproduce the error.


Additional context

vllm 0.4.1

[Three screenshots of the error output were attached in the original issue.]

qinxuye commented 5 months ago

We've recently seen several reports of this; it's actually caused by the vLLM engine having already died.

We plan to have xinference kill and restart the vLLM engine once it detects that the engine is dead.

WangxuP commented 5 months ago

> We've recently seen several reports of this; it's actually caused by the vLLM engine having already died.
>
> We plan to have xinference kill and restart the vLLM engine once it detects that the engine is dead.

Is this related to the GPU? I previously had an RTX A8000 card, and after deploying vLLM on it, performance would start to degrade after it had been serving requests for a while.

zhanghx0905 commented 5 months ago

This is a vLLM engine problem; I've also run into it multiple times. Setting the following parameters when launching the vLLM model can resolve it:

--max-num-batched-tokens: maximum number of batched tokens per iteration.

--max-num-seqs: maximum number of sequences per iteration.
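For reference, a minimal sketch of passing these limits through xinference's Python client. This assumes an xinference server at http://127.0.0.1:9997 and that extra keyword arguments to launch_model are forwarded to the underlying vLLM engine as engine-specific parameters; the model name, quantization, and values below are placeholders to adapt:

```python
# Launch a vLLM-backed model with conservative scheduler limits.
# max_num_seqs / max_num_batched_tokens are standard vLLM engine
# arguments; start low and raise them gradually under load testing.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # assumed endpoint

model_uid = client.launch_model(
    model_name="qwen1.5-chat",    # placeholder: use the name your version registers
    model_engine="vllm",
    model_format="gptq",
    model_size_in_billions=32,
    quantization="Int4",
    n_gpu=4,
    max_num_seqs=16,              # max sequences per iteration
    max_num_batched_tokens=8192,  # max batched tokens per iteration
)
print("model uid:", model_uid)
```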

ccly1996 commented 5 months ago

Thanks. What values are typical for these parameters?


zhanghx0905 commented 5 months ago

Set them small at first, then gradually increase; you can load-test with a tool like JMeter.
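If you'd rather not set up JMeter, a small Python probe can exercise concurrency the same way. This is a sketch against xinference's OpenAI-compatible chat endpoint; the URL and model UID are placeholders:

```python
# Fire N chat-completion requests concurrently and print per-request latency.
import concurrent.futures
import time

import requests

URL = "http://127.0.0.1:9997/v1/chat/completions"  # assumed endpoint
MODEL_UID = "<your-model-uid>"                     # placeholder

PAYLOAD = {
    "model": MODEL_UID,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

def one_request(_: int) -> float:
    start = time.time()
    resp = requests.post(URL, json=PAYLOAD, timeout=300)
    resp.raise_for_status()
    return time.time() - start

N = 8  # concurrency level; raise it until errors or latency spikes appear
with concurrent.futures.ThreadPoolExecutor(max_workers=N) as pool:
    for latency in pool.map(one_request, range(N)):
        print(f"{latency:.2f}s")
```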

ccly1996 commented 5 months ago

I'm now running qwen32b int4 on vLLM across 4 GPUs, and JMeter gets only four concurrent requests through.


github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 5 days since being marked as stale.