xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

When too many requests hit the Xinference API service, the local API hangs #1889

Open zhaozhizhuo opened 1 month ago

zhaozhizhuo commented 1 month ago

System Info

CUDA 12.4, Transformers framework

Running Xinference with Docker?

No; Xinference is started locally with xinference-local (see the command below).

Version info

xinference=0.13.0

The command used to start Xinference

xinference-local -H 0.0.0.0
xinference launch --model-name qwen0.5b-langchain --model-format pytorch --model-engine Transformers --gpu-idx 0,1,2

Reproduction

1. Start Xinference.
2. Launch the qwen2-14b model.
3. Call the model through the API.
4. Wrap the API calls in a Flask app and keep feeding it questions (see the sketch after this list).
5. The service hangs and the running program cannot be stopped; the xinference-local -H 0.0.0.0 process reports connection failures.
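For context, here is a minimal reconstruction of that setup, assuming the default Xinference endpoint (http://127.0.0.1:9997) and its OpenAI-compatible /v1/chat/completions route; the model UID "qwen2-14b", the /ask route, and the request volume are illustrative placeholders, not the reporter's actual code:

```python
# Hypothetical sketch of the reproduction setup; endpoint, model UID,
# and route name are assumptions, not the reporter's actual code.
import threading

import requests
from flask import Flask, jsonify, request

XINFERENCE_URL = "http://127.0.0.1:9997/v1/chat/completions"  # assumed default port

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Forward the question to Xinference's OpenAI-compatible chat endpoint.
    resp = requests.post(
        XINFERENCE_URL,
        json={
            "model": "qwen2-14b",  # hypothetical model UID
            "messages": [{"role": "user", "content": request.json["question"]}],
        },
        timeout=120,
    )
    return jsonify(resp.json())

def hammer(n_requests: int = 100) -> None:
    # Fire many concurrent requests at the wrapper, mimicking the
    # "feed it questions continuously" step of the reproduction.
    def one_call(i: int) -> None:
        requests.post("http://127.0.0.1:5000/ask", json={"question": f"question {i}"})

    threads = [threading.Thread(target=one_call, args=(i,)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    app.run(port=5000)
```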

Expected behavior

Is this an Xinference problem? When I deploy the model locally instead, the error does not occur; the only difference between the two setups is the extra API call that goes through Xinference. If it is an Xinference problem, how can it be resolved?

zhaozhizhuo commented 1 month ago

2024-07-18 09:57:28,011 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=48320) during chat.
2024-07-18 09:57:28,023 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=48336) during chat.
2024-07-18 09:57:28,035 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=56848) during chat.
2024-07-18 09:57:28,043 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=41498) during chat.
2024-07-18 09:57:28,052 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=53198) during chat.
2024-07-18 09:57:28,058 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=60506) during chat.
2024-07-18 09:57:28,066 xinference.api.restful_api 196054 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=40334) during chat.
2024-07-18 09:57:28,072 xinference.api.restful_api 196054 ERROR Chat completion stream got an error: invalid state
Traceback (most recent call last):
  File "/copydata2/zhaozhizhuo/anaconda3/envs/langchain-qwen-inference/lib/python3.11/site-packages/xinference/api/restful_api.py", line 1537, in stream_results
    async for item in iterator:
  File "/copydata2/zhaozhizhuo/anaconda3/envs/langchain-qwen-inference/lib/python3.11/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/copydata2/zhaozhizhuo/anaconda3/envs/langchain-qwen-inference/lib/python3.11/site-packages/xoscar/backends/context.py", line 226, in send
    result = await self._wait(future, actor_ref.address, send_message)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/copydata2/zhaozhizhuo/anaconda3/envs/langchain-qwen-inference/lib/python3.11/site-packages/xoscar/backends/context.py", line 115, in _wait
    return await future
           ^^^^^^^^^^^^
  File "/copydata2/zhaozhizhuo/anaconda3/envs/langchain-qwen-inference/lib/python3.11/site-packages/xoscar/backends/core.py", line 88, in _listen
    future.set_result(message)
asyncio.exceptions.InvalidStateError: invalid state

qinxuye commented 1 month ago

The Transformers engine may not be able to handle such high concurrency; try the vLLM engine.
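For reference, the engine is selected at launch time. A minimal sketch using the xinference Python client, assuming the default endpoint and the builtin qwen2-instruct model (adjust the name and size to whatever model you have registered):

```python
# Sketch of relaunching on the vLLM engine via the Python client.
# Endpoint, model name, and size are assumptions; adapt to your setup.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # assumed default endpoint
model_uid = client.launch_model(
    model_name="qwen2-instruct",  # assumed builtin model; yours may differ
    model_engine="vLLM",          # the engine switch suggested above
    model_format="pytorch",
    size_in_billions=7,
    quantization="none",
)
model = client.get_model(model_uid)
print(model.chat("hello"))
```

The equivalent CLI form is the xinference launch command already shown in this thread, with --model-engine vLLM instead of Transformers.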

zhaozhizhuo commented 1 month ago

OK, I'll give that a try. Thanks a lot!

zhaozhizhuo commented 1 month ago

The command xinference launch --model-name qwen0.5b-langchain --model-format pytorch --model-engine Transformers --gpu-idx 0,1,2 loads the model fine with the Transformers engine, but when I swap Transformers for vllm the launch fails. What could be the reason?

qinxuye commented 1 month ago

Is there an error traceback?

zhaozhizhuo commented 1 month ago

Thanks, I seem to have solved the problem. I was loading models and deploying Xinference from two different environments, so they probably could not communicate, and the model could not be loaded into Xinference.

zhaozhizhuo commented 1 month ago

Hi, when I enable batching with XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1, I cannot load a model with launch. I used the vLLM launch command: xinference launch --model-engine vLLM --model-name qwen2-7b-instruct --size-in-billions 7 --model-format pytorch --quantization none --gpu-idx 2,3,4,6. It allocates about 480 MB of VRAM on each card and then appears to hang with no further activity; it still had not launched after three or four hours.
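Two hedged observations: judging purely by its name, XINFERENCE_TRANSFORMERS_ENABLE_BATCHING targets the Transformers engine and should not be expected to affect a vLLM launch; and since the launch call blocks, a client-side timeout can at least surface the hang quickly instead of after hours. A minimal sketch assuming the default endpoint (this does not fix the underlying stall, whose cause should appear in the server logs):

```python
# Bail out of a blocked launch instead of waiting indefinitely.
# Endpoint and model parameters mirror the command above; the timeout
# value is an arbitrary assumption (--gpu-idx from the CLI is omitted).
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

from xinference.client import Client

def launch_with_timeout(timeout_s: float = 600.0) -> str:
    client = Client("http://127.0.0.1:9997")  # assumed default endpoint
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(
        client.launch_model,
        model_name="qwen2-7b-instruct",
        model_engine="vLLM",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        # The stuck worker thread keeps running in the background; this
        # only lets the caller report the failure promptly.
        raise RuntimeError("model launch timed out; check the server logs")
    finally:
        pool.shutdown(wait=False)
```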

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.