Status: Closed (andylzming closed this issue 3 months ago)
@andylzming I can reproduce this issue with the same model (not 100% reliably; with a different model it sometimes doesn't trigger). Gradio versions:
gradio 3.50.1
gradio_client 0.6.1
Opening the browser DevTools (F12), you can see errors in the console, and the ws (WebSocket) traffic under the Network tab shows that the model's answer has actually been returned; Gradio just fails to display it. My guess is that the Gradio version is the problem.
See https://github.com/gradio-app/gradio/issues/6613
and https://github.com/gradio-app/gradio/issues/3943
Following those issues, I downgraded Gradio to 3.41 and have never hit the problem since; you could give that a try.
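If you want to try that workaround, a minimal downgrade sketch (assuming pip; the comment only states what the thread reports, and the exact 3.41.x patch release is left to pip):

```shell
# Pin Gradio to the 3.41 line reported in this thread to avoid the freeze;
# pip resolves the latest 3.41.x patch release.
pip install "gradio==3.41.*"
```

Note this is an environment pin, not a fix; the linked Gradio issues track the underlying bug.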
@ChengjieLi28
After downgrading Gradio to 3.41, clicking the Submit button has no effect.
The chat freezes with both of the following version combinations:
(xinference) [root@gpu-server gradio]# pip list | grep gradio
gradio 3.47.1
gradio_client 0.6.0
(xinference) [root@gpu-server depends]# ll xinference-dependences/ | grep gradio
-rw-r--r--. 1 root root 20298198 12月 19 21:55 gradio-3.50.2-py3-none-any.whl
-rw-r--r--. 1 root root 299220 12月 19 21:55 gradio_client-0.6.1-py3-none-any.whl
Console output is as follows:
Multi-turn chat with the qwen-14b model works fine, but chatglm3-6b does not. Also, communicating with chatglm3-6b through dify via xinference raises the following error:
INFO 04-10 16:30:35 llm_engine.py:653] Avg prompt throughput: 24.8 tokens/s, Avg generation throughput: 6.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-10 16:30:35 async_llm_engine.py:111] Finished request 2f3a1d40-f779-11ee-b1b4-80615f20f615.
2024-04-10 16:30:35,419 xinference.api.restful_api 27390 ERROR [address=127.0.0.1:34773, pid=24418] 0
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=127.0.0.1:34773, pid=24418] 0
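The `KeyError: 0` at the end of the traceback suggests that `c` inside `_eval_chatglm3_arguments` is a dict rather than a list, so the positional index `c[0]` is interpreted as a dict lookup of the integer key `0` and fails. A minimal sketch of that failure mode with a defensive guard (the function name and dict key below are illustrative, not xinference's actual code):

```python
def eval_arguments(c):
    """Defensive variant: only index positionally when c is a sequence.

    In the traceback above, `isinstance(c[0], str)` assumes c is a list;
    when the backend returns a dict, `c[0]` raises KeyError: 0 because
    dicts have no integer key 0.
    """
    if isinstance(c, (list, tuple)) and c and isinstance(c[0], str):
        return c[0]
    if isinstance(c, dict):
        # Handle a dict payload explicitly instead of indexing it with 0.
        return c.get("content")
    return None


# A list payload works; a dict payload no longer raises KeyError: 0.
print(eval_arguments(["hello"]))          # hello
print(eval_arguments({"content": "hi"}))  # hi
```

Guarding the payload type like this (or normalizing it before the call) would avoid the crash, though the real fix belongs in xinference's chatglm3 tool-call parsing.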
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
Describe the bug
The Xinference Chat Bot freezes after a few turns of conversation (usually two or three); see the screenshots for details.
To Reproduce
To help us reproduce this bug, please provide the information below:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.