Closed: leslie2046 closed this issue 1 month ago.
Is this using concurrent streaming calls?
Yes.
Same bug here: "Parallel generation is not supported by llama-cpp-python". CosyVoice itself already supports streaming generation, but Xinference's TTS does not support streaming yet.
Xinference's CosyVoice already supports streaming. Concurrent streaming is likely what triggers this problem. I'm not sure whether CosyVoice is thread-safe.
CosyVoice itself should be thread-safe.
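If CosyVoice itself is thread-safe, then the guard raising this error lives in Xinference's `_call_wrapper` (see the traceback in the Reproduction section below). Until that is relaxed, one caller-side workaround is to serialize speech requests. A minimal sketch, assuming the Python client's `speech` API; the lock, the helper name `speak_serialized`, the model UID, and the voice are illustrative assumptions:

```python
# Client-side workaround sketch (not part of Xinference): serialize speech
# calls so only one generation is in flight for the model at a time.
import threading

from xinference.client import Client

_tts_lock = threading.Lock()  # hypothetical module-level lock

def speak_serialized(model, text: str) -> bytes:
    # Holding the lock guarantees the worker never sees two concurrent
    # speech requests, which is what triggers the error below.
    with _tts_lock:
        # "中文女" is one of CosyVoice-300M-SFT's built-in speakers.
        return model.speech(text, voice="中文女")

client = Client("http://127.0.0.1:9997")        # supervisor endpoint from the report
model = client.get_model("CosyVoice-300M-SFT")  # model UID is an assumption
audio = speak_serialized(model, "你好")
```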
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
System Info / 系統信息
CUDA: 12.2
Python: 3.10.14
OS: CentOS 7.9
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
0.15.0
The command used to start Xinference / 用以启动 xinference 的命令
```
nohup xinference-supervisor -H 0.0.0.0 --log-level DEBUG > supervisor.log 2>&1 &
nohup xinference-worker -e "http://127.0.0.1:9997/" -H 192.168.1.88 --log-level DEBUG > worker.log 2>&1 &
```
Reproduction / 复现过程
```
2024-09-11 15:14:34,206 xinference.model.audio.cosyvoice 186418 INFO CosyVoice inference_sft
2024-09-11 15:14:34,207 xinference.core.model 186418 ERROR [request 7f290426-700d-11ef-bf2a-20040ff32e74] Leave speech, error: Parallel generation is not supported by llama-cpp-python., elapsed time: 0 s
Traceback (most recent call last):
  File "/home/njue/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/xinference/core/utils.py", line 69, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/njue/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/xinference/core/model.py", line 711, in speech
    return await self._call_wrapper_binary(
  File "/home/njue/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/xinference/core/model.py", line 410, in _call_wrapper_binary
    return await self._call_wrapper("binary", fn, *args, **kwargs)
  File "/home/njue/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/xinference/core/model.py", line 120, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/home/njue/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/xinference/core/model.py", line 427, in _call_wrapper
    raise Exception("Parallel generation is not supported by llama-cpp-python.")
Exception: Parallel generation is not supported by llama-cpp-python.
2024-09-11 15:14:34,208 xinference.core.model 186418 DEBUG After request speech, current serve request count: 0 for the model CosyVoice-300M-SFT-1-0
```
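For reference, a minimal script that reproduces the error by firing two streaming speech requests at once. This is a sketch, assuming Xinference's OpenAI-compatible `/v1/audio/speech` endpoint and its `stream` flag; the host, model UID, and voice are assumptions to adjust for your deployment:

```python
# Repro sketch: two concurrent streaming speech requests against the same
# CosyVoice model; the second one hits "Parallel generation is not
# supported by llama-cpp-python."
import threading
import requests

BASE_URL = "http://127.0.0.1:9997"  # assumed supervisor endpoint

def speak(text: str) -> None:
    resp = requests.post(
        f"{BASE_URL}/v1/audio/speech",
        json={
            "model": "CosyVoice-300M-SFT",  # model UID from `xinference list`
            "input": text,
            "voice": "中文女",
            "stream": True,
        },
        stream=True,
    )
    resp.raise_for_status()
    for _chunk in resp.iter_content(chunk_size=4096):
        pass  # consume the audio stream

threads = [threading.Thread(target=speak, args=(f"测试文本 {i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```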
Expected behavior / 期待表现
Requests should be handled in parallel.
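Launching the model with more than one replica is the usual way to get parallel serving in Xinference, since each replica runs its own model instance. A sketch, assuming the Python client's `launch_model` and its `replica` parameter; the model name and replica count are illustrative:

```python
# Sketch: launch two replicas so concurrent requests can be dispatched to
# different model instances instead of queueing on one.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_name="CosyVoice-300M-SFT",
    model_type="audio",
    replica=2,  # one model instance per replica; requests are load-balanced
)
print(model_uid)
```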