langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

CosyVoice audio service bug with xinference #9299

Open · 00drdelius opened this issue 1 week ago

00drdelius commented 1 week ago

Self Checks

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

I was deploying CosyVoice for TTS with xinference in Dify. However, I encountered two unexpected errors, as follows:

  1. raised from api/services/audio_service.py:
    AssertionError: The setup method 'after_request' can no longer be called on the blueprint 'service_api'.
    It has already been registered at least once, any changes will not be applied consistently.
    Make sure all imports, decorators, functions, etc. needed to set up the blueprint are done before registering it.


I just commented out `from app import app` and `with app.app_context():`, and the error was gone. I'm not familiar with Flask, but I don't think it's really necessary to import `app` here? (See the Flask sketch after the second error below for the kind of change I mean.)

  2. raised from api/core/model_runtime/model_providers/xinference/tts/tts.py:
    Parallel generation is not supported by llama-cpp-python


CosyVoice deployed by xinference does not support parallel generation currently, so it is not appropriate to use ThreadPoolExecutor here. Related issue: https://github.com/xorbitsai/inference/issues/2386. A sequential sketch follows below.
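
For the first error, this is roughly the replacement I have in mind. It is only a minimal sketch, not the actual audio_service.py code, and it assumes the service function is invoked while a request/app context is active: reach the running application through `flask.current_app` instead of re-importing `app`, which re-executes app.py and touches the already-registered `service_api` blueprint again.

```python
from flask import current_app

def run_with_app_context(task, *args, **kwargs):
    # Hypothetical helper, not Dify's actual code. current_app is a proxy to
    # the running application; _get_current_object() returns the real app so
    # a fresh app context can also be pushed from a worker thread.
    flask_app = current_app._get_current_object()
    with flask_app.app_context():
        return task(*args, **kwargs)
```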
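
For the second error, a rough sketch of what I mean by generating sequentially; `invoke_tts` and `sentences` are placeholders for whatever tts.py actually uses, not real Dify names:

```python
def synthesize_sequentially(invoke_tts, sentences):
    # Generate each text chunk one at a time instead of submitting the chunks
    # to a concurrent.futures.ThreadPoolExecutor, since the CosyVoice backend
    # cannot serve the requests in parallel.
    audio_chunks = []
    for sentence in sentences:
        audio_chunks.append(invoke_tts(sentence))  # one request at a time
    return b"".join(audio_chunks)
```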

And by the way, according to line 21 of cosyvoice.yaml, the max text token count is 51866. Hence I think the limit marked below is not accurate:

api/core/model_runtime/model_providers/xinference/tts/tts.py (screenshot of the word limit marked in this file)

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 week ago

Hello, @00drdelius! I'm here to help you with bug reports, questions, and contributions. Let's work together to resolve the issues you're encountering.

  1. Flask Blueprint Error: The AssertionError occurs because the after_request handler is being set up after the blueprint service_api has already been registered; this setup must be completed before registration. If commenting out `from app import app` and `with app.app_context():` resolves the issue, it suggests that the import is not necessary for this code path. Ensure that all necessary configuration is done before the blueprint is registered.

  2. Parallel Generation Limitation: The error "Parallel generation is not supported by llama-cpp-python" indicates that using ThreadPoolExecutor for parallel processing is inappropriate. Since CosyVoice with xinference doesn't support parallel generation, consider processing requests sequentially instead.

  3. Text Token Limit Discrepancy: The cosyvoice.yaml file indicates a max text token limit of 51866, while the code uses a limit of 3500. This discrepancy should be addressed by aligning the code with the configuration file to ensure consistency.

Adjust the code to handle these issues, and consider updating the _get_model_word_limit method to reflect the correct token limit from cosyvoice.yaml.
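
For example, a sketch of that direction; the `model_config` dict, the `max_tokens` key, and the exact signature of `_get_model_word_limit` are assumptions here, not Dify's actual API:

```python
DEFAULT_TTS_WORD_LIMIT = 3500  # value currently hard-coded, per the report above

def _get_model_word_limit(model_config: dict) -> int:
    # Prefer the per-model limit declared in the model's yaml spec
    # (cosyvoice.yaml declares a max text token count of 51866) and fall back
    # to the old constant only when the spec does not provide one.
    return int(model_config.get("max_tokens", DEFAULT_TTS_WORD_LIMIT))
```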

To continue talking to Dosu, mention @dosu.

kkjinping commented 6 days ago

Same question.