00drdelius opened this issue 1 week ago
Hello, @00drdelius! I'm here to help you with bug reports, questions, and contributions. Let's work together to resolve the issues you're encountering.
**Flask Blueprint Error**: The `AssertionError` occurs because the `after_request` method is being set up after the blueprint `service_api` has already been registered. This setup should be completed before registration. If commenting out `import app` and `with app.app_context():` resolves the issue, it suggests that these imports might not be necessary for your current use case. Ensure that all necessary configuration is done before the blueprint is registered.
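For illustration, a minimal sketch of the required ordering (the handler body here is made up; only the setup-before-registration order matters):

```python
from flask import Flask, Blueprint

service_api = Blueprint("service_api", __name__)

# Setup methods such as after_request must be attached BEFORE the
# blueprint is registered; calling them afterwards raises the
# AssertionError described above.
@service_api.after_request
def add_header(response):
    response.headers["X-Service"] = "service_api"  # illustrative header
    return response

app = Flask(__name__)
app.register_blueprint(service_api)  # register only after setup is complete
```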
**Parallel Generation Limitation**: The error "Parallel generation is not supported by llama-cpp-python" indicates that using `ThreadPoolExecutor` for parallel processing is inappropriate here. Since CosyVoice served through xinference doesn't support parallel generation, consider processing requests sequentially instead.
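A minimal sketch of the sequential alternative; `synthesize` below is a placeholder standing in for the actual xinference invocation, not a real API:

```python
from typing import Callable, Iterable, List

def synthesize_sequentially(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> List[bytes]:
    """Issue one TTS request at a time instead of fanning them out with
    ThreadPoolExecutor, since the backend rejects concurrent generation."""
    return [synthesize(sentence) for sentence in sentences]
```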
**Text Token Limit Discrepancy**: The `cosyvoice.yaml` file specifies a max text token limit of 51866, while the code uses a limit of 3500. Align the code with the configuration file to keep the two consistent.

Adjust the code to handle these issues, and consider updating the `_get_model_word_limit` method to reflect the correct token limit from `cosyvoice.yaml`.
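A hypothetical sketch of reading the limit from configuration instead of hard-coding 3500; the `max_text_token` key name is an assumption for illustration, not taken from the real `cosyvoice.yaml` schema:

```python
import yaml

def get_model_word_limit(config_path: str, default: int = 3500) -> int:
    # Load the model's yaml config and prefer its token limit over the
    # old hard-coded value; "max_text_token" is an assumed key name.
    with open(config_path, encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}
    return int(config.get("max_text_token", default))
```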
Same question.
Self Checks
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
I was deploying CosyVoice for TTS with xinference in Dify. However, I encountered two unexpected errors, as follows:
I just commented out `from app import app` and `with app.app_context():`, and the error was gone. I'm not familiar with Flask, but I think it's not really necessary to `import app` here? (A minimal illustration follows below.)

CosyVoice deployed by xinference does not support parallel generation currently, so it's not appropriate to use `ThreadPoolExecutor` here. Related issue: https://github.com/xorbitsai/inference/issues/2386
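To illustrate why dropping the context can be safe, here is a minimal Flask example (unrelated to the actual `tts.py` code): an application context is only required when app-bound objects are accessed.

```python
from flask import Flask, current_app

app = Flask(__name__)

# An application context is only needed when code touches app-bound
# objects such as current_app or Flask extensions.
with app.app_context():
    print(current_app.name)  # requires the context

# Plain computation (like calling an external TTS service) can run
# without importing app or pushing a context at all.
print("no app context needed here")
```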
And btw, according to line 21 of `cosyvoice.yaml`, the max text token is 51866. Hence I think the limit marked below is not so precise:
api/core/model_runtime/model_providers/xinference/tts/tts.py
✔️ Expected Behavior
No response
❌ Actual Behavior
No response