langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

CosyVoice audio service bug with xinference #9299

Open · 00drdelius opened this issue 1 week ago

00drdelius commented 1 week ago

Self Checks

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

I was deploying CosyVoice for TTS with xinference in Dify. However, I encountered two unexpected errors, as follows:

  1. raised from api/services/audio_service.py:
    AssertionError: The setup method 'after_request' can no longer be called on the blueprint 'service_api'.
    It has already been registered at least once, any changes will not be applied consistently.
    Make sure all imports, decorators, functions, etc. needed to set up the blueprint are done before registering it.


I just commented out `from app import app` and `with app.app_context():`, and the error was gone. I'm not familiar with Flask, but I don't think it's really necessary to import `app` here? (See the Flask sketch after the second error below for the kind of change I mean.)

  2. raised from api/core/model_runtime/model_providers/xinference/tts/tts.py:
    Parallel generation is not supported by llama-cpp-python


CosyVoice deployed by xinference does not support parallel generation currently, so it is not appropriate to use ThreadPoolExecutor here. Related issue: https://github.com/xorbitsai/inference/issues/2386. A sequential sketch follows below.
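
For the first error, this is roughly the replacement I have in mind. It is only a minimal sketch, not the actual audio_service.py code, and it assumes the service function is invoked while a request/app context is active: reach the running application through `flask.current_app` instead of re-importing `app`, which re-executes app.py and touches the already-registered `service_api` blueprint again.

```python
from flask import current_app

def run_with_app_context(task, *args, **kwargs):
    # Hypothetical helper, not Dify's actual code. current_app is a proxy to
    # the running application; _get_current_object() returns the real app so
    # a fresh app context can also be pushed from a worker thread.
    flask_app = current_app._get_current_object()
    with flask_app.app_context():
        return task(*args, **kwargs)
```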
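
For the second error, a rough sketch of what I mean by generating sequentially; `invoke_tts` and `sentences` are placeholders for whatever tts.py actually uses, not real Dify names:

```python
def synthesize_sequentially(invoke_tts, sentences):
    # Generate each text chunk one at a time instead of submitting the chunks
    # to a concurrent.futures.ThreadPoolExecutor, since the CosyVoice backend
    # cannot serve the requests in parallel.
    audio_chunks = []
    for sentence in sentences:
        audio_chunks.append(invoke_tts(sentence))  # one request at a time
    return b"".join(audio_chunks)
```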

And by the way, according to line 21 of cosyvoice.yaml, the max text token count is 51866. Hence I think the limit marked below is not accurate:

api/core/model_runtime/model_providers/xinference/tts/tts.py (screenshot of the word limit marked in this file)

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 week ago

Hello, @00drdelius! I'm here to help you with bug reports, questions, and contributions. Let's work together to resolve the issues you're encountering.

  1. Flask Blueprint Error: The AssertionError occurs because the after_request handler is being set up after the blueprint service_api has already been registered; this setup must be completed before registration. If commenting out `from app import app` and `with app.app_context():` resolves the issue, it suggests that the import is not necessary for this code path. Ensure that all necessary configuration is done before the blueprint is registered.

  2. Parallel Generation Limitation: The error "Parallel generation is not supported by llama-cpp-python" indicates that using ThreadPoolExecutor for parallel processing is inappropriate. Since CosyVoice with xinference doesn't support parallel generation, consider processing requests sequentially instead.

  3. Text Token Limit Discrepancy: The cosyvoice.yaml file indicates a max text token limit of 51866, while the code uses a limit of 3500. This discrepancy should be addressed by aligning the code with the configuration file to ensure consistency.

Adjust the code to handle these issues, and consider updating the _get_model_word_limit method to reflect the correct token limit from cosyvoice.yaml.
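
For example, a sketch of that direction; the `model_config` dict, the `max_tokens` key, and the exact signature of `_get_model_word_limit` are assumptions here, not Dify's actual API:

```python
DEFAULT_TTS_WORD_LIMIT = 3500  # value currently hard-coded, per the report above

def _get_model_word_limit(model_config: dict) -> int:
    # Prefer the per-model limit declared in the model's yaml spec
    # (cosyvoice.yaml declares a max text token count of 51866) and fall back
    # to the old constant only when the spec does not provide one.
    return int(model_config.get("max_tokens", DEFAULT_TTS_WORD_LIMIT))
```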

To continue talking to Dosu, mention @dosu.

kkjinping commented 6 days ago

Same question.