langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Cannot configure timeout and retry settings for xinference #11011

Closed cyflhn closed 1 day ago

cyflhn commented 2 days ago

Dify version

0.10.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Create an app of any type in Dify.
  2. Create an xinference model provider and add a model.
  3. Start a chat in the app.

✔️ Expected Behavior

A normal response message should be returned.

❌ Actual Behavior

If xinference takes a long time to respond, Dify raises a timeout error and retries 3 times by default. These retries can put even more load on xinference.

dosubot[bot] commented 2 days ago

I found some relevant information regarding timeout and retry settings for xinference in Dify. The default timeout is set to 10 seconds, and the retry setting is configured for a maximum of 3 retries. These settings are defined in the XinferenceHelper class within the xinference_helper.py file [1].

Currently, no documentation or configuration option appears to cover customizing these settings in Dify's Docker setup or through environment variables. However, you might consider modifying the xinference_helper.py file directly to adjust these settings if necessary.
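As a rough illustration of what configurable settings could look like, here is a minimal standard-library sketch. It is not Dify's code and does not reflect the actual contents of xinference_helper.py: the environment variable names (`XINFERENCE_TIMEOUT`, `XINFERENCE_MAX_RETRIES`, `XINFERENCE_RETRY_BACKOFF`), the `RequestPolicy` class, and the `call_with_retries` helper are all hypothetical. It assumes "3 retries" means up to 3 extra attempts after the first one, and falls back to the 10 second / 3 retry defaults mentioned above.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RequestPolicy:
    """Hypothetical container for the timeout and retry settings."""
    timeout_seconds: float = 10.0  # default reported in this thread
    max_retries: int = 3           # default reported in this thread
    backoff_seconds: float = 0.0   # base delay between retries (assumption)


def load_policy(env: Mapping[str, str] = os.environ) -> RequestPolicy:
    # The variable names below are made up for illustration;
    # they are not existing Dify settings.
    return RequestPolicy(
        timeout_seconds=float(env.get("XINFERENCE_TIMEOUT", "10")),
        max_retries=int(env.get("XINFERENCE_MAX_RETRIES", "3")),
        backoff_seconds=float(env.get("XINFERENCE_RETRY_BACKOFF", "0")),
    )


def call_with_retries(send: Callable[[float], T], policy: RequestPolicy) -> T:
    """Call send(timeout) once, then retry on TimeoutError up to max_retries times."""
    attempts = policy.max_retries + 1
    for attempt in range(attempts):
        try:
            return send(policy.timeout_seconds)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the timeout
            # Exponential backoff so a slow server is not hit again immediately.
            time.sleep(policy.backoff_seconds * (2 ** attempt))
    raise AssertionError("unreachable")
```

With something along these lines, setting the hypothetical `XINFERENCE_MAX_RETRIES=0` would mean a slow request fails once instead of being re-sent, which is the load concern raised in this issue.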

If you're experiencing issues with long response times leading to timeouts and retries, you might need to explore optimizing the performance of your xinference setup or handling load pressure more effectively. Unfortunately, there isn't specific guidance available in the documentation for these scenarios.
