Cannot configure timeout and retry settings for Xinference

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.


Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Create an app of any type in Dify.
Add Xinference as a model provider and add a model.
Start a chat in the app.

✔️ Expected Behavior

A normal response message should be returned.

❌ Actual Behavior

If Xinference takes a long time to respond, Dify raises a timeout error and retries the request 3 times by default. These retries can put even more load on an Xinference server that is already slow.

I found some relevant information regarding timeout and retry settings for xinference in Dify. The default timeout is set to 10 seconds, and the retry setting is configured for a maximum of 3 retries. These settings are defined in the XinferenceHelper class within the xinference_helper.py file ^[1].

Currently, there isn't specific documentation or configuration options mentioned for customizing these settings directly in Dify's Docker setup or through environment variables. However, you might consider modifying the xinference_helper.py file directly to adjust these settings if necessary.
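As a rough illustration of what configurable settings could look like, here is a minimal, standard-library-only sketch. It is not the code in xinference_helper.py: the environment variable names `XINFERENCE_TIMEOUT` and `XINFERENCE_MAX_RETRIES`, the `RequestSettings` class, and the helper functions are all hypothetical, and only the defaults (10 seconds, 3 retries) come from the description above.

```python
import os
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RequestSettings:
    timeout_seconds: float = 10.0  # default reported in this thread
    max_retries: int = 3           # default reported in this thread


def load_settings(env: Mapping[str, str] = os.environ) -> RequestSettings:
    """Read overrides from the environment, falling back to the defaults.

    The variable names are hypothetical; Dify does not define them.
    """
    defaults = RequestSettings()
    return RequestSettings(
        timeout_seconds=float(env.get("XINFERENCE_TIMEOUT", defaults.timeout_seconds)),
        max_retries=int(env.get("XINFERENCE_MAX_RETRIES", defaults.max_retries)),
    )


def call_with_retries(func: Callable[[], T], max_retries: int,
                      backoff_seconds: float = 0.0) -> T:
    """Call func once, then retry up to max_retries more times on OSError.

    URLError and socket timeouts are both subclasses of OSError.
    """
    attempt = 0
    while True:
        try:
            return func()
        except OSError:
            if attempt >= max_retries:
                raise
            attempt += 1
            # Linear backoff so retries do not hit a slow server back to back.
            time.sleep(backoff_seconds * attempt)


def fetch(url: str, settings: RequestSettings) -> bytes:
    """GET a URL using the configured timeout and retry count."""
    def do_request() -> bytes:
        with urllib.request.urlopen(url, timeout=settings.timeout_seconds) as resp:
            return resp.read()

    return call_with_retries(do_request, settings.max_retries)
```

With a layout like this, setting `XINFERENCE_MAX_RETRIES=0` and a larger `XINFERENCE_TIMEOUT` in the container environment would let a slow model finish a single request instead of receiving the same request several times.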

If you're experiencing issues with long response times leading to timeouts and retries, you might need to explore optimizing the performance of your xinference setup or handling load pressure more effectively. Unfortunately, there isn't specific guidance available in the documentation for these scenarios.

