langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Concurrent API requests to Gemini vision model cause non-responsive behavior and block all tasks forever #9273

Open Allamss opened 1 week ago

Allamss commented 1 week ago

Self Checks

Dify version

0.9.1 / 0.9.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Configure Gemini 1.5 Pro 001

  2. Create a new workflow using LLM with visual capabilities to briefly analyze images

  3. Upload an image via API, then make 50 concurrent API calls to execute the image-analysis workflow (I'm using Java; 20 concurrent calls work fine). The API calls hang indefinitely without returning, and all LLM operations on the site become blocked (screenshot attached). A minimal repro sketch in Java follows this list.

  4. According to the workflow logs, the LLM requests themselves return successfully (screenshot attached).

  5. This issue does not occur with the GPT vision model.
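For reference, here is a minimal sketch of the kind of concurrent load that triggers the hang, assuming the workflow-run endpoint `/v1/workflows/run`, blocking response mode, and a workflow input variable named `image` that takes an already-uploaded file. The base URL, API key, file ID, and variable name are placeholders from my setup, not exact values:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DifyConcurrencyRepro {
    // Placeholders: replace with your own base URL, app API key, and uploaded file id.
    private static final String RUN_URL = "http://localhost/v1/workflows/run";
    private static final String API_KEY = "app-xxxxxxxx";
    private static final String FILE_ID = "uploaded-file-id";

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(50);

        // Assumes a workflow input variable named "image" that accepts an uploaded
        // image file; adjust to match your workflow's actual inputs.
        String body = """
                {
                  "inputs": {
                    "image": {
                      "transfer_method": "local_file",
                      "upload_file_id": "%s",
                      "type": "image"
                    }
                  },
                  "response_mode": "blocking",
                  "user": "load-test"
                }
                """.formatted(FILE_ID);

        for (int i = 0; i < 50; i++) {
            final int n = i;
            pool.submit(() -> {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(RUN_URL))
                        .header("Authorization", "Bearer " + API_KEY)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                try {
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    System.out.println("call " + n + " -> " + response.statusCode());
                } catch (Exception e) {
                    System.out.println("call " + n + " failed: " + e.getMessage());
                }
            });
        }
        pool.shutdown();
    }
}
```

With 20 threads this completes normally; at 50 the calls never return.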

✔️ Expected Behavior

Results should be returned normally

❌ Actual Behavior

Results are not returned as expected, and the entire site becomes globally blocked. It can only be restored by a restart.

dosubot[bot] commented 1 week ago

The issue you're experiencing with the Gemini vision model might be related to how it handles concurrency. Here are a few steps you can take to troubleshoot and potentially resolve the problem:

  1. Concurrency Limits: Check if there are any concurrency limits configured for the Gemini model. If not documented, you might need to experiment with reducing the number of concurrent requests to see if it alleviates the issue.

  2. Error Handling: Ensure that your application is correctly handling any errors returned by the API. The Gemini model has specific error handling mechanisms that map various errors to unified types [1].

  3. Resource Allocation: Compare the resource allocation settings between the Gemini and GPT models. Differences in resource handling might be causing the blockage.

  4. Logs and Monitoring: Utilize Dify's logging and monitoring features to diagnose the issue. The logs can provide insights into what happens during the API calls and why they might be hanging [2].

  5. Timeouts and Retries: Implement timeouts and retries in your API requests to prevent indefinite hanging (see the sketch below). This can help manage requests that take too long to process.

  6. Version and Configuration: Ensure that your Dify version and configuration are up to date and correctly set up for high-concurrency scenarios.
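On point 5, a hedged sketch of client-side timeouts plus a bounded retry, so a hung call fails fast instead of blocking a caller thread forever; the URL, key, body, and timeout values are illustrative placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class TimeoutAndRetryExample {
    public static void main(String[] args) throws Exception {
        // Connection-level timeout: fail fast if the TCP connection cannot be established.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        // Request-level timeout: send() throws HttpTimeoutException if no response
        // arrives in time, instead of the caller hanging indefinitely.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost/v1/workflows/run")) // placeholder URL
                .header("Authorization", "Bearer app-xxxxxxxx")       // placeholder key
                .header("Content-Type", "application/json")
                .timeout(Duration.ofSeconds(120))
                .POST(HttpRequest.BodyPublishers.ofString("{}"))      // placeholder body
                .build();

        // Simple bounded retry: give up after 3 attempts rather than waiting forever.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("status: " + response.statusCode());
                break;
            } catch (HttpTimeoutException e) {
                System.out.println("attempt " + attempt + " timed out, retrying...");
            }
        }
    }
}
```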

If these steps don't resolve the issue, consider isolating the problem by testing with fewer concurrent requests or different configurations to pinpoint the cause.

To continue talking to Dosu, mention @dosu.

Allamss commented 1 week ago

Replying point by point to Dosu's suggestions above:

  1. The Gemini model has no extra configuration. Crucially, the workflow logs show that Dify's requests to Gemini do return (with execution time and token-usage figures), so the issue is not caused by rate or concurrency limits on the Gemini side.

  2. Firstly, the API requests don't return at all but remain blocked indefinitely. I've checked the logs for the web, API, and sandbox containers, and there are no error messages.

  3. As per point 1.

  4. As per point 1.

  5. After the blockage occurs, any API or site LLM requests become blocked as well.

  6. I'll try this suggestion later. @dosu

dosubot[bot] commented 1 week ago

Given the situation, here are some additional steps you might consider:

  1. Server Configuration: Ensure that your server settings are optimized for high concurrency. Adjust settings like SERVER_WORKER_AMOUNT and SQLALCHEMY_POOL_SIZE to handle more simultaneous connections (see the sketch after this list) [1].

  2. Gevent Worker Class: Since you're using Docker, confirm that the SERVER_WORKER_CLASS is set to gevent to efficiently manage multiple connections [1].

  3. Database Bottlenecks: Check if there are any database bottlenecks. Increasing the database connection pool size might help if the database is a limiting factor.

  4. Network and Firewall: Verify that there are no network or firewall settings that might be limiting the number of concurrent connections.

  5. Resource Monitoring: Use monitoring tools to check CPU, memory, and network usage during the concurrent requests to identify any resource exhaustion.

  6. Isolation Testing: Try running the Gemini model in isolation with a reduced number of requests to see if the issue persists, which might help identify if the problem is with the model or the infrastructure.
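On points 1–3, a sketch of where these settings live in the Docker deployment's docker/.env; the variable names come from Dify's example env file, but the values below are illustrative starting points rather than verified tuning for this workload:

```ini
# docker/.env -- illustrative values, tune for your hardware and database
# Number of API server worker processes (Gunicorn)
SERVER_WORKER_AMOUNT=4
# gevent workers can multiplex many concurrent connections per process
SERVER_WORKER_CLASS=gevent
# SQLAlchemy connection pool size per worker; raise if the DB is the bottleneck
SQLALCHEMY_POOL_SIZE=30
SQLALCHEMY_POOL_RECYCLE=3600
```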

These steps should help you further diagnose and potentially resolve the issue.

To continue talking to Dosu, mention @dosu.

Allamss commented 1 week ago

I’ve tried the latest version 0.9.2, and the issue is still reproducible @crazywoola @dosu

hw872715125 commented 6 days ago

I also encountered this problem @crazywoola