langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The /v1/chat-messages interface is very slow #8746

Closed · rechardwang closed this 1 month ago

rechardwang commented 1 month ago

Dify version

0.8.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

The slow response times from the /v1/chat-messages API endpoint began after upgrading to version 0.8.2, so the upgrade likely introduced changes that affect performance.

When debugging from the console page, responses are very fast.

When calling the /v1/chat-messages API endpoint, the response is slow, and the same slowness shows up in the console logs, which suggests the issue is not network-related.

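To put numbers on the reported slowness, here is a minimal timing sketch against the endpoint. Assumptions: a self-hosted deployment reachable at http://localhost/v1 and an app API key exported as DIFY_API_KEY; both are placeholders, not values from this report.

# Minimal latency check for /v1/chat-messages in blocking mode.
# Placeholders: URL and the DIFY_API_KEY environment variable.
import os
import time

import requests

API_KEY = os.environ["DIFY_API_KEY"]  # placeholder env var name
URL = "http://localhost/v1/chat-messages"  # adjust to your deployment

payload = {
    "inputs": {},
    "query": "ping",
    "response_mode": "blocking",
    "user": "latency-test",
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, headers=headers, timeout=360)
elapsed = time.perf_counter() - start
print(f"status={resp.status_code} elapsed={elapsed:.2f}s")

Comparing this number against the console debug timing isolates the API layer's overhead from the model itself.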

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

Hey @rechardwang! 👋 I'm here to help you with any bugs, questions, or contributions you have for the project. Let's work together to solve this issue!

I found some similar issues and discussions that might help you troubleshoot the slow response times with the /v1/chat-messages API endpoint:

  1. Discussion: When publishing the API, the /chat-messages endpoint responds slowly with data in streaming mode (Unsolved) [1].
  2. Issue: chat-messages API is taking more time compared to test run in dify (Closed) [2].
    • Suggested Solution: Implement streaming output in the invoke_llm method to improve responsiveness. Here is the suggested code modification:
      llm_result: LLMResult = model_instance.invoke_llm(
          prompt_messages=prompt_messages,
          model_parameters=app_orchestration_config.model_config.parameters,
          tools=[],
          stop=app_orchestration_config.model_config.stop,
          stream=True,  # enable streaming so tokens are emitted as generated
          user=self.user_id,
          callbacks=[],
      )
    • Additionally, consider increasing the timeout value in the frontend code and implementing a reconnect feature to handle disconnections (a client-side sketch of this pattern follows this list).
  3. Issue: Self Hosted (Docker), api call /chat-messages with response_mode: streaming, the message_end response lasts up to 5 seconds (Closed) [3].
    • Suggested Solution: The delay could be due to server processing time, network latency, server load, the overhead of streaming, or server-side configuration settings. It's recommended to check the server-side implementation, server logs, and possibly adjust server or network configurations.
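As a concrete illustration of the timeout-and-reconnect suggestion in item 2, here is a hedged client-side sketch that consumes the streaming endpoint. Assumptions: the same placeholder URL and DIFY_API_KEY as above; the retry policy, backoff, and timeouts are illustrative choices, not Dify defaults.

# Streaming consumer with a read timeout and simple reconnect logic.
# Placeholders: URL, DIFY_API_KEY, retry count, and backoff policy.
import json
import os
import time

import requests

API_KEY = os.environ["DIFY_API_KEY"]
URL = "http://localhost/v1/chat-messages"

def stream_chat(query: str, retries: int = 3) -> None:
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "streaming",
        "user": "latency-test",
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    for attempt in range(retries):
        try:
            with requests.post(URL, json=payload, headers=headers,
                               stream=True, timeout=(5, 360)) as resp:
                resp.raise_for_status()
                for line in resp.iter_lines():
                    # Dify streams SSE lines of the form b"data: {...}"
                    if line.startswith(b"data: "):
                        event = json.loads(line[len(b"data: "):])
                        if event.get("event") == "message":
                            print(event.get("answer", ""), end="", flush=True)
            return
        except requests.RequestException as exc:
            print(f"\nattempt {attempt + 1} failed: {exc}; reconnecting")
            time.sleep(2 ** attempt)  # exponential backoff between retries

stream_chat("hello")

Watching the gap between the request and the first message event helps separate model latency from post-processing, such as the message_end delay described in item 3.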

For optimizing the performance of the /v1/chat-messages API endpoint, consider the following server and network configurations for running Dify in a self-hosted Docker environment:

  1. API Server Configuration:

    • DIFY_BIND_ADDRESS: Default is 0.0.0.0.
    • DIFY_PORT: Default is 5001.
    • SERVER_WORKER_AMOUNT: Number of API server workers, calculated as CPU cores × 2 + 1 (e.g., 5 on a 2-core host).
    • SERVER_WORKER_CLASS: Default is gevent.
    • GUNICORN_TIMEOUT: Default is 200. Recommended value is 360 to support longer SSE (Server-Sent Events) connection times.
  2. Database Configuration:

    • DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT (Default: 5432), DB_DATABASE.
    • SQLALCHEMY_POOL_SIZE: Default is 30 connections.
    • SQLALCHEMY_POOL_RECYCLE: Default is 3600 seconds.
  3. Redis Configuration:

    • REDIS_HOST, REDIS_PORT (Default: 6379), REDIS_DB (Default: 0), REDIS_PASSWORD.
    • REDIS_USE_SSL: Default is false.
  4. Celery Configuration:

    • CELERY_WORKER_CLASS: Default is gevent.
    • CELERY_WORKER_AMOUNT: Default is 1.
    • CELERY_BROKER_URL: Redis URL for Celery broker.
  5. File Storage Configuration:

    • STORAGE_TYPE: Default is local.
    • STORAGE_LOCAL_PATH: Path for local storage.
  6. Network Configuration:

    • Ensure Docker containers can communicate with each other and with the database and Redis instances.
    • Configure appropriate firewall rules to allow traffic on necessary ports (e.g., 5001 for API, 5432 for PostgreSQL, 6379 for Redis).

Example Docker Compose configuration snippet:

version: '3.8'
services:
  api:
    image: langgenius/dify-api:latest
    environment:
      - DIFY_BIND_ADDRESS=0.0.0.0
      - DIFY_PORT=5001
      - SERVER_WORKER_AMOUNT=5  # Adjust based on CPU cores
      - GUNICORN_TIMEOUT=360
      - DB_HOST=your_db_host
      - DB_PORT=5432
      - REDIS_HOST=your_redis_host
      - REDIS_PORT=6379
      - REDIS_PASSWORD=your_redis_password
      - STORAGE_TYPE=local
    ports:
      - "5001:5001"
    volumes:
      - ./storage:/app/api/storage
  db:
    image: postgres:latest
    environment:
      - POSTGRES_USER=your_db_user
      - POSTGRES_PASSWORD=your_db_password
      - POSTGRES_DB=your_db_name
    ports:
      - "5432:5432"
  redis:
    image: redis:latest
    # Note: the official redis image does not read a REDIS_PASSWORD
    # environment variable; set the password via the server command line.
    command: redis-server --requirepass your_redis_password
    ports:
      - "6379:6379"

This configuration ensures that the Dify API server, PostgreSQL database, and Redis instance are properly set up and can communicate with each other, providing a robust environment for optimal performance [4][5][6].


rechardwang commented 1 month ago

So is there no way to speed up /v1/chat-messages?

crazywoola commented 1 month ago

No. The LLM side generates quickly enough, but the token counting afterwards is slow, and there probably won't be time to fix it for now.
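For context on the token-counting cost mentioned above, here is a rough sketch of how to measure it in isolation. Assumption: the deployment counts tokens with a tiktoken-style tokenizer, as OpenAI-compatible models commonly do; Dify's exact code path may differ, and tiktoken is a third-party dependency (pip install tiktoken).

# Time the tokenization of a long completion to estimate counting overhead.
# Assumption: a tiktoken-style tokenizer; not necessarily Dify's actual path.
import time

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "lorem ipsum " * 5000  # stand-in for a long completion

start = time.perf_counter()
n_tokens = len(enc.encode(text))
elapsed = time.perf_counter() - start
print(f"{n_tokens} tokens counted in {elapsed * 1000:.1f} ms")

If this step dominates, it would explain why streamed tokens arrive quickly while the final accounting lags behind.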

njuptpz commented 5 days ago

@crazywoola Was this issue ever fixed? Without a fix we cannot pass load testing: under load, going through the Dify chat-messages API wrapper takes twice as long as calling the model directly.
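A hedged benchmark sketch for reproducing the 2x comparison described above. Assumptions: DIFY_URL points at the wrapped /v1/chat-messages endpoint and LLM_URL at a direct, OpenAI-compatible chat completions endpoint; all environment variable names, payloads, and the request count are illustrative.

# Median-latency comparison: Dify-wrapped endpoint vs. direct model call.
# Placeholders: DIFY_URL, DIFY_API_KEY, LLM_URL, LLM_API_KEY, model name.
import os
import statistics
import time

import requests

def bench(url: str, payload: dict, headers: dict, n: int = 20) -> float:
    times = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(url, json=payload, headers=headers, timeout=360)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

dify_median = bench(
    os.environ["DIFY_URL"],
    {"inputs": {}, "query": "ping", "response_mode": "blocking", "user": "bench"},
    {"Authorization": f"Bearer {os.environ['DIFY_API_KEY']}"},
)
llm_median = bench(
    os.environ["LLM_URL"],
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "ping"}]},
    {"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
)
print(f"dify={dify_median:.2f}s direct={llm_median:.2f}s "
      f"overhead={dify_median / llm_median:.1f}x")

Running both against the same model and prompt attributes any gap to the wrapper layer rather than to generation itself.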