langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai
52.39k stars · 7.64k forks

Knowledge base `documents` POST requests block access to other APIs #5540

Closed 401557122 closed 5 months ago

401557122 commented 5 months ago

Self Checks

Dify version

0.6.8

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Hi, I deployed the middleware with docker-compose.middleware.yaml, and run the backend from source with `flask run --host 0.0.0.0 --port=5001` and `celery -A app.celery worker -P gevent -c 1 -Q dataset,generation,mail --loglevel INFO`, with the frontend built directly from the web image. In use I found that when I click the "Save and Process" button on the knowledge base page (i.e. the `documents` endpoint or the `datasets/init` endpoint), processing keeps running for a long time when there are many documents or the documents are large. During that time, all other agent-related and page-related API calls hang and never return a response. How should this be resolved? At first I suspected the Celery worker concurrency setting of 1, but the problem still occurs after raising it to 128.

✔️ Expected Behavior

Document processing tasks should run without interfering with each other, and without affecting other service calls.

❌ Actual Behavior

No response

dosubot[bot] commented 5 months ago

Hello, @401557122! I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this issue together!

To ensure that document processing tasks run without affecting other service calls in Dify version 0.6.8 when using Celery and Docker, you can configure separate Celery workers for different types of tasks. This can be achieved by defining different queues and assigning specific workers to those queues.
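For a source-based deployment like the one described in this issue (Flask plus a single Celery command), the same separation can be sketched as two worker processes, each pinned to its own queue. This is only an illustration of the queue-splitting idea; the concurrency values are placeholders, not recommendations:

```shell
# API server, unchanged
flask run --host 0.0.0.0 --port=5001

# Dedicated worker for document indexing (dataset queue only)
celery -A app.celery worker -P gevent -c 4 -Q dataset --loglevel INFO

# Separate worker for the remaining queues, so generation and mail
# tasks are not stuck behind long-running indexing jobs
celery -A app.celery worker -P gevent -c 4 -Q generation,mail --loglevel INFO
```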

Here is an example of how you can modify your docker-compose.yaml to achieve this:

```yaml
version: '3'
services:
  # API service
  api:
    image: langgenius/dify-api:0.6.11
    restart: always
    environment:
      MODE: api
      LOG_LEVEL: INFO
      SECRET_KEY: sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U
      CONSOLE_WEB_URL: ''
      INIT_PASSWORD: ''
      CONSOLE_API_URL: ''
      SERVICE_API_URL: ''
      APP_WEB_URL: ''
      FILES_URL: ''
      FILES_ACCESS_TIMEOUT: 300
      MIGRATION_ENABLED: 'true'
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_USERNAME: ''
      REDIS_PASSWORD: difyai123456
      REDIS_USE_SSL: 'false'
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      WEB_API_CORS_ALLOW_ORIGINS: '*'
      CONSOLE_CORS_ALLOW_ORIGINS: '*'
      STORAGE_TYPE: local
      STORAGE_LOCAL_PATH: storage
      S3_USE_AWS_MANAGED_IAM: 'false'
      S3_ENDPOINT: 'https://xxx.r2.cloudflarestorage.com'
      S3_BUCKET_NAME: 'difyai'
      S3_ACCESS_KEY: 'ak-difyai'
      S3_SECRET_KEY: 'sk-difyai'
      S3_REGION: 'us-east-1'
      AZURE_BLOB_ACCOUNT_NAME: 'difyai'
      AZURE_BLOB_ACCOUNT_KEY: 'difyai'
      AZURE_BLOB_CONTAINER_NAME: 'difyai-container'
      AZURE_BLOB_ACCOUNT_URL: 'https://<your_account_name>.blob.core.windows.net'
      GOOGLE_STORAGE_BUCKET_NAME: 'your-bucket-name'
      GOOGLE_STORAGE_SERVICE_ACCOUNT_JSON_BASE64: 'your-google-service-account-json-base64-string'
      ALIYUN_OSS_BUCKET_NAME: 'your-bucket-name'
      ALIYUN_OSS_ACCESS_KEY: 'your-access-key'
      ALIYUN_OSS_SECRET_KEY: 'your-secret-key'
      ALIYUN_OSS_ENDPOINT: 'https://oss-ap-southeast-1-internal.aliyuncs.com'
      ALIYUN_OSS_REGION: 'ap-southeast-1'
      ALIYUN_OSS_AUTH_VERSION: 'v4'
      TENCENT_COS_BUCKET_NAME: 'your-bucket-name'
      TENCENT_COS_SECRET_KEY: 'your-secret-key'

  # Celery worker for general tasks
  worker_general:
    image: langgenius/dify-api:0.6.11
    restart: always
    environment:
      MODE: worker
      LOG_LEVEL: INFO
      SECRET_KEY: sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_USERNAME: ''
      REDIS_PASSWORD: difyai123456
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      CELERY_QUEUE: general

  # Celery worker for document processing tasks
  worker_document:
    image: langgenius/dify-api:0.6.11
    restart: always
    environment:
      MODE: worker
      LOG_LEVEL: INFO
      SECRET_KEY: sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_USERNAME: ''
      REDIS_PASSWORD: difyai123456
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      CELERY_QUEUE: document

  # Other services (db, redis, etc.) remain unchanged
  db:
    image: postgres:15-alpine
    restart: always
    environment:
      PGUSER: postgres
      POSTGRES_PASSWORD: difyai123456
      POSTGRES_DB: dify
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - ./volumes/db/data:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD", "pg_isready" ]
      interval: 1s
      timeout: 3s
      retries: 30

  redis:
    image: redis:6-alpine
    restart: always
    volumes:
      - ./volumes/redis/data:/data
    command: redis-server --requirepass difyai123456
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
```
In this configuration, two separate Celery workers are defined: worker_general for general tasks and worker_document for document processing tasks. Each worker is assigned to a different queue (general and document respectively) using the CELERY_QUEUE environment variable. This separation ensures that document processing tasks do not interfere with other service calls [1].
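Once the workers are split this way, the queue assignment can be checked with Celery's built-in inspection subcommands, run from either worker's environment (these are standard Celery CLI commands, not Dify-specific; the queue names are the ones assumed in the example above):

```shell
# List the queues each running worker is consuming from
celery -A app.celery inspect active_queues

# Show currently executing tasks, to confirm that long indexing
# jobs stay on the document worker
celery -A app.celery inspect active
```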

To continue talking to Dosu, mention @dosu.

401557122 commented 5 months ago

You're awesome, robot bro!

dosubot[bot] commented 5 months ago

Thanks for the kind words! If you have any other questions or need further help, feel free to let me know. Good luck with Dify!


SeesawLiu commented 4 months ago

I configured it according to the method above, but it still doesn't work. I set up unstructured, and after uploading a file:

1. First, the api service hits 100% CPU while unstructured and worker_general CPU usage stay low, and the frontend is unusable.
2. Then worker_general runs at 100% CPU while worker_api CPU usage is low, and the frontend becomes usable.
3. Finally, worker_general sits at 50% CPU and worker_api at 100%.

@dosu @dosubot