langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

api and web service stuck after uploading a large document #5683

Closed SeesawLiu closed 2 months ago

SeesawLiu commented 3 months ago

Self Checks

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Use docker-compose.yaml to launch Dify. When uploading a document with 10,000 lines, the api and web services get stuck.

✔️ Expected Behavior

The API should stay callable and the web UI should remain accessible.

❌ Actual Behavior

The api and web services can't be visited. I configured the docker-compose.yaml according to #5540, but it still doesn't work. I also configured unstructured. After uploading the file: 1. First, the api container's CPU hits 100% while unstructured and worker_general CPU usage stay low, and the front end cannot be used. 2. Then worker_general CPU hits 100% while worker_api CPU usage is low, and the front end can be used again. 3. Finally, worker_general CPU sits at 50% and worker_api CPU at 100%. @dosu

dosubot[bot] commented 3 months ago

To resolve the issue of the API and web service becoming inaccessible after uploading a large document in your self-hosted Dify setup, you should handle the FileTooLargeError exception. This error is raised when the file size exceeds the allowed limit.

Here is the relevant part of the code that defines the FileTooLargeError:

class FileTooLargeError(BaseHTTPException):
    # Returned to the client as HTTP 413 when an upload exceeds the configured size limit
    error_code = 'file_too_large'
    description = "File size exceeded. {message}"
    code = 413
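
For context, the upload path enforces the limit with a size check before raising this error. A paraphrased sketch (identifiers are approximate, not a verbatim excerpt of Dify's file service):

# sketch: how the limit check works (identifiers approximate)
file_size = len(file_content)
file_size_limit = upload_file_size_limit_mb * 1024 * 1024  # configured limit, MB -> bytes
if file_size > file_size_limit:
    message = f'{file_size} > {file_size_limit}'
    raise FileTooLargeError(message)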

To resolve this issue, you can either:

  1. Increase the file size limit in your application configuration.
  2. Ensure that the uploaded file size is within the allowed limit.

If you choose to increase the file size limit, you will need to modify your application's configuration to allow larger uploads. The exact steps depend on how your application is configured and deployed; a sketch for the Docker deployment follows.
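
A minimal sketch of option 1 for a Docker deployment (the variable name comes from Dify's environment variable documentation; the value is illustrative) is to raise the limit, in megabytes, on the api and worker services and restart the stack:

# docker-compose.yaml, under the api and worker `environment:` sections
UPLOAD_FILE_SIZE_LIMIT: 50  # maximum single-file upload size in MB (default: 15)

Keep the reverse proxy's body-size limit (nginx client_max_body_size) at least as large, or uploads will still be rejected before they reach the API.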

Additionally, according to the Dify documentation, the maximum size for a single document upload is currently 15MB, and there is a limit of 100 total documents. These limits can be adjusted if you are using a local deployment. You can refer to the official website environment variable description document to configure these limits. For more details, please check the documentation.

Moreover, you can also adjust the FILES_ACCESS_TIMEOUT setting in your docker-compose.yaml file to ensure that the system has enough time to process larger files. Here is an example configuration:

# The default value is 300 seconds.
FILES_ACCESS_TIMEOUT: 300

Ensure that this setting is appropriately configured in your docker-compose.yaml file to avoid timeouts during file uploads [1][2][3][4].

To continue talking to Dosu, mention @dosu.

SeesawLiu commented 3 months ago

The problem may not be the size of the document, because the document uploads successfully and is handled after a few minutes; no exception is thrown. @dosu

dosubot[bot] commented 3 months ago

To resolve the issue causing the API and web services to become inaccessible after uploading large documents in your self-hosted Dify setup using Docker, you need to address the size and quantity limitations for uploading knowledge documents. You can refer to the official website environment variable description document to configure the necessary settings.

Here are the steps to follow:

  1. Clone Dify Source Code:

    git clone https://github.com/langgenius/dify.git
  2. Navigate to the Docker Directory:

    cd dify/docker
  3. Customize Environment Variables: The full set of environment variables can be found under docker/dotenvs. To change their values, delete the .example suffix from the corresponding .env.example file name and edit the values in the file directly. For example:

    mv .env.example .env
    nano .env
  4. Adjust Relevant Environment Variables: Modify the environment variables related to document size and quantity limitations. Save the changes after editing. For example, you can adjust the STORAGE_TYPE and STORAGE_LOCAL_PATH to ensure that the storage configuration is optimized for handling large files:

    STORAGE_TYPE: local
    STORAGE_LOCAL_PATH: storage
  5. Update Docker Compose Configuration: Ensure that the Docker Compose configuration is set up correctly to handle the storage and network settings. For example, you can mount the storage directory and configure the necessary services:

    volumes:
      - ./volumes/app/storage:/app/api/storage
  6. Restart Dify:

    docker compose down
    docker compose up -d

By following these steps, you can configure your Dify setup to handle large document uploads without causing the API and web services to become inaccessible. Additionally, ensure that the nginx service is properly configured to handle reverse proxying for the API and web services, as shown in the docker-compose.yaml file [1][2][3].
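
As a reference for the nginx note above, the reverse-proxy layout shipped with Dify looks roughly like this (a sketch of nginx/conf.d/default.conf; verify the exact location blocks in your checkout):

server {
    listen 80;
    # console/API traffic goes to the api container
    location /console/api { proxy_pass http://api:5001; }
    location /api         { proxy_pass http://api:5001; }
    location /v1          { proxy_pass http://api:5001; }
    location /files       { proxy_pass http://api:5001; }
    # everything else is served by the web front end
    location /            { proxy_pass http://web:3000; }
}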

To continue talking to Dosu, mention @dosu.

SeesawLiu commented 3 months ago

It's not about the size or storage of the document; the point is that the Dify api service gets stuck. Can I configure the yaml to resolve the issue? I set the api replicas to 2, but it didn't work. @dosu @crazywoola

dosubot[bot] commented 3 months ago

To address the issue of the Dify API service becoming unresponsive during large document uploads, consider the following steps:

  1. Resource Allocation: Ensure that the worker_api and worker_general services have sufficient CPU and memory resources allocated. You can adjust these settings in the docker-compose.yaml file by adding the deploy section with resource constraints:

    worker:
      image: langgenius/dify-api:0.6.11
      restart: always
      environment:
        # ... other environment variables ...
      deploy:
        resources:
          limits:
            cpus: '2.0'
            memory: '2G'
          reservations:
            cpus: '1.0'
            memory: '1G'
  2. Timeout Settings: Adjust the FILES_ACCESS_TIMEOUT setting in your docker-compose.yaml file to ensure the system has enough time to process larger files:

    environment:
      FILES_ACCESS_TIMEOUT: 600  # Increase the timeout to 600 seconds
  3. Nginx Configuration: Ensure that the client_max_body_size directive in the Nginx configuration is set to accommodate large file uploads. Update the nginx.conf file:

    client_max_body_size 100M;  # Set to a value that suits your needs
  4. Logging and Monitoring: Utilize Dify's observability features to monitor and analyze application logs and performance. This can help identify what is causing the API service to become unresponsive. Check the logs for any errors or warnings during the upload process (see the command after this list).

  5. Task Prioritization and Load Balancing: Ensure that tasks are properly prioritized and balanced between the worker_api and worker_general services. This might involve configuring Celery queues and task routing to optimize performance, as sketched after this list.
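
For step 4, tail both services while reproducing the upload:

docker compose logs -f api worker

For step 5, a hypothetical sketch of a dedicated indexing worker. This assumes the dify-api image's entrypoint honors a CELERY_QUEUES override to restrict which Celery queues a worker consumes; verify that against your image before relying on it:

worker_dataset:
  image: langgenius/dify-api:0.6.11
  restart: always
  environment:
    MODE: worker
    CELERY_QUEUES: dataset  # assumption: forwarded to `celery -Q`
    # ... same DB/Redis environment variables as the existing worker service ...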

By implementing these adjustments, you can improve the performance and responsiveness of the Dify API service during large document uploads [1][2][3].

To continue talking to Dosu, mention @dosu.

SeesawLiu commented 3 months ago

As I mentioned above, when the web service is not responding, the api service CPU is at 100% while the worker service CPU is low. The worker has enough resources; nothing indicates that this is a worker problem. [screenshot] @dosu

dosubot[bot] commented 3 months ago

To address the issue of the API service CPU usage spiking to 100% and the web service becoming unresponsive when uploading a large document in your self-hosted Dify setup using Docker, you can try the following steps:

  1. Increase Resource Allocation: Ensure that your Docker virtual machine (VM) is allocated sufficient resources. The documentation suggests a minimum of 2 virtual CPUs (vCPUs) and 8 GB of initial memory. You may need to increase these values based on your workload.

  2. Optimize Docker Configuration: Adjust the Docker Compose configuration to allocate more resources to the API service. You can do this by modifying the docker-compose.yml file to set resource limits and reservations.

Here is an example of how you can modify the docker-compose.yml file to allocate more resources to the api service:

services:
  api:
    image: langgenius/dify-api:0.6.11
    restart: always
    environment:
      MODE: api
      LOG_LEVEL: INFO
      SECRET_KEY: sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U
      CONSOLE_WEB_URL: ''
      INIT_PASSWORD: ''
      CONSOLE_API_URL: ''
      SERVICE_API_URL: ''
      APP_WEB_URL: ''
      FILES_URL: ''
      FILES_ACCESS_TIMEOUT: 300
      MIGRATION_ENABLED: 'true'
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_USERNAME: ''
      REDIS_PASSWORD: difyai123456
      REDIS_USE_SSL: 'false'
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      WEB_API_CORS_ALLOW_ORIGINS: '*'
      CONSOLE_CORS_ALLOW_ORIGINS: '*'
      UPLOAD_FILE_SIZE_LIMIT: 50M  # Increase the file size limit as needed
      UPLOAD_FILE_BATCH_LIMIT: 10  # Adjust the batch limit if necessary
      GUNICORN_TIMEOUT: 360  # Increase the timeout to support longer processing times
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    ports:
      - "80:80"
      - "5001:5001"
    ...
  3. Monitor and Analyze: Use Docker's monitoring tools to analyze the resource usage of your containers (see the docker stats example after this list). This can help you identify specific bottlenecks or services that need more resources.

  4. Optimize Application Code: Review the application code to ensure that it handles large document uploads efficiently. This may involve optimizing the code for better performance or processing large files in chunks (a generic sketch follows the list).

  5. Upgrade Dify: Ensure that you are using the latest version of Dify, as updates may include performance improvements and bug fixes. Follow the upgrade steps provided in the documentation:

cd dify/docker
git pull origin main
docker compose down
docker compose pull
docker compose up -d
  6. Adjust Environment Variables: Ensure that the environment variables are correctly configured to handle large uploads. For example, you can set the UPLOAD_FILE_SIZE_LIMIT to a higher value if the default limit is too low. Here is an example of relevant environment variables:
environment:
  UPLOAD_FILE_SIZE_LIMIT: 50M  # Increase the file size limit as needed
  UPLOAD_FILE_BATCH_LIMIT: 10  # Adjust the batch limit if necessary
  GUNICORN_TIMEOUT: 360  # Increase the timeout to support longer processing times
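
For step 3, docker stats shows at a glance which container is pinned at 100% CPU:

docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

For step 4, a generic Python sketch (not Dify's code) of streaming an upload to disk in fixed-size chunks, so memory usage stays flat regardless of file size:

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read

def save_stream(src, dst_path):
    """Copy a file-like object to dst_path without loading it all into memory."""
    with open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)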

By following these steps, you should be able to mitigate the issue of high CPU usage and unresponsiveness when uploading large documents. Additionally, you can refer to the Dify documentation for more detailed information on configuring your environment variables and optimizing your setup [1][2][3][4].

To continue talking to Dosu, mention @dosu.

SeesawLiu commented 3 months ago

It doesn't work. @dosu

dosubot[bot] commented 3 months ago

To address the issue of the API service's CPU usage spiking to 100% and causing the web service to become unresponsive during large document uploads in your self-hosted Dify setup using Docker, you can try the following steps:

  1. Increase CPU and Memory Allocation: Ensure that your Docker virtual machine (VM) is configured to use more virtual CPUs (vCPUs) and memory. For example, set the Docker VM to use a minimum of 2 vCPUs and 8 GB of initial memory.

  2. Optimize Docker Compose Configuration: You can adjust the resource limits for the containers in your docker-compose.yml file to allocate more CPU and memory to the API service. Here is an example of how you can set resource limits:

    services:
      api:
        image: langgenius/dify-api:0.3.2
        deploy:
          resources:
            limits:
              cpus: '2'
              memory: 4G
            reservations:
              cpus: '1'
              memory: 2G
  3. Monitor and Scale: Use Docker monitoring tools to keep an eye on the resource usage of your containers. If necessary, consider scaling the API service by running multiple instances to distribute the load (see the command after this list).

  4. Optimize Application Code: Review the application code for any inefficiencies that could be causing high CPU usage during large document uploads. This might involve optimizing algorithms, database queries, or handling of large files.

  5. Upgrade Dify: Ensure you are using the latest version of Dify, as updates may include performance improvements and bug fixes. You can upgrade Dify by following these steps:

    cd dify/docker
    git pull origin main
    docker compose down
    docker compose pull
    docker compose up -d
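
For step 3, Compose can start extra api replicas from the CLI. Note that this only helps if the api service has no fixed host port bindings (replicas would otherwise conflict) and the bundled nginx configuration balances across the replicas:

docker compose up -d --scale api=2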

By following these steps, you should be able to mitigate the issue of high CPU usage and improve the responsiveness of your web service during large document uploads [1].

To continue talking to Dosu, mention @dosu.

SeesawLiu commented 3 months ago

I found that the API /console/api/datasets/indexing-estimate uses all the CPU for about 3 minutes. Maybe the indexing itself has the same problem. @dosu

dosubot[bot] commented 3 months ago

To address the issue of the /console/api/datasets/indexing-estimate endpoint taking too long to handle requests, you can try the following steps:

  1. Increase Resource Allocation: Ensure that your Docker virtual machine (VM) is allocated sufficient resources. For macOS, set the VM to use a minimum of 2 virtual CPUs (vCPUs) and 8 GB of initial memory. For other platforms, ensure Docker has enough CPU and memory allocated.

  2. Optimize Docker Containers: Restart your Docker containers to ensure they are running optimally. You can do this by navigating to the docker directory in the Dify source code and executing the following commands:

    cd dify/docker
    docker compose down
    docker compose up -d
  3. Update Dify: Make sure you are using the latest version of Dify, as updates may include performance improvements and bug fixes. To update Dify, run the following commands:

    cd dify/docker
    git pull origin main
    docker compose down
    docker compose pull
    docker compose up -d
  4. Check Environment Variables: Review and adjust the environment variables in the docker/dotenvs directory to optimize performance settings. After making changes, restart Dify:

    docker compose down
    docker compose up -d
  5. Optimize Indexing Logic: Review the indexing logic in the DocumentIndexingEstimateApi class. Ensure that the logic for fetching and processing documents is efficient. For example, check if there are any unnecessary database queries or if the data processing can be optimized. Here is a snippet from the DocumentIndexingEstimateApi class:

    class DocumentIndexingEstimateApi(DocumentResource):
    
        @setup_required
        @login_required
        @account_initialization_required
        def get(self, dataset_id, document_id):
            dataset_id = str(dataset_id)
            document_id = str(document_id)
            document = self.get_document(dataset_id, document_id)
    
            if document.indexing_status in ['completed', 'error']:
                raise DocumentAlreadyFinishedError()
    
            data_process_rule = document.dataset_process_rule
            data_process_rule_dict = data_process_rule.to_dict()
    
            response = {
                "tokens": 0,
                "total_price": 0,
                "currency": "USD",
                "total_segments": 0,
                "preview": []
            }
    
            if document.data_source_type == 'upload_file':
                data_source_info = document.data_source_info_dict
                if data_source_info and 'upload_file_id' in data_source_info:
                    file_id = data_source_info['upload_file_id']
    
                    file = db.session.query(UploadFile).filter(
                        UploadFile.tenant_id == document.tenant_id,
                        UploadFile.id == file_id
                    ).first()
    
                    if not file:
                        raise NotFound('File not found.')
    
                    extract_setting = ExtractSetting(
                        datasource_type="upload_file",
                        upload_file=file,
                        document_model=document.doc_form
                    )
    
                    indexing_runner = IndexingRunner()
                    try:
                        response = indexing_runner.indexing_estimate(current_user.current_tenant_id, [extract_setting],
                                                                     data_process_rule_dict, document.doc_form,
                                                                     'English', dataset_id)
                    except LLMBadRequestError:
                        raise ProviderNotInitializeError(
                            "No Embedding Model available. Please configure a valid provider "
                            "in the Settings -> Model Provider.")
                    except ProviderTokenNotInitError as ex:
                        raise ProviderNotInitializeError(ex.description)
                    except Exception as e:
                        raise IndexingEstimateError(str(e))
    
            return response

By following these steps, you should be able to reduce CPU usage and improve the performance of the /console/api/datasets/indexing-estimate endpoint [1][2].

To continue talking to Dosu, mention @dosu.

gooplle commented 2 months ago

Added statements to set the file size and timeout (UPLOAD_FILE_SIZE_LIMIT: 50M, UPLOAD_FILE_BATCH_LIMIT: 10, GUNICORN_TIMEOUT: 360), saved the docker-compose.yaml file, and restarted Docker Compose. After uploading a file, the system still prompts the 15M file size limit.

yujunhui commented 2 months ago

Seems the problem is fixed in the latest version. I ran the latest version on my PC, uploaded a 250K-character file to the knowledge base, and it didn't get stuck.

crazywoola commented 2 months ago

@SeesawLiu Have you tried this in the latest version?

crazywoola commented 2 months ago

Linking #5663 and closing this one.