infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: Parsing time is so long #379

Open ChenTao98 opened 2 months ago

ChenTao98 commented 2 months ago

Describe your problem

Thanks for your work. I have deployed RAGFlow on my own server.

However, when I upload a PDF file (2 pages), parsing it takes a long time (more than 300 seconds).

Log for file 1:

Process started at:
Tue, 16 Apr 2024 13:52:45 GMT
Process duration:
385.359
Progress messages:
Page(1~2): OCR is running...
Page(1~2): OCR finished
Page(1~2): Layout analysis finished.
Page(1~2): Table analysis finished.
Page(1~2): Text merging finished
Page(1~2): Finished slicing files(3). Start to embedding the content.
Page(1~2): Finished embedding! Start to build index!
Page(1~2): Done!

Log for file 2:

Process started at:
Tue, 16 Apr 2024 14:08:13 GMT
Process duration:
771.436
Progress messages:
Page(1~2): OCR is running...
Page(1~2): OCR finished
Page(1~2): Layout analysis finished.
Page(1~2): Table analysis finished.
Page(1~2): Text merging finished
Page(1~2): Finished slicing files(3). Start to embedding the content.
Page(1~2): Finished embedding! Start to build index!
Page(1~2): Done!
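Both logs are for files with the same page count, so the durations can be compared directly. This minimal sketch uses only the durations and page counts reported above; the helper name `seconds_per_page` is hypothetical.

```python
# Durations (seconds) and page counts copied from the two progress logs above.
RUNS = {
    "file 1": (385.359, 2),
    "file 2": (771.436, 2),
}


def seconds_per_page(duration_s: float, pages: int) -> float:
    """Average parsing time per page (hypothetical helper)."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return duration_s / pages


for name, (duration, pages) in RUNS.items():
    print(f"{name}: {seconds_per_page(duration, pages):.1f} s/page")
# file 1: 192.7 s/page
# file 2: 385.7 s/page
```

The second file took about twice as long as the first (771.436 / 385.359 ≈ 2.0) for the same number of pages.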
wzikang commented 2 months ago

You can try using the GPU for parsing. With the Docker deployment, GPU resources are not used by default: neither "docker/docker-compose.yml" nor "docker/docker-compose-cn.yml" contains any GPU-related configuration for the containers they create.

To enable it, stop and remove the containers that are already running, add the following configuration to the `ragflow` service in both files, and run Docker Compose again. When you parse again, it should be much faster now that the GPU is used.

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0']
          capabilities: [gpu]
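The edit described above (adding a `deploy` block to the `ragflow` service) can be sketched in Python. This is a minimal sketch that assumes the compose file has already been loaded into a plain dict (for example with a YAML library, not shown); `nvidia_device` and `add_gpu_reservation` are hypothetical helpers, not part of RAGFlow or Docker Compose. They produce the same nesting as the fragment: `deploy.resources.reservations.devices` holding one NVIDIA device entry.

```python
import copy


def nvidia_device(device_ids=("0",)) -> dict:
    """One entry for deploy.resources.reservations.devices (same shape as the YAML above)."""
    return {
        "driver": "nvidia",
        "device_ids": [str(i) for i in device_ids],
        "capabilities": ["gpu"],
    }


def add_gpu_reservation(compose: dict, service: str, device_ids=("0",)) -> dict:
    """Return a copy of `compose` whose `service` reserves the given NVIDIA GPUs.

    Existing deploy/resources/reservations keys are kept; the input is not modified.
    Raises KeyError if the service does not exist.
    """
    patched = copy.deepcopy(compose)
    svc = patched["services"][service]
    reservations = (
        svc.setdefault("deploy", {})
        .setdefault("resources", {})
        .setdefault("reservations", {})
    )
    reservations.setdefault("devices", []).append(nvidia_device(device_ids))
    return patched


# Hypothetical input: a compose file already parsed into a dict.
compose = {"services": {"ragflow": {"image": "infiniflow/ragflow:v1.0"}}}
patched = add_gpu_reservation(compose, "ragflow")
print(patched["services"]["ragflow"]["deploy"]["resources"]["reservations"]["devices"])
# [{'driver': 'nvidia', 'device_ids': ['0'], 'capabilities': ['gpu']}]
```

Using `setdefault` rather than overwriting `deploy` keeps any other deploy settings the service may already have. After writing the file back, the containers still need to be recreated (for example `docker compose down` followed by `docker compose up -d`).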

For example, the complete docker-compose.yml looks like this:

version: '2.2'
include:
  - path: ./docker-compose-base.yml
    env_file: ./.env
services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy
      es01:
        condition: service_healthy
    image: infiniflow/ragflow:v1.0
    container_name: ragflow-server
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]
    ports:
      - ${SVR_HTTP_PORT}:9380
      - 80:80
      - 443:443
    volumes:
      - ./service_conf.yaml:/ragflow/conf/service_conf.yaml
      - ./entrypoint.sh:/ragflow/entrypoint.sh
      - ./ragflow-logs:/ragflow/logs
      - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    environment:
      - TZ=${TIMEZONE}
    networks:
      - ragflow
    restart: always
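To confirm that an edited compose file actually requests a GPU for `ragflow`, a small check can walk the same path. This is a hypothetical helper that assumes the file is already parsed into a dict; it only inspects the configuration and does not prove that the container can see the GPU at runtime.

```python
def has_gpu_reservation(compose: dict, service: str) -> bool:
    """Return True if `service` reserves at least one device with the 'gpu' capability."""
    svc = compose.get("services", {}).get(service, {})
    devices = (
        svc.get("deploy", {})
        .get("resources", {})
        .get("reservations", {})
        .get("devices", [])
    )
    # `capabilities` is a list in the compose file, e.g. [gpu].
    return any("gpu" in dev.get("capabilities", []) for dev in devices)


# The ragflow service's deploy block from the file above, as a parsed dict.
ragflow_compose = {
    "services": {
        "ragflow": {
            "deploy": {
                "resources": {
                    "reservations": {
                        "devices": [
                            {"driver": "nvidia", "device_ids": ["0"], "capabilities": ["gpu"]}
                        ]
                    }
                }
            }
        }
    }
}
print(has_gpu_reservation(ragflow_compose, "ragflow"))  # True
print(has_gpu_reservation(ragflow_compose, "mysql"))  # False
```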
ysyx2008 commented 2 months ago

Yes, it takes a long time, but it's worth it.

ChenTao98 commented 2 months ago

I tried adding the GPU configuration, but it doesn't work. Is a specific version of CUDA or the NVIDIA driver required? My setup: NVIDIA-SMI 495.29.05, Driver Version: 495.29.05, CUDA Version: 11.5.
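The versions quoted above come from the header line printed by `nvidia-smi`. Note that the "CUDA Version" shown there is the highest CUDA version the installed driver supports, not necessarily a toolkit installed on the host. A minimal sketch for extracting both values, assuming the `Driver Version: X  CUDA Version: Y` layout shown; `parse_smi_header` is a hypothetical helper.

```python
import re

# Matches e.g. "Driver Version: 495.29.05    CUDA Version: 11.5" (spacing varies).
_SMI_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(line: str) -> tuple:
    """Return (driver_version, cuda_version) from an nvidia-smi header line."""
    match = _SMI_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized nvidia-smi header: {line!r}")
    return match.group("driver"), match.group("cuda")


header = "NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5"
print(parse_smi_header(header))  # ('495.29.05', '11.5')
```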

wzikang commented 2 months ago

These are the versions I am using, and I have verified that GPU parsing works normally with them: Docker Compose v2.21.0, NVIDIA driver 524.147.05, CUDA 12.0. Check your Docker Compose version as well, since the GPU configuration syntax may differ between Docker Compose versions.
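A quick way to rule out an outdated Compose is to compare its version against known minimums. This is a minimal sketch under stated assumptions: GPU device reservations in the `deploy` block are documented by Docker as requiring Compose 1.28.0 or later, while the 2.20.0 threshold for the top-level `include` element used in the file above is an assumption (it was introduced in the 2.20 release line; check the Compose release notes for the exact patch version). The input is the string printed by `docker compose version`; `parse_version` and `meets` are hypothetical helpers.

```python
import re

# Documented minimum for `deploy.resources.reservations.devices` GPU entries.
MIN_COMPOSE_FOR_GPU_DEVICES = (1, 28, 0)
# Assumption: `include` arrived in the 2.20 release line.
MIN_COMPOSE_FOR_INCLUDE = (2, 20, 0)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from '2.21.0' or 'Docker Compose version v2.21.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets(version_text: str, minimum: tuple) -> bool:
    """True if the parsed version is at least `minimum` (tuples compare element-wise)."""
    return parse_version(version_text) >= minimum


print(meets("Docker Compose version v2.21.0", MIN_COMPOSE_FOR_INCLUDE))  # True
print(meets("1.29.2", MIN_COMPOSE_FOR_INCLUDE))  # False
```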

vbmcpy commented 1 month ago

Working perfectly on driver 525.125.06 with CUDA 12.0, ragflow v2.0 6.0; build tested on 2024-05-22 16:30 GMT-3 (America/Sao_Paulo, BR).

alex-ca1123 commented 1 month ago

You need CUDA 12, and I believe the minimum hardware for CUDA 12 is the GTX 980 generation.
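The CUDA 12 requirement can also be expressed as a driver version. NVIDIA's CUDA Toolkit release notes list 525.60.13 as the minimum Linux driver for CUDA 12.0 GA and 495.29.05 for CUDA 11.5 GA, which matches the driver reported in the question. A minimal sketch comparing dotted driver versions numerically; the table covers only these two releases and `driver_supports` is a hypothetical helper.

```python
# Minimum Linux driver versions from NVIDIA's CUDA Toolkit release notes (GA releases).
MIN_LINUX_DRIVER = {
    "11.5": "495.29.05",
    "12.0": "525.60.13",
}


def version_key(version: str) -> tuple:
    """Turn '525.125.06' into (525, 125, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """True if `driver` is at least the documented minimum for `cuda`."""
    return version_key(driver) >= version_key(MIN_LINUX_DRIVER[cuda])


print(driver_supports("495.29.05", "12.0"))  # False: the driver from the question
print(driver_supports("525.125.06", "12.0"))  # True: the driver reported as working
```

Comparing integer tuples rather than raw strings matters here: as strings, "525.125.06" would sort before "525.60.13" because "1" < "6".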