netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0

[BUG] Uploading a docx file times out during processing #514

Open liangpn opened 2 months ago

liangpn commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

When uploading a file to the knowledge base, processing of a docx file times out, even though the file is only 500 KB.

Expected Behavior

The document should be processed quickly.

Environment

- OS: Ubuntu 22.04
- NVIDIA Driver: 555.42.02
- CUDA: 12.4
- Docker Compose: 26.0.0
- NVIDIA GPU Memory: 48G

QAnything logs

2024-09-11 02:41:30,114 - [PID: 982][Sanic-Server-0-0] - [Function: split_file_to_docs] - INFO - start split file to docs, file_path: test.docx
2024-09-11 02:46:30,343 - [PID: 982][Sanic-Server-0-0] - [Function: process_data] - ERROR - Timeout: split_file_to_docs took longer than 300 seconds

Steps To Reproduce

1. In the knowledge base, upload a document.

Anything else?

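The relevant source is the split_file_to_docs method in qanything_kernel/core/retriever/general_document.py. I have only tested docx documents; I have not tried other file types yet, but I expect they have the same problem.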
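
To check whether the splitting step itself is slow, it may help to time it in isolation under a hard limit. The following is only a minimal sketch using the Python standard library, not QAnything's actual timeout code: `run_with_timeout` and `fake_split` are hypothetical names, and `fake_split` is a stand-in that sleeps instead of parsing a document. In a real test one would pass the actual splitting call and the 300-second limit from the log.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a worker thread and return (status, result, elapsed_seconds).

    Hypothetical helper for local diagnosis only.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = pool.submit(func, *args, **kwargs)
    try:
        result = future.result(timeout=timeout_s)
        return "ok", result, time.perf_counter() - start
    except FutureTimeout:
        # The caller stops waiting, but the worker thread is not killed.
        return "timeout", None, time.perf_counter() - start
    finally:
        # Do not block here; a timed-out worker keeps running in the background.
        pool.shutdown(wait=False)


def fake_split(file_path, delay_s):
    # Hypothetical stand-in for the real document-splitting step.
    time.sleep(delay_s)
    return [f"chunk of {file_path}"]


if __name__ == "__main__":
    # A fast call finishes within the limit; a slow one is reported as a timeout.
    print(run_with_timeout(fake_split, 1.0, "test.docx", 0.01)[0])
    print(run_with_timeout(fake_split, 0.05, "test.docx", 0.5)[0])
```

Note that a thread-based timeout only stops waiting for the result; the slow work continues until it finishes on its own, so this is suited to measuring, not to cancelling.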