Open zhq734 opened 3 months ago
- OS: CentOS Linux release 7.9.2009
- NVIDIA Driver:
- CUDA:
- docker:
- docker-compose:
- NVIDIA GPU:
- NVIDIA GPU Memory:
+1. Some PNG images uploaded to the knowledge base also trigger this problem.
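Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
Uploading a PDF that consists purely of scanned images leaves the file stuck in the "parsing" state, and the backend reports a list index out-of-range error. The likely cause is that no document content could be parsed from the file, so the document list is empty when it is indexed. I do not know why no content is parsed from the file.
Expected Behavior
The document should be parsed normally.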
Environment
QAnything logs
2024-07-29 22:44:00,578 insert_files_to_faiss: KB292157495c50455ba10b30c66e9c25d4
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1056.23it/s]
docs number: 0
2024-07-29 22:44:00,590 before 2nd split doc lens: 0
2024-07-29 22:44:00,590 after 2nd split doc lens: 0
2024-07-29 22:44:00,590 langchain analysis docs is empty!
2024-07-29 22:44:00,591 函数 split_file_to_docs 执行耗时: 0.012501716613769531 秒
2024-07-29 22:44:00,595 split time: 0.012667655944824219
0
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-6' coro=<LocalDocQA.insert_files_to_faiss() done, defined at /opt/soft/QAnything/qanything_kernel/core/local_doc_qa.py:81> exception=IndexError('list index out of range')>
Traceback (most recent call last):
  File "/opt/soft/QAnything/qanything_kernel/core/local_doc_qa.py", line 104, in insert_files_to_faiss
    add_ids = await self.faiss_client.add_document(local_file.docs)
  File "/opt/soft/QAnything/qanything_kernel/connector/database/faiss/faiss_client.py", line 113, in add_document
    kb_id = docs[0].metadata['kb_id']
IndexError: list index out of range
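The traceback shows `docs[0]` being read while the log reports `docs number: 0`, so the failure is an unguarded access on an empty list. The sketch below illustrates one possible guard. It is not QAnything's code: `Document`, `EmptyDocsError`, `get_kb_id`, and `insert_docs` are hypothetical names introduced only for illustration, and the real `add_document` may need different handling.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for the document objects seen in the traceback."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class EmptyDocsError(ValueError):
    """Hypothetical error for a file that produced no parsable chunks."""


def get_kb_id(docs: List[Document]) -> str:
    # Guard the docs[0] access that raised IndexError in the log.
    if not docs:
        raise EmptyDocsError("no content could be parsed from the file")
    return docs[0].metadata["kb_id"]


async def insert_docs(docs: List[Document]) -> str:
    """Return a status string instead of letting the task fail unobserved."""
    try:
        kb_id = get_kb_id(docs)
    except EmptyDocsError as exc:
        # A caller could use this result to mark the file as failed
        # rather than leaving it in the "parsing" state.
        return f"failed: {exc}"
    return f"ok: {len(docs)} docs for {kb_id}"


print(asyncio.run(insert_docs([])))
```

With a guard of this kind, an image-only file that yields no text would surface as an explicit failure. It would not make the scanned PDF parsable; that still depends on the parsing step extracting text from the images.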
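Steps To Reproduce
Upload a PDF that consists purely of scanned images. The file stays in the "parsing" state and the backend reports a list index out-of-range error, apparently because no document content is parsed from the file.
Anything else?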
No response