RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
When the files in the knowledge base are linked, the parsing file will report an error. #3099
Open
liuzhiming3 opened 1 month ago
Is there an existing issue for the same bug?
Branch name
main
Commit ID
1
Other environment information
Actual behavior
When the files in the knowledge base are linked, the parsing file will report an error.
Expected behavior
When the files in the knowledge base are linked, the parsing file will report an error.
Steps to reproduce
Additional information
Traceback (most recent call last):
2024-10-30 10:18:14   File "/ragflow/rag/svr/task_executor.py", line 174, in build
2024-10-30 10:18:14     cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
2024-10-30 10:18:14           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/rag/app/naive.py", line 204, in chunk
2024-10-30 10:18:14     sections, tbls = Docx()(filename, binary)
2024-10-30 10:18:14                      ^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/rag/app/naive.py", line 63, in __call__
2024-10-30 10:18:14     self.doc = Document(
2024-10-30 10:18:14                ^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/api.py", line 27, in Document
2024-10-30 10:18:14     document_part = cast("DocumentPart", Package.open(docx).main_document_part)
2024-10-30 10:18:14                                          ^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/package.py", line 127, in open
2024-10-30 10:18:14     pkg_reader = PackageReader.from_file(pkg_file)
2024-10-30 10:18:14                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/pkgreader.py", line 22, in from_file
2024-10-30 10:18:14     phys_reader = PhysPkgReader(pkg_file)
2024-10-30 10:18:14                   ^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/phys_pkg.py", line 21, in __new__
2024-10-30 10:18:14     raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
2024-10-30 10:18:14 docx.opc.exceptions.PackageNotFoundError: Package not found at '预期成效v0.0.1.docx'