infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
24.23k stars 2.36k forks

When the files in the knowledge base are linked, the parsing file will report an error. #3099

Open liuzhiming3 opened 1 month ago

liuzhiming3 commented 1 month ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

1

Other environment information

MacBook Pro
Mac OS 14.7 (23H124)

Actual behavior

When the files in the knowledge base are linked, the parsing file will report an error.

Expected behavior

When the files in the knowledge base are linked, the parsing file will report an error.

Steps to reproduce

When the files in the knowledge base are linked, the parsing file will report an error.

Additional information

Traceback (most recent call last):
2024-10-30 10:18:14   File "/ragflow/rag/svr/task_executor.py", line 174, in build
2024-10-30 10:18:14     cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
2024-10-30 10:18:14           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/rag/app/naive.py", line 204, in chunk
2024-10-30 10:18:14     sections, tbls = Docx()(filename, binary)
2024-10-30 10:18:14                      ^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/rag/app/naive.py", line 63, in __call__
2024-10-30 10:18:14     self.doc = Document(
2024-10-30 10:18:14                ^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/api.py", line 27, in Document
2024-10-30 10:18:14     document_part = cast("DocumentPart", Package.open(docx).main_document_part)
2024-10-30 10:18:14                                          ^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/package.py", line 127, in open
2024-10-30 10:18:14     pkg_reader = PackageReader.from_file(pkg_file)
2024-10-30 10:18:14                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/pkgreader.py", line 22, in from_file
2024-10-30 10:18:14     phys_reader = PhysPkgReader(pkg_file)
2024-10-30 10:18:14                   ^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 10:18:14   File "/ragflow/.venv/lib/python3.12/site-packages/docx/opc/phys_pkg.py", line 21, in __new__
2024-10-30 10:18:14     raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
2024-10-30 10:18:14 docx.opc.exceptions.PackageNotFoundError: Package not found at '预期成效v0.0.1.docx'

KevinHuSh commented 1 month ago

Please upload the file again, and check the status of MinIO.
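
For anyone hitting the same traceback: the error names the bare file name rather than a byte stream, which may mean the bytes fetched from storage were empty or missing, so python-docx ended up treating the name as a path on disk. That reading is an assumption drawn from the traceback, not confirmed here. A minimal sketch of a sanity check on the stored bytes, using only the standard library; `check_docx_blob` is a hypothetical helper and is not part of RAGFlow or python-docx:

```python
import io
import zipfile


def check_docx_blob(binary):
    """Return a short diagnosis for bytes that are supposed to be a .docx file.

    Hypothetical helper for illustration only; not part of RAGFlow or python-docx.
    """
    # Nothing came back from storage (None or zero bytes).
    if not binary:
        return "empty"
    # A .docx file is a ZIP container, so the bytes must at least parse as a ZIP.
    if not zipfile.is_zipfile(io.BytesIO(binary)):
        return "not-a-zip"
    # Open Packaging Convention packages carry [Content_Types].xml at the root.
    with zipfile.ZipFile(io.BytesIO(binary)) as zf:
        if "[Content_Types].xml" not in zf.namelist():
            return "zip-without-content-types"
    return "ok"
```

If this returns "empty" for the bytes read back from storage, the upload or the storage service is the likely place to look, which matches the suggestion above to upload the file again and check MinIO. A result of "ok" would point elsewhere.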