infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: File is not a zip file. #1181

Closed: water-2022 closed this issue 1 week ago.

water-2022 commented 1 week ago

Describe your problem


[INFO] [2024-06-17 03:37:34,889] [_internal._log] [line:96]: 172.22.0.6 - - [17/Jun/2024 03:37:34] "GET /v1/document/list?kb_id=009fba942c5811efb7c90242ac160006&page=1&page_size=10 HTTP/1.1" 200 -
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 146, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
  File "/ragflow/rag/app/naive.py", line 139, in chunk
    sections = [(l, "") for l in excel_parser.html(binary) if l]
  File "/ragflow/deepdoc/parser/excel_parser.py", line 26, in html
    wb = load_workbook(BytesIO(fnm))
  File "/usr/local/lib/python3.10/dist-packages/openpyxl/reader/excel.py", line 344, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "/usr/local/lib/python3.10/dist-packages/openpyxl/reader/excel.py", line 123, in __init__
    self.archive = _validate_archive(fn)
  File "/usr/local/lib/python3.10/dist-packages/openpyxl/reader/excel.py", line 95, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.10/zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.10/zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
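The last frames of the traceback show where the error originates: openpyxl's `load_workbook` passes the uploaded bytes to Python's `zipfile.ZipFile`, because an `.xlsx` file is a ZIP container. Any byte stream that is not a ZIP archive fails at that point, regardless of its file extension. Below is a minimal sketch, using only the standard library, that reproduces the error and shows a pre-check with `zipfile.is_zipfile`. `open_as_zip` is a hypothetical helper, not part of RAGFlow or openpyxl.

```python
import io
import zipfile


def open_as_zip(binary: bytes) -> zipfile.ZipFile:
    """Hypothetical helper: open raw upload bytes as a ZIP archive.

    openpyxl makes an equivalent ZipFile(...) call internally, so bytes
    rejected here would also be rejected by load_workbook().
    """
    buf = io.BytesIO(binary)
    # is_zipfile() accepts a file-like object and returns False
    # instead of raising when the bytes are not a ZIP archive.
    if not zipfile.is_zipfile(buf):
        raise ValueError("upload is not a ZIP container, so it cannot be a valid .xlsx")
    buf.seek(0)  # is_zipfile() moved the read position
    return zipfile.ZipFile(buf, "r")


# Build a tiny ZIP archive in memory to compare against plain bytes.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("hello.txt", "hi")

with open_as_zip(good.getvalue()) as archive:
    print(archive.namelist())  # ['hello.txt']

# Passing non-ZIP bytes straight to ZipFile reproduces the error from the log.
try:
    zipfile.ZipFile(io.BytesIO(b"not a spreadsheet"), "r")
except zipfile.BadZipFile as exc:
    print(exc)  # File is not a zip file
```

Raising a `ValueError` with a readable message is a design choice for the sketch; a real parser might instead skip the file and report it to the user.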

KevinHuSh commented 1 week ago

Please open the file in Excel and use "Save As" to save it again before uploading. It seems our open-source Excel component can't read it, probably because of a version issue.
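After re-saving, one way to confirm that the file really is an `.xlsx` package before uploading it again is to look inside the ZIP for `[Content_Types].xml` (required at the root of every Office Open XML package) and check that it declares the standard workbook content type. This is a heuristic sketch, not a check RAGFlow performs; `looks_like_xlsx` is a hypothetical name, and macro-enabled `.xlsm` files declare a different content type that this version does not accept.

```python
import io
import zipfile

# Content type of the main workbook part in a regular (non-macro) .xlsx package.
XLSX_MAIN_TYPE = b"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"


def looks_like_xlsx(binary: bytes) -> bool:
    """Hypothetical heuristic: do these bytes look like an .xlsx workbook?"""
    try:
        with zipfile.ZipFile(io.BytesIO(binary)) as zf:
            # Office Open XML packages keep their content-type map at the archive root.
            if "[Content_Types].xml" not in zf.namelist():
                return False
            return XLSX_MAIN_TYPE in zf.read("[Content_Types].xml")
    except zipfile.BadZipFile:
        # Not a ZIP at all: the same failure openpyxl reports in the traceback.
        return False
```

Read the re-saved file in binary mode and pass its contents to `looks_like_xlsx`; a `False` result suggests openpyxl will reject it again.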

water-2022 commented 6 days ago


I have tried this many times, but I still get the same error, and I don't understand why. The file opens without any issue in WPS. I have also tried several other XLSX files, and the error persists; those files also open correctly in WPS.
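One possible explanation for a file that opens in WPS but fails here is that it is not a ZIP-based `.xlsx` at all, despite its extension. WPS also reads legacy binary `.xls` workbooks, which use the OLE2 compound-file format, and password-protected Office files are stored in that same compound-file container. Checking the first bytes of the file distinguishes these cases. The sketch below assumes only the two well-known signatures shown; `sniff_spreadsheet` is a hypothetical helper, and "unknown" covers anything else, such as CSV or HTML saved under an `.xlsx` name.

```python
# File signatures ("magic numbers") of the two container formats involved.
ZIP_MAGIC = b"PK\x03\x04"                         # ZIP local file header: .xlsx/.xlsm
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file: legacy .xls or encrypted Office file


def sniff_spreadsheet(binary: bytes) -> str:
    """Hypothetical helper: guess the container format from the leading bytes."""
    if binary.startswith(ZIP_MAGIC):
        return "zip"      # the format openpyxl expects
    if binary.startswith(OLE2_MAGIC):
        return "ole2"     # openpyxl cannot read this container
    return "unknown"      # e.g. CSV or HTML saved with an .xlsx extension


print(sniff_spreadsheet(b"PK\x03\x04" + b"\x00" * 26))  # zip
print(sniff_spreadsheet(OLE2_MAGIC + b"\x00" * 8))      # ole2
print(sniff_spreadsheet(b"a,b,c\n1,2,3\n"))             # unknown
```

Running it on the first eight bytes of the failing upload would show which case applies; an `ole2` result points to re-saving the workbook as a regular `.xlsx` without a password.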