CocoaML closed this issue 1 day ago
Using the official website: https://opendatalab.com/OpenSourceTools/Extractor
Error screenshot: ![Uploading MinerU异常-1.png…]()
Error message: err: {"task_id": "ed69264-29c6-4280-ae1c-3cc334771fd", "state": -1, "data": {"result": ""}, "msg": "infer task led69264-29c6-4280-ae1c-3cc334771fd failed. Err: Infer failed.File analysis failed. Analyse failed. code=8: Failed to decode JPX image"}
Page 4 of your PDF is the problem: it renders blank when opened in a browser, most likely because it contains a corrupted image. We suggest cropping page 4 out of the PDF and parsing it again.
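Thank you for the feedback.
When a PDF contains problematic pages, could a future release skip the pages that cannot be parsed and still finish converting the rest of the PDF to Markdown? Is there a way to implement this? Looking forward to your reply.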
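The parsing library we use is a third-party library, so requests for skip-on-error logic should be raised at https://github.com/pymupdf/PyMuPDF. For now, all we can do is catch this exception with try/except and skip parsing of the current document.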
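The document-level fallback described above could look roughly like the sketch below. It is not MinerU's actual code: `parse_documents` and `parse_one` are hypothetical names, with `parse_one` standing in for whatever call runs the pipeline on a single PDF. The sketch catches `Exception` broadly, on the assumption that the PyMuPDF error from the traceback is an ordinary Python exception.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def parse_documents(
    paths: Iterable[str],
    parse_one: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Parse each document; skip (and record) any document that raises.

    `parse_one` is a hypothetical callable that parses a single PDF.
    Returns (results, failures), both keyed by path.
    """
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = parse_one(path)
        except Exception as exc:  # e.g. "Failed to decode JPX image"
            # Record the reason and move on to the next document.
            failures[path] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

This skips the whole document rather than the single bad page, which matches the limitation stated in the reply; the `failures` mapping makes it possible to report which files were skipped and why.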
Description of the bug
[Reproducible] Error: pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image. File: part_4.pdf
Running this file on the official website fails as well. Website: https://opendatalab.com/OpenSourceTools/Extractor
Screenshot of the error on the official website: ![Uploading MinerU异常-1.png…]()
How to reproduce the bug
Error log:
Traceback (most recent call last):
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
cli.main()
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
run()
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
runpy.run_path(target, run_name="__main__")
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
_run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
exec(code, run_globals)
File "/usr/mineru_test/mineru_project/mineru_project_python/minerU/magic_pdf_parse_main.py", line 151, in <module>
pdf_parse_main(pdf_path, output_dir="./out")
File "/usr/mineru_test/mineru_project/mineru_project_python/minerU/magic_pdf_parse_main.py", line 108, in pdf_parse_main
pipe.pipe_classify()
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
self.pdf_type = AbsPipe.classify(self.pdf_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
pdf_meta = pdf_meta_scan(pdf_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 331, in pdf_meta_scan
image_info_per_page, junk_img_bojids = get_image_info(doc, page_width_pts, page_height_pts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 119, in get_image_info
page_result = process_image(page, junk_img_bojids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 39, in process_image
recs = page.get_image_rects(img, transform=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/utils.py", line 879, in get_image_rects
pix = pymupdf.Pixmap(page.parent, xref) # make pixmap of the image to compute MD5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/__init__.py", line 10110, in __init__
img = mupdf.pdf_load_image(pdf, ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/mupdf.py", line 50901, in pdf_load_image
return _mupdf.pdf_load_image(doc, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image
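The traceback shows the failure happening when PyMuPDF builds a `Pixmap` from an embedded image xref. A pre-check along the following lines could locate the affected pages before the file is handed to the parser. This is a sketch, not part of MinerU: `find_undecodable_images` and its `load_image` parameter are hypothetical names, and it assumes PyMuPDF's `Document.page_count`, `Document.get_page_images(pno)` and `pymupdf.Pixmap(doc, xref)` behave as the traceback and the PyMuPDF documentation suggest.

```python
from typing import Any, Callable, List, Optional, Tuple


def find_undecodable_images(
    doc: Any,
    load_image: Optional[Callable[[Any, int], Any]] = None,
) -> List[Tuple[int, int]]:
    """Return (page_number, xref) pairs whose embedded image fails to decode.

    Page numbers are 0-based, as in PyMuPDF. `load_image` defaults to
    pymupdf.Pixmap, the same call that fails in the traceback.
    """
    if load_image is None:
        import pymupdf  # imported lazily; only needed for real documents
        load_image = pymupdf.Pixmap

    bad: List[Tuple[int, int]] = []
    for page_number in range(doc.page_count):
        # Each entry describes one image on the page; its first item is the xref.
        for image in doc.get_page_images(page_number):
            xref = image[0]
            try:
                load_image(doc, xref)
            except Exception:  # assumed to cover the FzErrorSyntax decode error
                bad.append((page_number, xref))
    return bad


# Possible usage (assumes PyMuPDF is installed and the file exists):
#   import pymupdf
#   doc = pymupdf.open("part_4.pdf")
#   print(find_undecodable_images(doc))
```

Pages reported by this check could then be removed from the PDF before parsing, which is the workaround suggested in the thread.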
Operating system
Linux
Python version
3.11
Software version (magic-pdf --version)
0.7.x
Device mode
cuda