Open audio-github-2020 opened 2 weeks ago
Description of the bug | 错误描述
Error output:
2024-08-29 16:20:52.704 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 11, cid_chars_radio: 0.0
2024-08-29 16:20:52.705 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: True, by_text: False, by_avg_words: False, by_img_num: True, by_text_layout: True, by_img_narrow_strips: True, by_invalid_chars: True
[1] 53755 segmentation fault magic-pdf -p /Users/t/Downloads/attach/test.pdf -o -m auto
test.pdf
How to reproduce the bug | 如何复现
(MinerU) ✘ t@tMAC ~ magic-pdf -p /Users/t/Downloads/attach/test.pdf -o /Users/t/Downloads/attach -m auto
2024-08-29 16:20:52.704 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 11, cid_chars_radio: 0.0
2024-08-29 16:20:52.705 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: True, by_text: False, by_avg_words: False, by_img_num: True, by_text_layout: True, by_img_narrow_strips: True, by_invalid_chars: True
[1] 53755 segmentation fault magic-pdf -p /Users/t/Downloads/attach/test.pdf -o -m auto
(MinerU) ✘ t@tMAC ~
Contents of magic-pdf.json:
{
  "models-dir": "/Users/t/.cache/modelscope/hub/opendatalab/PDF-Extract-Kit/models",
  "device-mode": "cpu",
  "table-config": {
    "is_table_recog_enable": false,
    "max_time": 400
  }
}
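For anyone trying to confirm the crash, the reproduction above could be wrapped in a small script. This is only a sketch under stated assumptions: `describe_exit` and `run_repro` are hypothetical helper names, not part of magic-pdf, and setting `PYTHONFAULTHANDLER=1` assumes the `magic-pdf` command runs as a Python process, in which case the interpreter should print a Python-level traceback when the segmentation fault occurs.

```python
import os
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Summarize a subprocess return code (hypothetical helper)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_repro(pdf_path: str, out_dir: str) -> str:
    """Run the command from the report and describe how it ended (hypothetical helper)."""
    # Assumption: magic-pdf is a Python entry point, so the standard
    # PYTHONFAULTHANDLER variable makes it dump a traceback on a segfault.
    env = {**os.environ, "PYTHONFAULTHANDLER": "1"}
    cmd = ["magic-pdf", "-p", pdf_path, "-o", out_dir, "-m", "auto"]
    result = subprocess.run(cmd, env=env)
    return describe_exit(result.returncode)


# Example call, using the paths from the report:
# print(run_repro("/Users/t/Downloads/attach/test.pdf", "/Users/t/Downloads/attach"))
```

If the run ends with "killed by SIGSEGV", the traceback written to stderr may help narrow down which native library is crashing.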
Operating system | 操作系统
macOS
Python version | Python 版本
3.10
Software version | 软件版本 (magic-pdf --version)
0.7.x
Device mode | 设备模式
cpu