opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
17.96k stars, 1.29k forks

Table layout is recognized incorrectly #1044

Open squirrelfish opened 6 hours ago

squirrelfish commented 6 hours ago

Description of the bug

The four tables on page 4 were recognized as a single table.

How to reproduce the bug

H3_AP202411191640951928_YB.pdf

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cpu

myhloli commented 6 hours ago

Tables that sit this close together are hard to recognize as separate tables.