opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0
5.68k stars, 382 forks

[QA] mineru 0.9.3: page headers and page numbers mixed into the output #176

Open dt-yy opened 10 hours ago

dt-yy commented 10 hours ago

Description of the bug

image

How to reproduce the bug

mineru 0.9.3, doclayout-yolo model, docstructbench_dianzishu_zhongwenzaixian-o.O-61566094.pdf_55.jpg

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

myhloli commented 8 hours ago

For model recognition issues, please submit them to the kit repository.