opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
17.94k
stars
1.29k
forks
issues
fix: use concrete class instead of abstract class
#1052
icecraft
closed
1 hour ago
0
Does mineru support parsing Word documents?
#1051
asenasen123
opened
1 hour ago
0
refactor(txt_parse): improve text extraction accuracy with new algorithm
#1050
myhloli
closed
1 hour ago
2
feat(ocr): improve text detection and OCR accuracy
#1049
myhloli
closed
1 hour ago
0
fix(remove_overlaps_min_spans): optimize overlap detection in OCR span list modification
#1048
myhloli
closed
1 hour ago
0
fix(ocr_mkcontent): improve hyphen handling at line ends
#1047
myhloli
closed
1 hour ago
0
refactor(ocr_dict_merge): add threshold parameter for line merging
#1046
myhloli
closed
1 hour ago
0
fix(tools): handle empty language string in common.py
#1045
myhloli
closed
2 hours ago
0
Table layout is recognized incorrectly
#1044
squirrelfish
opened
3 hours ago
1
[Help needed: model loading]
#1043
yingliu0518
closed
57 minutes ago
4
rapidocr_paddle
#1042
lyc728
opened
5 hours ago
2
Can a heading be recognized when the heading and body text are on the same line?
#1041
ZzYAmbition
opened
6 hours ago
0
What is the current level of support for concurrency? How do I run with higher concurrency?
#1039
Muyi030
closed
56 minutes ago
1
There are reading order problems in this published version
#1038
zahrarsl
closed
56 minutes ago
8
AttributeError: 'tuple' object has no attribute 'shape'
#1037
xuhongtian
opened
1 day ago
6
fix: remove test code
#1036
icecraft
closed
6 hours ago
0
Batch testing
#1035
lyc728
closed
1 day ago
3
[Reproducible] Error: pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image
#1034
CocoaML
closed
1 day ago
4
Figures are not cropped out and saved to files
#1033
lyc728
closed
1 day ago
0
Request for Bengali Language Support in OCR
#1032
raselmeya94
opened
1 day ago
0
How to run OCR on a single image containing multiple languages (Simplified Chinese, English, Korean, Traditional Chinese, Japanese, etc.)
#1031
huyidu
opened
1 day ago
1
Can batch runs be supported?
#1030
charliedream1
opened
1 day ago
0
Extraction error
#1029
YANGtzeRi
opened
2 days ago
3
Using RapidTable for table recognition with table recognition enabled in table-config, the result is an image instead of HTML
#1028
mrslimslim
closed
1 day ago
14
refactor: move some constants or enums defs to config folder
#1027
icecraft
closed
2 days ago
0
Can GPU acceleration be used with NVIDIA-SMI 510.54 Driver Version: 510.54 CUDA Version: 11.6?
#1026
Muyi030
closed
2 days ago
2
Including link: https://aquasecurity.github.io/
#1025
Davidjennison1
closed
2 days ago
0
delete unused pipeline file
#1024
liugongjian
closed
2 days ago
2
New version 0.93 raises an error; it occurs in the formula parsing model
#1023
3300752199
closed
2 days ago
1
Please help with this problem: a PDF file that ran fine on the original 0.8.1 version fails after switching to the new framework
#1022
farierer
closed
2 days ago
2
Beginner question: how do I run from source? The goal is to force OCR on regions recognized as figure to extract their text
#1021
aodingpeng
closed
2 days ago
9
Layout recognition is misaligned
#1020
FHhui
opened
2 days ago
3
Running the magic-pdf command fails with an OpenBLAS thread limit error
#1019
Muyi030
opened
2 days ago
1
refactor(para): adjust right margin threshold based on block width
#1018
myhloli
closed
2 days ago
0
ppocr DEBUG: is this an error?
#1017
sanwacompany
closed
2 days ago
2
build(setup): add old_linux specific dependencies
#1016
myhloli
closed
2 days ago
0
ERROR: detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl is not a supported wheel on this platform.
#1015
CyberAsteroid
closed
2 days ago
2
[QA] mineru formula post-processing question
#1014
dt-yy
closed
57 minutes ago
1
refactor(para): improve paragraph splitting logic
#1013
myhloli
closed
2 days ago
0
add DocLayout-YOLO url
#1012
qiangqiang199
closed
2 days ago
1
add Doclayout-yolo url
#1011
qiangqiang199
closed
2 days ago
1
feat(ocr): improve handling of angled text boxes
#1010
myhloli
closed
3 days ago
0
Request: heading recognition and code recognition
#1009
Tian14267
closed
2 days ago
6
With the FastAPI PDF parsing endpoint, where can I find the parsed md file and images?
#1008
asenasen123
opened
3 days ago
0
Header and footer parsing problem
#1007
zhongxin129
opened
3 days ago
0
fix: using new data api replace old rw api
#1006
icecraft
closed
2 days ago
0
Returned result is wrong when deployed with fastapi
#1005
asenasen123
opened
3 days ago
0
Note: Centos7 is not supported because the new version of albumentations depends on simsimd
#1004
myhloli
opened
3 days ago
0
Cannot access huggingface from an internal network
#1002
yq-warehouse
closed
3 days ago
24
refactor(tests): extract common test utilities into test_commons.py
#1001
myhloli
closed
3 days ago
0