opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
13.43k
stars
1.01k
forks
issues
Is there an option to skip saving screenshots of the images in a document?
#767
ElsaReedz
closed
3 days ago
2
docs:Update the driver requirements on the Ubuntu system.
#766
myhloli
closed
1 week ago
0
refactor(para): improve paragraph splitting algorithm
#765
myhloli
closed
1 week ago
0
Some content is missing
#764
guoguo0646
closed
3 days ago
1
How can documents in less common languages, such as Thai or Arabic, be recognized? Is there a language pack to download from somewhere?
#763
040211
closed
3 days ago
1
Table parsing with StructEqTable is very slow
#762
Schumpeterx
opened
1 week ago
2
Will support DocLayout-YOLO_ft? Like PDF-Extract Kit
#761
Schumpeterx
closed
3 days ago
2
Is the front-end code open source?
#760
CapLee
closed
1 week ago
1
User experience
#759
liepinlxy
opened
1 week ago
12
magic-pdf, version 0.8.1: PDF parsing error
#758
wertyac
closed
3 days ago
5
Extra spaces appear when parsing HTML links in a PDF
#757
xiaotaozi121096
closed
1 week ago
14
Recognition of display-formula equation numbers
#756
yiyibooks
closed
1 week ago
1
Is formula detection necessary?
#755
520jefferson
closed
6 days ago
20
detectron2 installation failure
#754
nathanlll8
closed
1 week ago
1
refactor(ocr):Increase the dilation factor in OCR to address the issue of word concatenation.
#753
myhloli
closed
1 week ago
0
Question about the PDF parsing command
#752
eileenwsd
closed
1 week ago
2
Segmenter support
#751
novohool
opened
1 week ago
0
Table parsing results
#750
xxlxms
closed
2 weeks ago
1
magic-pdf version 0.6.1
#749
17Reset
closed
6 days ago
9
Ran fine yesterday, raises an error today
#748
userandpass
closed
6 days ago
9
update example files
#747
myhloli
closed
2 weeks ago
0
Error
#746
liepinlxy
opened
2 weeks ago
4
Fork/sync 20241015
#745
jpcanesin
closed
2 weeks ago
1
fix(para_split_v3): refine list block detection in paragraph splitting
#744
myhloli
closed
2 weeks ago
0
refactor(para_split_v3): merge list and index block detection
#743
myhloli
closed
2 weeks ago
0
Replacing the OCR solution
#742
chinaphilip
closed
3 days ago
6
With two T4 GPUs, how can different models be loaded onto different cards, without directly using the litserve approach?
#741
dxjjhm
closed
3 days ago
4
feat(list&index block): detect and merge list and index blocks
#740
myhloli
closed
2 weeks ago
0
Formula recognition outputs a long string
#738
meng0423
opened
2 weeks ago
2
feat: manager docs with sphinx
#737
icecraft
closed
2 weeks ago
0
Persistent error: Illegal instruction (core dumped)
#736
1134018901
closed
7 hours ago
1
Models are downloaded and the config is set up, but parsing a PDF raises an error
#735
fuxuelinwudi
closed
3 days ago
13
python3.10, version 0.8.1
#734
fuxuelinwudi
closed
2 weeks ago
1
Installing mineru, version 0.6.1
#733
fuxuelinwudi
closed
2 weeks ago
1
Colab mineru_demo.ipynb failing; bad `/MFD/weights.pt` file reference
#732
Analect
opened
2 weeks ago
1
Is it possible to convert a PDF into a PDF/UA Format?
#731
Samyssmile
closed
7 hours ago
3
zsh: no matches found: magic-pdf[full]
#729
immmor
closed
2 weeks ago
1
The number to the right of a formula is not recognized
#728
1134018901
closed
2 weeks ago
1
Error when uploading a scanned PDF for parsing: RuntimeError: could not execute a primitive....
#727
Fanxhion
closed
2 weeks ago
5
Could someone help me figure out what is going on with this bug?
#726
chinaphilip
closed
2 weeks ago
2
Have the model files been updated? Why does loading the UniMerNet model weights raise an error?
#725
chinaphilip
closed
2 weeks ago
2
Extremely slow after enabling table recognition: a 4.5 MB PDF cannot be processed even in half an hour
#724
tao-xiaoxin
closed
2 weeks ago
32
Wrong detection and missing some details in the detection.
#722
Ab-AI-1
closed
3 days ago
2
Spaces between English words inside a Chinese string are removed; is this easy to fix?
#721
1134018901
opened
2 weeks ago
2
Table of contents is parsed without line breaks
#720
singeleaf
opened
2 weeks ago
0
Table recognition in scanned PDFs is poor; could this be specifically improved in the future?
#719
Fanxhion
opened
2 weeks ago
2
fix: Solving the Grouping Anomaly Issue with Multiple Consecutive Non-Text Blocks
#718
myhloli
closed
2 weeks ago
0
Update how_to_download_models_zh_cn.md
#717
myhloli
closed
2 weeks ago
0
feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support
#716
myhloli
closed
2 weeks ago
0
web_demo performance issue
#715
bojiw
opened
2 weeks ago
0