opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
17.96k stars, 1.29k forks

[Help with model loading] #1043

Closed: yingliu0518 closed this issue 5 hours ago

yingliu0518 commented 8 hours ago

Because of restrictions on my company's network, I can't use the script to download the model weights from Hugging Face or ModelScope.

So I downloaded the model weights manually. Which model does this line refer to? "layoutreader-model-dir":"/tmp/layoutreader"

{
  "bucket_info": {
    "bucket-name-1": ["ak", "sk", "endpoint"],
    "bucket-name-2": ["ak", "sk", "endpoint"]
  },
  "models-dir": "/tmp/models",
  "layoutreader-model-dir": "/tmp/layoutreader",
  "device-mode": "cpu",
  "layout-config": {
    "model": "layoutlmv3"
  },
  "formula-config": {
    "mfd_model": "yolo_v8_mfd",
    "mfr_model": "unimernet_small",
    "enable": true
  },
  "table-config": {
    "model": "rapid_table",
    "enable": false,
    "max_time": 400
  },
  "config_version": "1.0.0"
}

My error message is as follows:

2024-11-21 14:53:43.483 | ERROR | magic_pdf.user_api:parse_pdf:97 - C:\Users**.cache/modelscope/hub/ppaanngggg/layoutreader does not appear to have a file named config.json. Checkout 'https://huggingface.co/C:\Users\l**.cache/modelscope/hub/ppaanngggg/layoutreader/tree/main' for available files.
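The error says the layoutreader directory has no config.json, so whatever "layoutreader-model-dir" points to must be a complete model folder rather than an empty or partial one. Below is a minimal sketch of a pre-flight check, assuming the config is a JSON file shaped like the one above. The helper name `check_model_paths` is hypothetical and not part of MinerU; it only checks that the two directories exist and that config.json is present, not that the weights are valid.

```python
import json
from pathlib import Path


def check_model_paths(config_path):
    """Return a list of problems found in a MinerU-style JSON config.

    Hypothetical helper: checks that "models-dir" and
    "layoutreader-model-dir" exist, and that the layoutreader directory
    contains config.json (the file the error above reports as missing).
    """
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)

    problems = []
    for key in ("models-dir", "layoutreader-model-dir"):
        value = cfg.get(key)
        if not value or not Path(value).is_dir():
            problems.append(f"{key} not found: {value}")

    layout_dir = cfg.get("layoutreader-model-dir")
    if layout_dir and Path(layout_dir).is_dir():
        if not (Path(layout_dir) / "config.json").is_file():
            problems.append(f"no config.json in {layout_dir}")
    return problems
```

Point it at your own config file (for example `check_model_paths("path/to/your-config.json")`); an empty list means the paths at least look complete.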

myhloli commented 8 hours ago

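You can run the model download script on a computer that has internet access. When the script finishes, it prints the model cache directory in the log. Then simply copy that model directory, together with the automatically generated config file, into the user directory on your intranet machine.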
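A minimal sketch of that copy step, assuming you have transferred the files to a location reachable from the offline machine and want to keep the same layout relative to the home directory. The helper `copy_model_bundle` and its parameters are hypothetical; the actual source paths are whatever the download script logged. If the user name differs between the two machines, the absolute paths written into the config file will probably still need to be edited by hand.

```python
import shutil
from pathlib import Path


def copy_model_bundle(src_home, model_dir, config_file, dest_home):
    """Copy a model cache directory and its generated config file from
    src_home to dest_home, keeping their paths relative to the home dir.

    Hypothetical helper: model_dir and config_file must live under src_home.
    """
    src_home, dest_home = Path(src_home), Path(dest_home)
    model_dir, config_file = Path(model_dir), Path(config_file)

    # Recreate the model directory at the same relative location.
    target_models = dest_home / model_dir.relative_to(src_home)
    shutil.copytree(model_dir, target_models, dirs_exist_ok=True)

    # Copy the config file next to where it was on the source machine.
    target_config = dest_home / config_file.relative_to(src_home)
    target_config.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(config_file, target_config)
    return target_models, target_config
```

`dirs_exist_ok=True` (Python 3.8+) lets the copy be re-run over a partially copied directory instead of failing.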

yingliu0518 commented 7 hours ago

[11/21 16:24:33 fvcore.common.checkpoint]: [Checkpointer] Loading from d:\code\MinerU-master\MinerU-master\model\Layout/LayoutLMv3/model_final.pth ...
download https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar to C:\Users*****/.paddleocr/whl\det\ch\ch_PP-OCRv4_det_infer\ch_PP-OCRv4_det_infer.tar
2024-11-21 16:25:06.063 | ERROR | main:pdf_parse_main:140 - HTTPSConnectionPool(host='paddleocr.bj.bcebos.com', port=443): Max retries exceeded with url: /PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))

Why does it still try to download ch_PP-OCRv4_det_infer here? I remember this model being in the models-dir folder.

yingliu0518 commented 7 hours ago

I also manually downloaded ch_PP-OCRv4_det_infer.tar and moved it to C:\Users*****/.paddleocr/whl\det\ch\ch_PP-OCRv4_det_infer\ch_PP-OCRv4_det_infer.tar, but that didn't solve the problem either.

myhloli commented 7 hours ago

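The models in the model repository are the ones used by our colleagues developing the kit; MinerU uses Paddle's original model-loading logic. The tar packages need to be extracted into the layout shown at https://huggingface.co/spaces/opendatalab/MinerU/tree/main/paddleocr, and then the entire paddleocr directory must be copied into the .paddleocr directory under your user directory.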
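This likely explains why dropping the .tar file into place did not help. As far as I know, PaddleOCR's download helper decides whether to download by checking for already-extracted `inference.pdmodel` and `inference.pdiparams` files in the model directory, not for the tarball; treat that as an assumption and compare against the Hugging Face layout linked above. The sketch below mirrors that extraction step: it pulls the `.pdmodel`, `.pdiparams` and `.pdiparams.info` files out of the tarball and writes them flat as `inference.*`. The helper name `extract_paddle_model` is hypothetical.

```python
import tarfile
from pathlib import Path

# File suffixes assumed to be the ones PaddleOCR keeps from a model tarball.
SUFFIXES = (".pdiparams.info", ".pdiparams", ".pdmodel")


def extract_paddle_model(tar_path, model_dir):
    """Extract a PaddleOCR inference tarball into model_dir as flat
    inference.* files and return the sorted list of written file names.

    Hypothetical helper: other members (READMEs, nested dirs) are skipped.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            for suffix in SUFFIXES:
                if member.name.endswith(suffix):
                    data = tar.extractfile(member).read()
                    # Only a fixed file name is written, never the member's path.
                    out = model_dir / ("inference" + suffix)
                    out.write_bytes(data)
                    written.append(out.name)
                    break
    return sorted(written)
```

For the detection model in the log above, the target would be the `.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer` directory under your user directory, e.g. `extract_paddle_model("ch_PP-OCRv4_det_infer.tar", Path.home() / ".paddleocr" / "whl" / "det" / "ch" / "ch_PP-OCRv4_det_infer")`; the other models (recognition, classification) would follow the same pattern in their own directories.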