Open Analect opened 3 days ago
Changing
!wget https://github.com/opendatalab/MinerU/raw/master/magic-pdf.template.json && mv magic-pdf.template.json ~/magic-pdf.json && sed -i 's|/tmp/models|{model_dir}|g' ~/magic-pdf.json
to
!wget https://github.com/opendatalab/MinerU/raw/master/magic-pdf.template.json && mv magic-pdf.template.json ~/magic-pdf.json && sed -i 's|/tmp/models|{model_dir}/models|g' ~/magic-pdf.json
is correct.
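For anyone who prefers to do the same substitution from a Python cell, here is a minimal sketch. It is not part of MinerU: patch_config_text and patch_config_file are hypothetical helpers, and the sketch assumes the template still contains the /tmp/models placeholder that the sed command above targets.

```python
from pathlib import Path


def patch_config_text(text: str, model_dir: str) -> str:
    """Replace the template placeholder with the real models folder.

    Mirrors: sed -i 's|/tmp/models|{model_dir}/models|g'
    """
    # Drop any trailing slash so we never produce a double slash.
    models_path = model_dir.rstrip("/") + "/models"
    return text.replace("/tmp/models", models_path)


def patch_config_file(config_path: Path, model_dir: str) -> None:
    """Rewrite the config file in place (hypothetical helper)."""
    original = config_path.read_text()
    config_path.write_text(patch_config_text(original, model_dir))
```

In a notebook this would be called as patch_config_file(Path.home() / "magic-pdf.json", model_dir), where model_dir is the snapshot folder returned by the download step.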
To simplify model downloads and prevent issues like the one reported here, we have updated the model download script. The latest download_models_hf.py automatically downloads magic-pdf.json and configures the model path, so users no longer need to run the model path update command manually. For detailed steps, see: https://github.com/opendatalab/MinerU/blob/master/docs/README_Ubuntu_CUDA_Acceleration_en_US.md
paddlepaddle-gpu and torch with cu121, respectively, by using paddlepaddle-gpu with cu118 on the Linux system.
Description of the bug
I was trying to get this running on Colab. When running the step
!magic-pdf -p demo1.pdf -o output/ -m auto
I was getting this error:
[Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--opendatalab--PDF-Extract-Kit/snapshots/a29caa466f6d07be0e4863bba64204009128931a/MFD/weights.pt'
It seems that the referenced path is missing a models subfolder that sits above MFD/weights.pt.
Also, the T4 on Colab has the following CUDA version.
For the step
!pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
what do you suggest? I notice there is no https://www.paddlepaddle.org.cn/packages/stable/cu122/
Thanks.
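One way to confirm the diagnosis above is to check where MFD/weights.pt actually sits relative to the snapshot folder. The following is a minimal sketch, not MinerU code: find_models_root is a hypothetical helper, and the only layouts it considers are the two described in this report (weights directly under the snapshot folder, or under a models subfolder).

```python
from pathlib import Path
from typing import Optional


def find_models_root(snapshot_dir: str, probe: str = "MFD/weights.pt") -> Optional[Path]:
    """Return the folder under which `probe` exists, or None if not found.

    Checks the snapshot folder itself first, then its `models` subfolder.
    """
    base = Path(snapshot_dir)
    for candidate in (base, base / "models"):
        # is_file() follows symlinks, which the Hugging Face cache uses.
        if (candidate / probe).is_file():
            return candidate
    return None
```

If the call returns the models subfolder, that folder is the path the config should point to; if it returns None, the download itself is probably incomplete.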
How to reproduce the bug
Open the Colab demo at https://colab.research.google.com/gist/papayalove/b5f4913389e7ff9883c6b687de156e78/mineru_demo.ipynb
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
0.8.x
Device mode
cuda