opendatalab / MinerU

A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
19.89k stars 1.41k forks source link

KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'> #250

Closed shenyang2223 closed 4 months ago

shenyang2223 commented 4 months ago

Description of the bug | 错误描述

1、here is an error here, how should it be handled

KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'>

123123

2、This is my installation package Package Version


absl-py 2.1.0 aiohttp 3.9.5 aiosignal 1.3.1 albucore 0.0.12 albumentations 1.4.12 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 astor 0.8.1 async-timeout 4.0.3 attrdict 2.0.1 attrs 23.2.0 Babel 2.15.0 bce-python-sdk 0.9.17 beautifulsoup4 4.12.3 black 24.4.2 blinker 1.8.2 boto3 1.34.150 botocore 1.34.150 braceexpand 0.1.7 Brotli 1.1.0 cachetools 5.4.0 certifi 2024.7.4 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorlog 6.8.2 contourpy 1.2.1 cryptography 43.0.0 cssselect 1.2.0 cssutils 2.11.1 cycler 0.12.1 Cython 3.0.10 datasets 2.20.0 decorator 5.1.1 detectron2 0.6 dill 0.3.8 et-xmlfile 1.1.0 eva-decord 0.6.1 eval_type_backport 0.2.0 evaluate 0.4.2 exceptiongroup 1.2.2 fairscale 0.4.13 fast-langdetect 0.2.1 fasttext-wheel 0.9.2 filelock 3.15.4 fire 0.6.0 Flask 3.0.3 flask-babel 4.0.0 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.5.0 ftfy 6.2.0 future 1.0.0 fvcore 0.1.5.post20221221 grpcio 1.65.1 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.24.3 hydra-core 1.3.2 idna 3.7 imageio 2.34.2 imgaug 0.4.0 iopath 0.1.9 itsdangerous 2.2.0 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 kiwisolver 1.4.5 lazy_loader 0.4 lmdb 1.5.1 loguru 0.7.2 lxml 5.2.2 magic-pdf 0.6.1 Markdown 3.6 MarkupSafe 2.1.5 matplotlib 3.9.1 more-itertools 10.3.0 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.16 mypy-extensions 1.0.0 networkx 3.3 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 omegaconf 2.3.0 opencv-contrib-python 4.6.0.66 opencv-python 4.6.0.66 opencv-python-headless 4.10.0.84 openpyxl 3.1.5 opt-einsum 3.3.0 packaging 24.1 paddleocr 2.7.3 paddlepaddle 2.6.1 pandas 2.2.2 pathspec 0.12.1 pdf2docx 0.5.8 pdfminer.six 20240706 pillow 10.4.0 pip 24.0 platformdirs 4.2.2 portalocker 2.10.1 premailer 3.10.0 protobuf 4.25.4 psutil 6.0.0 py-cpuinfo 9.0.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pybind11 2.13.1 pyclipper 1.3.0.post5 pycocotools 2.0.8 pycparser 2.22 pycryptodome 3.20.0 pydantic 2.8.2 pydantic_core 2.20.1 PyMuPDF 1.24.9 PyMuPDFb 1.24.9 pyparsing 3.1.2 python-dateutil 2.9.0.post0 python-docx 1.1.2 pytz 2024.1 PyYAML 6.0.1 rapidfuzz 3.9.5 rarfile 4.2 regex 2024.7.24 requests 2.32.3 robust-downloader 0.0.2 s3transfer 0.10.2 safetensors 0.4.3 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 setuptools 69.5.1 shapely 2.0.5 six 1.16.0 sniffio 1.3.1 soupsieve 2.5 sympy 1.13.1 tabulate 0.9.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 termcolor 2.4.0 threadpoolctl 3.5.0 tifffile 2024.7.24 timm 0.9.16 tokenizers 0.19.1 tomli 2.0.1 torch 2.3.1 torchtext 0.18.0 torchvision 0.18.1 tqdm 4.66.4 transformers 4.40.0 triton 2.3.1 typing_extensions 4.12.2 tzdata 2024.1 ultralytics 8.2.69 ultralytics-thop 2.0.0 unimernet 0.1.6 urllib3 2.2.2 visualdl 2.5.3 Wand 0.6.13 wcwidth 0.2.13 webdataset 0.2.86 Werkzeug 3.0.3 wheel 0.43.0 wordninja 2.0.0 xxhash 3.4.1 yacs 0.1.8 yarl 1.9.4

How to reproduce the bug | 如何复现

magic-pdf pdf-command --pdf "test.pdf" --inside_model true --model_mode full

Operating system | 操作系统

Linux

Python version | Python 版本

3.10

Software version | 软件版本 (magic-pdf --version)

0.6.x

Device mode | 设备模式

cpu

myhloli commented 4 months ago

please check model file size & sha256 is same with online web page. if not, you need redownload them.

shenyang2223 commented 4 months ago

yes,you are right