opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0

Can't load pretrained model #249

Open Holmes2002 opened 1 month ago

Holmes2002 commented 1 month ago

Description of the bug

Bug when loading the pretrained model: I can't load the pretrained model, even though I set the path to the directory containing config.json and pytorch_model.bin. Error:

Traceback (most recent call last):
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 575, in load_state_dict
    return torch.load(
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/torch/serialization.py", line 788, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 71

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 584, in load_state_dict
    if f.read(7) == "version":
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home1/data/congvu/multilingual-ocr/MinerU/test.py", line 8, in <module>
    model = VisionEncoderDecoderModel.from_pretrained(model_name, config=config)
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 371, in from_pretrained
    return super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3716, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/home/vudinh/anaconda3/envs/MinerU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 596, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home1/data/congvu/multilingual-ocr/MinerU/models/MFR/UniMERNet/pytorch_model.bin' at '/home1/data/congvu/multilingual-ocr/MinerU/models/MFR/UniMERNet/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Operating system

Linux

Python version

3.9

Software version (magic-pdf --version)

0.6.x

Device mode

cuda
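The template above asks for the OS, Python version, and magic-pdf version. Because the traceback runs through transformers' `load_state_dict` and `torch.load`, the installed torch and transformers versions are also worth reporting. The following is a hypothetical helper (not part of MinerU) that collects these using only the standard library; the package names `magic-pdf`, `torch`, and `transformers` are assumed to be the installed distribution names.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("magic-pdf", "torch", "transformers")):
    """Collect OS and Python details plus the installed versions of `packages`."""
    report = {
        "os": platform.system(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the active environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue makes version mismatches (for example between torch and transformers) easier to spot.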

myhloli commented 1 month ago

Check the model file's SHA-256. If it doesn't match the checksum published online, you need to re-download the file.
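A minimal sketch of the checksum comparison suggested above, equivalent to running `sha256sum` on Linux. It assumes you copy the expected hex digest by hand from wherever the hosting site lists it; the script name `check_sha256.py` is hypothetical.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory flat even for multi-GB checkpoints.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a local file against a published checksum (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_sha256.py models/MFR/UniMERNet/pytorch_model.bin <expected-sha256>
    path, expected = sys.argv[1], sys.argv[2]
    print("OK" if matches(path, expected) else "MISMATCH: re-download the file")
```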

Holmes2002 commented 1 month ago

@myhloli Which download method should I use? I downloaded with git lfs, but I get the same error with the Hugging Face model.
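When Git LFS is not set up, or `git lfs pull` was never run, a clone contains small text pointer files in place of the real weights. A quick way to rule this out is to look at the first bytes of `pytorch_model.bin`. This is a hypothetical diagnostic sketch; the byte signatures it checks are the Git LFS pointer header, the zip header that `torch.save` writes by default since PyTorch 1.6, and the pickle PROTO opcode that starts legacy-format checkpoints.

```python
import sys

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
ZIP_MAGIC = b"PK\x03\x04"  # zip-format checkpoints (torch.save default since 1.6)
PICKLE_PROTO = b"\x80"  # pickle PROTO opcode; legacy torch.save output begins with it


def classify_checkpoint(path):
    """Guess what kind of file sits at `path` from its first 64 bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if not head:
        return "empty"
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"  # weights were never fetched; try `git lfs pull`
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(PICKLE_PROTO):
        return "legacy-pickle"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) == 2:
    print(classify_checkpoint(sys.argv[1]))
```

The transformers fallback visible in the traceback (`if f.read(7) == "version":`) performs a similar pointer check. A file that is a real binary checkpoint but still fails to load points back to a corrupted or incomplete download, which the SHA-256 comparison can confirm.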

myhloli commented 1 month ago

@Holmes2002 Using wget to download the few largest files from ModelScope may help.

https://www.modelscope.cn/wanderkid/PDF-Extract-Kit
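For anyone without wget, here is a rough Python stand-in (a swap, not what was suggested above). It streams a single file to disk and returns the SHA-256 of the bytes written, so the result can be compared with the published checksum. The per-file download URLs on ModelScope are not given in this thread, so the URL is left as a parameter; `download` is a hypothetical helper name.

```python
import hashlib
import os
import sys
import urllib.request


def download(url, dest, chunk_size=1 << 20):
    """Stream `url` to `dest` and return the SHA-256 of the bytes written.

    Data is written to `dest + ".part"` and renamed only after the transfer
    finishes, so an interrupted download never leaves a truncated file under
    the final name that transformers would then try to load.
    """
    tmp = dest + ".part"
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)
    os.replace(tmp, dest)  # atomic rename on the same filesystem
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python fetch.py <file-url> models/MFR/UniMERNet/pytorch_model.bin
    print(download(sys.argv[1], sys.argv[2]))
```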

wanglf1979 commented 1 month ago

@myhloli I downloaded it from ModelScope with git as well, and it still doesn't work.

Does it have to be Python 3.10? (I'm on 3.12.3; is that too new?)

Holmes2002 commented 1 month ago

@wanglf1979 I encountered the same error with Python 3.9.

Katherinaxxx commented 1 month ago

Same error. I have tried re-downloading from Hugging Face and ModelScope.

wanglf1979 commented 1 month ago

@Holmes2002 MinerU was developed on Python 3.10 (see the docs on the main page), so you'd better upgrade your Python to 3.10. (I'm using 3.12.3, and it still doesn't work.)

wanglf1979 commented 1 month ago

@Katherinaxxx Does it work now? (I re-downloaded from ModelScope, but it still doesn't work.)

Katherinaxxx commented 1 month ago

@wanglf1979 It doesn't work.