roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

pretrained-model loading with errors #29

Closed Ao-Lee closed 2 years ago

Ao-Lee commented 2 years ago

Hello, I used a single-GPU environment with python==3.8, torch==1.8.1, and torchvision==0.9.1. I followed the GitHub instructions and ran the following command:

python3 infer.py --gpu --image demo_image/demo_2.jpg --model vitstr_small_patch16_224.pth

It returned an error with

AttributeError: 'collections.OrderedDict' object has no attribute 'to'

It seems that model = torch.load(checkpoint) in infer.py returns an OrderedDict (a state dict) instead of the model object. One way to solve the problem is:

ordered_dict = torch.load(checkpoint)
model.load_state_dict(ordered_dict)

But I do not know the hyperparameters that were used when vitstr_small_patch16_224.pth was trained, so it is very hard for me to initialize the model object with the correct hyperparameters. Would it be possible to make the hyperparameters of the pretrained models public?
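For reference, the loading pattern would look something like the sketch below. The hyperparameter values here are only placeholders, since the real ones for vitstr_small_patch16_224.pth are exactly what is missing:

import torch
from modules.vitstr import ViTSTR

# Placeholder hyperparameters: the actual values used to train
# vitstr_small_patch16_224.pth would still need to be confirmed.
model = ViTSTR(patch_size=16, embed_dim=384, depth=12, num_heads=6,
               mlp_ratio=4, qkv_bias=True, in_chans=1)

# The .pth file holds a state dict, not a pickled module; the keys may also
# carry a "module." prefix if the model was trained with DataParallel.
state_dict = torch.load("vitstr_small_patch16_224.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()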

I also tried the .pt models:

python3 infer.py --gpu --image demo_image/demo_2.jpg --model vitstr_small_patch16_jit.pt

It gives the following error:

  File "E:\ProgramFiles\anaconda3\envs\vitstr\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "e:\projects\deep-text-recognition-benchmark-master\infer.py", line 147, in <module>
    data = infer(args)

  File "e:\projects\deep-text-recognition-benchmark-master\infer.py", line 121, in infer
    model = torch.load(checkpoint)

  File "E:\ProgramFiles\anaconda3\envs\vitstr\lib\site-packages\torch\serialization.py", line 591, in load
    return torch.jit.load(opened_file)

  File "E:\ProgramFiles\anaconda3\envs\vitstr\lib\site-packages\torch\jit\_serialization.py", line 163, in load
    cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: 
Unknown type name 'NoneType':
Serialized   File "code/__torch__/modules/vitstr.py", line 12
  embed_dim : int
  num_tokens : int
  dist_token : NoneType
               ~~~~~~~~ <--- HERE
  head_dist : NoneType
  patch_embed : __torch__.timm.models.layers.patch_embed.PatchEmbed

Is there any way to load the model correctly, please? Many thanks.

roatienza commented 2 years ago

I just tried, and everything seems fine, both for CPU and GPU inference (on torch 1.11.0, as shown below).

(base) rowel@atienza-G190-G30:~/github/roatienza/deep-text-recognition-benchmark$ python3 infer.py --image demo_image/demo_2.jpg --model https://github.com/roatienza/deep-text-recognition-benchmark/releases/download/v0.1.0/vitstr_small_patch16_jit.pt
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82.1M/82.1M [00:01<00:00, 84.0MB/s]
/home/rowel/anaconda3/lib/python3.7/site-packages/torch/serialization.py:709: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
  " silence this warning)", UserWarning)
demo_image/demo_2.jpg   :  SHAKESHACK
(base) rowel@atienza-G190-G30:~/github/roatienza/deep-text-recognition-benchmark$ python3 infer.py --image demo_image/demo_2.jpg --model vitstr_small_patch16_jit.pt
/home/rowel/anaconda3/lib/python3.7/site-packages/torch/serialization.py:709: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
  " silence this warning)", UserWarning)
demo_image/demo_2.jpg   :  SHAKESHACK
(base) rowel@atienza-G190-G30:~/github/roatienza/deep-text-recognition-benchmark$ python3
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.11.0+cu113
>>> 
(base) rowel@atienza-G190-G30:~/github/roatienza/deep-text-recognition-benchmark$ python3 infer.py --image demo_image/demo_2.jpg --gpu --model vitstr_small_patch16_jit.pt
/home/rowel/anaconda3/lib/python3.7/site-packages/torch/serialization.py:709: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
  " silence this warning)", UserWarning)
demo_image/demo_2.jpg   :  SHAKESHACK

Ao-Lee commented 2 years ago

Thanks. I'll update torch from 1.8 to 1.11.0 and try again.

Ao-Lee commented 2 years ago

I would kindly like to ask: could requirements.txt be updated to your current environment settings, please?

Ao-Lee commented 2 years ago

All problems are solved with PyTorch 1.11.0, many thanks. Please close this issue.

raisinbl commented 2 years ago

@Ao-Lee

It seems that model = torch.load(checkpoint) in infer.py returns an OrderedDict (a state dict) instead of the model object. One way to solve the problem is:

ordered_dict = torch.load(checkpoint)
model.load_state_dict(ordered_dict)

How did you create the model in infer.py? Can you show me how to do that? I want to run inference from my own checkpoint, but I was stuck there. Thank you so much.

roatienza commented 2 years ago

In test.py, https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/test.py#L237

It is triggered by the --infer-model option. The PyTorch documentation for JIT is also referenced there.
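The exported .pt file is an ordinary traced TorchScript module, so the underlying pattern is roughly the following (a sketch only; see test.py for the actual code):

import torch

# Sketch: trace an already-built and already-loaded model with a dummy
# grayscale 224x224 input, then save it so infer.py can consume it.
model.eval()
dummy = torch.rand(1, 1, 224, 224)
traced = torch.jit.trace(model, dummy)
traced.save("vitstr_small_patch16_jit.pt")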

raisinbl commented 2 years ago

In test.py,

https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/test.py#L237

It is triggered by the --infer-model option. The PyTorch documentation for JIT is also referenced there.

Thank you so much! My problem is solved.

siddagra commented 2 years ago

requirements.txt should probably be updated to torch==1.11.0.

artyommatveev commented 1 month ago

Hi @raisinbl! Could you please explain how you solved this problem? I've already tried to initialize the architecture of the tiny version in the following way: vitstr_tiny = ViTSTR(patch_size=16, embed_dim=192, depth=12, num_heads=3, mlp_ratio=4, qkv_bias=True, in_chans=1). See the original code here: https://github.com/roatienza/deep-text-recognition-benchmark/blob/fb06d18bde4e62e728208ba3274390b8a615418a/modules/vitstr.py#L156-L159

I've also organized the whole preparation process before the evaluation as it's described here: https://github.com/roatienza/deep-text-recognition-benchmark/blob/fb06d18bde4e62e728208ba3274390b8a615418a/test.py#L238-L242

In other words, my setup before the model.eval() call in the infer.py script currently looks as follows (model is replaced by vitstr_tiny, and model = torch.load("vitstr_tiny_patch16_224.pth")):

  # Build the tiny variant and load the released .pth checkpoint into it.
  vitstr_tiny = ViTSTR(patch_size=16, embed_dim=192, depth=12, num_heads=3,
                       mlp_ratio=4, qkv_bias=True, in_chans=1)
  model = torch.load("vitstr_tiny_patch16_224.pth")  # an OrderedDict (state dict)
  new_state_dict = get_state_dict(model)
  vitstr_tiny.load_state_dict(new_state_dict)
  vitstr_tiny.eval()

At first, however, I faced a range of errors related to the key names in the new_state_dict dictionary. I later fixed them by changing name = k[7:] to name = k[14:] in the function below: https://github.com/roatienza/deep-text-recognition-benchmark/blob/fb06d18bde4e62e728208ba3274390b8a615418a/test.py#L228-L234

The aim of that change was to modify the dictionary's key names so that they match what the ViTSTR model expects.
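For anyone else hitting the same errors, an equivalent remapping that strips the prefixes by name rather than by a fixed index might look like this (my own sketch, not the repository's code):

  from collections import OrderedDict

  def get_state_dict(state_dict):
      # Training with DataParallel prefixes every key with "module.", and this
      # repo wraps the backbone once more, which yields "module.vitstr.".
      new_state_dict = OrderedDict()
      for k, v in state_dict.items():
          name = k
          for prefix in ("module.", "vitstr."):
              if name.startswith(prefix):
                  name = name[len(prefix):]
          new_state_dict[name] = v
      return new_state_dict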

Anyway, I still see errors, but now they are related to size mismatches. My guess is that something is wrong either with the .pth file for the ViTSTR-Tiny model (vitstr_tiny_patch16_224.pth) or with my hyperparameter setup in the vitstr_tiny variable. See an example of the error:

RuntimeError: Error(s) in loading state_dict for ViTSTR:
        size mismatch for head.weight: copying a param with shape torch.Size([96, 192]) from checkpoint, the shape in current model is torch.Size([1000, 192]).
        size mismatch for head.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([1000]).
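If I read that correctly, 1000 is simply timm's default ImageNet head size, so the classifier head apparently was never resized to the checkpoint's 96 output classes (presumably the character set plus the special tokens). Something along these lines before loading the state dict might be the missing step (assuming ViTSTR keeps timm's reset_classifier method):

  # 96 is inferred from the checkpoint's head shape above; it should equal
  # the size of the training charset plus the special tokens.
  vitstr_tiny.reset_classifier(num_classes=96)
  new_state_dict = get_state_dict(model)
  vitstr_tiny.load_state_dict(new_state_dict)
  vitstr_tiny.eval()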

Interestingly, I've noticed that I couldn't run inference with any of the pretrained models that have the .pth extension (in our case, it turns out they are instances of the collections.OrderedDict class), whereas the few remaining models with the .pt extension run successfully without any modification to infer.py. Needless to say, I set up my environment according to the requirements.txt file.

@roatienza, I hope you're doing great! I would really appreciate it if you could look into the aforementioned issues and elaborate on them a little. Another surprising thing I've noticed is that there are no implementations of ViTSTR-Tiny in OCR-related frameworks such as docTR. I suspect the reason might be how difficult this model is to reproduce.

artyommatveev commented 1 month ago

Okay, I just looked at the README.md file and a few other closed issues more thoroughly, and, if I understand correctly, pretrained models with the .pth extension are not meant for the infer.py script. Put differently, one should use test.py to run models with that extension; the infer.py script is only for the .pt (TorchScript) models. I'm going to check this a bit later and report back with some feedback.

artyommatveev commented 1 month ago

Yeah, everything works correctly using the command from here. So, the conclusion is that if you want to run inference with a pretrained model that has the .pth file extension, you should use the test.py script.
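For reference, the command has roughly the following shape (flags as given in the README; adjust the model name and the evaluation data path to your setup):

CUDA_VISIBLE_DEVICES=0 python3 test.py --eval_data data_lmdb_release/evaluation --benchmark_all_eval --Transformation None --FeatureExtraction None --SequenceModeling None --Prediction None --Transformer --TransformerModel=vitstr_tiny_patch16_224 --sensitive --data_filtering_off --imgH 224 --imgW 224 --saved_model vitstr_tiny_patch16_224.pth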

minhduc01168 commented 1 week ago

@artyommatveev Have you written prediction code for the .pth model yet? Could you give me a reference to it? Thank you.