roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

Training from scratch, w/o using Pretrained DeiT? #7

Open mandal4 opened 3 years ago

mandal4 commented 3 years ago

Thanks for sharing the source code! I found that you used the pretrained weight file of DeiT instead of training from scratch. However, I see that you emphasize the efficiency of your model. I wonder whether there is some issue that prevents training from scratch.

roatienza commented 3 years ago

The pre-trained weights were used since transformers lack the inductive bias of CNNs. However, in the case of STR, since MJSynth and SynthText are both large in sample count (though lacking diversity in terms of texture), ViTSTR may not need pre-trained weights, but this could result in lower performance. This would be an interesting direction for future work.

mrtranducdung commented 2 years ago

Hi roatienza, currently there are several Transformer models to choose from (["vitstr_tiny_patch16_224", "vitstr_small_patch16_224", "vitstr_base_patch16_224", "vitstr_tiny_distilled_patch16_224", "vitstr_small_distilled_patch16_224"]). However, these do not support non-Latin languages. Do you have any instructions for training a Transformer model that supports non-Latin languages? Thank you very much.

roatienza commented 2 years ago

Hi, I do not have a set of instructions for training ViTSTR on non-Latin characters. Training on non-Latin requires:

1) A labelled train/test dataset in lmdb format
2) Changing the set of characters used in train and test (the default is `opt.character = string.printable[:-6]`)
3) Retraining ViTSTR
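
The character-set change in step 2 can be sketched as follows. This is only an illustration of what `string.printable[:-6]` contains and how one might substitute a non-Latin alphabet; the Cyrillic range below is just an example, and `opt.character` stands in for the option object used by the training script.

```python
import string

# Default charset in the repo: all printable ASCII minus the 6 whitespace
# characters at the end of string.printable (' \t\n\r\x0b\x0c').
default_charset = string.printable[:-6]
assert " " not in default_charset and "\n" not in default_charset
print(len(default_charset))  # 94 non-whitespace printable ASCII characters

# For a non-Latin language, replace it with the target alphabet,
# e.g. digits plus the basic Cyrillic block (example only):
cyrillic = "".join(chr(c) for c in range(0x0410, 0x0450))  # А .. я
charset = string.digits + cyrillic
print(len(charset))  # 10 digits + 64 Cyrillic letters = 74
```

The rest of the pipeline (lmdb dataset creation and retraining) is unchanged; only the vocabulary passed to the model's prediction head needs to match the new charset length.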