roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

Training on Japanese data #10

Closed: Preethse closed this issue 2 years ago

Preethse commented 2 years ago

Can you please tell us what changes one should make to train the network on Japanese or any other language?

roatienza commented 2 years ago

The most important thing is the dataset. MJSynth and SynthText were used in the paper for training; both are labelled synthetic images. For testing, real-world text images are needed; the test dataset can be labelled manually or semi-automatically. Once the datasets are available, the number of classes in ViTSTR should be changed to reflect the number of characters in the target language. Then, train and validate the model end to end.
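For concreteness, here is a minimal sketch of how a custom-language training set might be prepared, assuming this fork keeps the LMDB data pipeline and the `create_lmdb_dataset.py` tool of the upstream clovaai/deep-text-recognition-benchmark; the paths, file names, and labels below are illustrative only.

```python
# Sketch: build the tab-separated ground-truth file expected by the LMDB
# conversion tool (assumption: same format as upstream, "<image path>\t<label>").
import os

# Illustrative (image path, label) pairs; paths are relative to the data root.
samples = [
    ("images/word_000001.jpg", "東京"),
    ("images/word_000002.jpg", "駅前"),
]

os.makedirs("data", exist_ok=True)
with open("data/gt.txt", "w", encoding="utf-8") as f:
    for image_path, label in samples:
        f.write(f"{image_path}\t{label}\n")

# Then convert to LMDB (assumption: upstream tooling), e.g.:
#   python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath data_lmdb/japanese
```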

Preethse commented 2 years ago

Thanks, I was able to start my training.

nawafalageel commented 2 years ago

Can you please elaborate on how you trained on a custom dataset (a language other than English)? Thanks

roatienza commented 2 years ago

Off the top of my head: 1) Change the character set: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/train.py#L278

2) Change the number of character classes of the head: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/modules/vitstr.py#L59
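As a rough illustration of how these two changes fit together, here is a hedged sketch (not the repo's exact code) of how the character set determines the output size of the ViTSTR prediction head. The Japanese character string, the special-token count, and the embedding width are assumptions for illustration only.

```python
import torch.nn as nn

# Step 1: the character set. train.py's default is
# '0123456789abcdefghijklmnopqrstuvwxyz'; for Japanese, replace it with every
# kana/kanji that can appear in your labels (the string below is illustrative,
# not a complete set).
japanese_characters = "0123456789あいうえおかきくけこ日本語東京駅"

# The label converter typically prepends special tokens (e.g. [GO] and [s] in
# the upstream converters), so the class count is slightly larger than the raw
# character count. The "+ 2" here is an assumption about that converter.
num_classes = len(japanese_characters) + 2

# Step 2: the ViTSTR head is a linear layer whose output dimension must equal
# num_classes; this is what the referenced line in modules/vitstr.py controls.
embed_dim = 192  # assumed ViTSTR-Tiny width; the small/base variants are wider
head = nn.Linear(embed_dim, num_classes)
print(head)
```

In short, once the character string and the head's output dimension agree on the same class count, the rest of the training loop can stay unchanged.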