roatienza / deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Apache License 2.0

About input size #15

Closed terryoo closed 2 years ago

terryoo commented 2 years ago

Hi, thank you for your work. This is very meaningful work. I am curious whether the input size is the same as TRBA (32 x 100). Have you tried training with 32 x 100 input images?

roatienza commented 2 years ago

In ViTSTR, the input images were resized to 224x224 since a ViT (DeiT) model pre-trained on ImageNet was used. We have follow-up work (still unpublished) that is trained from scratch on smaller images (e.g., 32x100) from the ST and/or MJ datasets. The performance is higher.
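For reference, a minimal preprocessing sketch of what the reply describes: resizing a scene-text crop to the 224x224 input expected by an ImageNet-pretrained ViT/DeiT backbone. This is not the repo's exact pipeline; the interpolation mode, RGB conversion, and the `word_crop.png` filename are assumptions for illustration only.

```python
# Hedged sketch: resize a word crop to 224x224 for a pretrained ViT/DeiT backbone.
# Not the exact ViTSTR pipeline; interpolation and channel handling are assumptions.
from PIL import Image
from torchvision import transforms

vit_input_size = (224, 224)   # pretrained ViT/DeiT input size mentioned in the reply
trba_input_size = (32, 100)   # TRBA-style (H, W), for comparison

to_vit_tensor = transforms.Compose([
    transforms.Resize(vit_input_size,
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),    # float tensor in [0, 1], shape (C, 224, 224)
])

img = Image.open("word_crop.png").convert("RGB")  # hypothetical input crop
x = to_vit_tensor(img).unsqueeze(0)               # add batch dim -> (1, 3, 224, 224)
print(x.shape)
```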

terryoo commented 2 years ago

Thank you for the fast reply :) I hope your future work will also be published. I will close this issue since it has been resolved.