Closed terryoo closed 2 years ago
Hi, thank you for your work. This is very meaningful work. I am curious whether the input size is the same as TRBA (32 x 100). Have you tried training with 32 x 100 input images?

In ViTSTR, the input images were resized to 224x224 because a ViT (DeiT) model pre-trained on ImageNet was used. We have a follow-up work (still unpublished) that is trained from scratch on smaller images (e.g., 32x100) from the ST and/or MJ datasets. The performance is higher.

Thank you for the fast reply :) I hope your future work gets published as well. I will close the issue since it is resolved.
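For reference, a minimal sketch of the resizing discussed in this thread, assuming Pillow (this is illustrative, not code from the ViTSTR repository): a typical 32x100 text-recognition crop is scaled up to the 224x224 input that an ImageNet-pretrained DeiT backbone expects.

```python
from PIL import Image

# A typical scene-text crop, 32x100 (H x W) as used in TRBA.
# PIL takes (width, height), hence (100, 32).
crop = Image.new("RGB", (100, 32))

# ViTSTR resizes the crop to 224x224 because the pretrained
# ViT/DeiT backbone was trained on 224x224 ImageNet images.
vit_input = crop.resize((224, 224), Image.BICUBIC)
print(vit_input.size)  # (224, 224)
```

Training from scratch, as in the follow-up work mentioned above, removes this constraint, so the smaller native 32x100 resolution can be kept.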