Open cuongdxk57 opened 3 years ago
Thanks. Normalization was not part of the CLOVA AI training/eval protocol that we used, so we did not try it. We reproduced their results and followed the same protocol for ViTSTR for a fair comparison.
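For readers unsure what "normalization" means in this exchange, here is a minimal sketch of the usual idea: pixel values are scaled to [0, 1] and then shifted and divided by a mean and standard deviation. The function names and the `mean=0.5`, `std=0.5` defaults are illustrative assumptions, not values taken from the ViTSTR or CLOVA AI code.

```python
def scale_to_unit(pixels):
    """Map 8-bit pixel values to the range [0, 1] (scaling only)."""
    return [p / 255.0 for p in pixels]


def normalize(pixels, mean=0.5, std=0.5):
    """Scale to [0, 1], then subtract `mean` and divide by `std`.

    The 0.5 defaults are assumed for illustration; with them the
    output lies in [-1, 1].
    """
    return [(p / 255.0 - mean) / std for p in pixels]


row = [0, 128, 255]          # one hypothetical row of grayscale pixels
print(scale_to_unit(row))    # values in [0, 1]
print(normalize(row))        # values in [-1, 1]
```

Whether this step helps a given model is an empirical question; the sketch only shows what the operation does to the input values.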
Thanks for your reply. Is your model able to recognize long text? I trained on my dataset with an image size of (32, 448); however, after 300k iterations, the model accuracy is still quite low. Here are some images from my dataset.
You might want to train from scratch (instead of starting from a pre-trained ViT) if you have access to a large training dataset. In that case, you can train without resizing the input images to the unconventional size of 224x224, as is done in ViTSTR. The closer the image sizes in the target test dataset are to those in the training dataset, the better.
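To make the size mismatch concrete, the arithmetic can be sketched as follows, assuming (32, 448) means height 32 and width 448. The helper `resize_scale_factors` is a hypothetical name used only for this illustration.

```python
def resize_scale_factors(src_hw, dst_hw):
    """Return (height_scale, width_scale) for a resize from src to dst."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    return dst_h / src_h, dst_w / src_w


# Assumed source size from the question: height 32, width 448.
h_scale, w_scale = resize_scale_factors((32, 448), (224, 224))
print(h_scale, w_scale)      # 7.0 0.5

# How much narrower each character becomes relative to its height.
print(h_scale / w_scale)     # 14.0
```

Under that assumption, a 14:1 image is squeezed into a 1:1 square, so each character ends up 14 times narrower relative to its height, which is one plausible reason long text is hard to read after the resize.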
Thanks for your work. I found that you don't normalize the images before training. Does the transformer perform better this way? I look forward to your reply!