dudeperf3ct closed this issue 2 years ago.
Thanks for the feedback:
1) ViTSTR can only process cropped text images. It does not support text spotting (detection plus recognition); it does recognition only.
2) ViTSTR does not support multiline text. Multiline text has to be cropped into several images, one per line of text.
3) We have follow-up work, not yet published, that uses a much larger real dataset for training. We hope to publish it in the near future.
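For point 2, a minimal sketch of splitting a multiline crop into single-line crops before passing each one to the recognizer. This uses a simple horizontal projection-profile heuristic and is not part of ViTSTR. It assumes a grayscale image stored as nested lists of 0-255 ints, dark text on a light background, and at least one fully blank row between text lines. The names `split_lines` and `crop_lines` are hypothetical. Skewed scans or touching lines would need a proper text detector instead.

```python
from typing import List, Tuple

# Hypothetical representation: grayscale rows, 0 = black ink, 255 = white paper.
Image = List[List[int]]


def split_lines(image: Image, ink_threshold: int = 128) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row is 'inked' if any pixel is darker than ink_threshold; any fully
    blank row ends the current line.
    """
    ranges = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(p < ink_threshold for p in row)
        if has_ink and start is None:
            start = y  # first inked row of a new line
        elif not has_ink and start is not None:
            ranges.append((start, y))  # blank row closes the current line
            start = None
    if start is not None:
        ranges.append((start, len(image)))  # last line touches the bottom edge
    return ranges


def crop_lines(image: Image, ink_threshold: int = 128) -> List[Image]:
    """Cut each detected line out of the image and trim blank columns."""
    crops = []
    for top, bottom in split_lines(image, ink_threshold):
        band = image[top:bottom]
        # Columns containing at least one inked pixel within this line band.
        cols = [x for x in range(len(band[0]))
                if any(row[x] < ink_threshold for row in band)]
        left, right = cols[0], cols[-1] + 1
        crops.append([row[left:right] for row in band])
    return crops


if __name__ == "__main__":
    W = 255
    page = [
        [W, W, W, W, W],
        [W, 0, 0, W, W],
        [W, W, W, W, W],
        [W, W, 0, 0, 0],
    ]
    print(split_lines(page))  # [(1, 2), (3, 4)]
    print(crop_lines(page))   # [[[0, 0]], [[0, 0, 0]]]
```

Each returned crop would then be resized to the model's input size and recognized on its own, as a single-line image.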
Thank you for the awesome research!
I ran the code on the demo images and it worked perfectly. But when I run it on a few other sample images, the model's predictions seem incoherent.
It would be great if you could answer a few of my questions:
Section 4.7: there is scope for improving this research by using the OpenImage v5 dataset. Have you tried this?
Examples:
I used the vitstr_base_patch16_224_aug.pth model for prediction.