Closed aideeptech closed 3 years ago
@aideeptech Thanks for the interest!
Basically, we pre-trained the model with 384x384 input, but you can fine-tune it with any input size. The TrOCR model is pre-trained on document images that are mostly square. We have not tried non-square input images. We plan to support non-square images in the future.
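To make the effect of input size concrete, here is a rough sketch of how many patch tokens the encoder would see at 384x384 versus the 32x384 size asked about in this thread. It assumes 16x16 patches, as in the ViT-style encoders (DeiT/BEiT) that TrOCR builds on; `num_patches` is a hypothetical helper for illustration, not part of any TrOCR API.

```python
# Assumption: the encoder splits the image into 16x16 patches (ViT-style).
PATCH = 16


def num_patches(height: int, width: int, patch: int = PATCH) -> int:
    """Return the number of patch tokens for an image of the given size."""
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return (height // patch) * (width // patch)


square = num_patches(384, 384)  # 24 * 24 patches
strip = num_patches(32, 384)    # 2 * 24 patches

print(square, strip, square / strip)  # 576 48 12.0
```

Under that assumption, a 32x384 input gives the encoder 12 times fewer tokens. This only counts encoder tokens and does not predict end-to-end latency, since the autoregressive decoder also contributes. It also suggests why a different size likely needs fine-tuning rather than a drop-in change: position embeddings learned for a 24x24 patch grid do not directly match a 2x24 grid.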
As another option to speed things up, we also plan to pre-train TrOCR with a smaller model size, for example DeiT/BEiT small/tiny with BERT-tiny/MiniLM. This should lead to much lower inference latency and easier deployment.
@wolfshow Thanks for your quick reply and guidance. I will try fine-tuning the model with different input image sizes.
@aideeptech Have you figured out how to train with different input image sizes?
Is there any update on fast inference?
Hello, any update on this?
@aideeptech Have you fine-tuned the model using a different image size? How is the performance?
Hi all, is anyone aware of a lightweight TrOCR model?
Did you get any leads on this?
Thank you for your great work! TrOCR works fine on single text-line images, but it is slow even on a V100 GPU: it takes 600 ms for a single text line. It currently takes 384x384 input, which may be causing the speed issue.
1) Is there any option to change the input size to 32x384 (height: 32, width: 384) without training a new model?
2) Is non-square input supported if we want to train/fine-tune the model at 32x384?
3) Is there any better and easier option to improve speed?