[Enhancement] How to improve the speed of TrOCR inference?

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

https://aka.ms/GeneralAI

MIT License


Issue #462 (closed by aideeptech 3 years ago)

aideeptech commented 3 years ago

Thank you for your great work! TrOCR works fine on single text-line images, but inference is slow even on a V100 GPU: it takes 600 ms for a single text line. The model currently takes a 384x384 input, which may be causing the speed issue.
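For anyone trying to reproduce a per-line latency figure like the 600 ms above, here is a minimal timing sketch. It is not from the thread: `measure_latency_ms` is a hypothetical helper, and the workload in the example is a stand-in that should be replaced with a call that runs one text-line image through the model.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=3, runs=10):
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper for illustration; fn is any zero-argument callable.
    """
    # Warm-up calls are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with one model inference call.
    latency = measure_latency_ms(lambda: sum(range(100_000)))
    print(f"median latency: {latency:.2f} ms")
```

When timing a PyTorch model on a GPU, the callable should end with `torch.cuda.synchronize()` so that asynchronously launched kernels are included in the measurement.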

1) Is there any option to change the input size to 32x384 (height: 32, width: 384) without training a new model?
2) Is a non-square input size supported if we want to train or fine-tune the model on 32x384 inputs?
3) Is there any better and easier option to improve speed?

wolfshow commented 3 years ago

@aideeptech Thanks for the interest!

Basically, we pre-trained the model with 384x384 inputs, but you can fine-tune it with any input size. TrOCR was pre-trained on document images that are mostly square. We have not tried non-square input images, but we plan to support them in the future.
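As a rough illustration of why the input size discussed above matters for encoder cost, the sketch below counts image patches for the two sizes. It is not from the thread and rests on assumptions: a ViT-style encoder with 16x16 patches, and self-attention cost growing with the square of the token count. It ignores special tokens and the text decoder, so the real end-to-end speedup would likely be smaller.

```python
def num_patches(height, width, patch=16):
    """Number of non-overlapping patch tokens for a ViT-style encoder.

    patch=16 is an assumption, not something stated in the thread.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return (height // patch) * (width // patch)


square = num_patches(384, 384)  # 24 * 24 = 576 tokens
strip = num_patches(32, 384)    # 2 * 24 = 48 tokens

token_ratio = square / strip       # 12.0x fewer encoder tokens
attention_ratio = token_ratio ** 2  # 144.0x fewer pairwise attention scores

print(square, strip, token_ratio, attention_ratio)
```

Under these assumptions the 32x384 input yields 48 encoder tokens instead of 576, which is the intuition behind asking for a non-square input.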

As for other ways to speed things up, we also plan to pre-train TrOCR with smaller models, for example DeiT/BEiT small/tiny paired with BERT-tiny/MiniLM. This should give much lower inference latency and make the model easier to deploy.

aideeptech commented 3 years ago

@wolfshow Thanks for your quick reply and guidance. I will try fine-tuning the model with a different input image size.