microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[Enhancement] How to improve TrOCR inference speed? #462

Closed: aideeptech closed this issue 3 years ago

aideeptech commented 3 years ago

Thank you for your great work!! TrOCR works fine on single text-line images, but inference is slow even on a V100 GPU: it takes 600 ms per text line. Since the model currently takes a 384x384 input, that may be causing the speed issue.
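
A consistent way to reproduce a per-line latency figure like the 600 ms above is to time repeated calls after a few warm-up runs. This is a minimal stdlib sketch, not part of TrOCR: `measure_latency` is a hypothetical helper, and `fn` stands for whatever callable runs one full recognition (preprocessing plus generation). When timing on a GPU with PyTorch, make sure `fn` waits for the result, for example by calling `torch.cuda.synchronize()` at its end, because kernel launches are asynchronous.

```python
import statistics
import time
from typing import Any, Callable, Dict


def measure_latency(fn: Callable[[], Any], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time repeated calls to `fn` and return latency statistics in milliseconds.

    `fn` is a zero-argument callable that performs one complete inference.
    It must block until its result is ready, otherwise the timings are too low.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up calls absorb one-time costs (lazy initialisation, caches, autotuning).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

Reporting the median alongside the mean makes outliers (such as a slow call that slipped past warm-up) easier to spot.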

1) Is there any option to change the input size to 32x384 (height 32, width 384) without training a new model?
2) Are non-square input sizes supported if we want to train/fine-tune the model on 32x384 inputs?
3) Is there any better and easier option to improve speed?
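
To get a feel for why a 32x384 input should be cheaper, one can count encoder tokens. This sketch assumes a ViT-style encoder that cuts the image into non-overlapping 16x16 patches (so 384x384 gives a 24x24 grid of 576 patches) and ignores the extra CLS/distillation tokens; the helper names are hypothetical.

```python
from typing import Dict, Tuple


def num_patches(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch tokens for an image of the given size."""
    if height % patch or width % patch:
        raise ValueError(f"{height}x{width} is not divisible by patch size {patch}")
    return (height // patch) * (width // patch)


def encoder_cost_ratio(size_a: Tuple[int, int], size_b: Tuple[int, int], patch: int = 16) -> Dict[str, object]:
    """Rough ratios of encoder work between two (height, width) input sizes.

    Linear layers (QKV/output projections, MLP) scale with the token count N,
    while the attention score matrix scales with N**2.
    """
    n_a = num_patches(*size_a, patch=patch)
    n_b = num_patches(*size_b, patch=patch)
    return {
        "tokens": (n_a, n_b),
        "linear_ratio": n_a / n_b,
        "attention_ratio": (n_a / n_b) ** 2,
    }


print(encoder_cost_ratio((384, 384), (32, 384)))
# {'tokens': (576, 48), 'linear_ratio': 12.0, 'attention_ratio': 144.0}
```

These ratios cover the encoder only. The autoregressive decoder runs once per generated token regardless of image size, so the end-to-end speed-up would be smaller than the encoder ratios suggest.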

wolfshow commented 3 years ago

@aideeptech Thanks for the interest!

Basically, we pre-trained the model with 384x384 input, but you can fine-tune it with any input size. TrOCR was pre-trained on document images, which are mostly square inputs. We have not tried non-square input images yet, but we plan to support non-square images in the future.
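
Fine-tuning at a non-square size such as 32x384 runs into one concrete issue: the encoder's learned position embeddings form a grid sized for the pre-training input (24x24 under the 16x16-patch assumption), while a 32x384 image yields a 2x24 patch grid. A common workaround from ViT/DeiT practice, not something described in this thread, is to bilinearly resize the embedding grid before fine-tuning. The plain-Python sketch below (`resize_pos_embed` is a hypothetical name) shows the interpolation on a nested-list grid; with a real checkpoint you would reshape the position-embedding tensor to (rows, cols, dim), resize it, and flatten it back, leaving any CLS-token embedding untouched.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # [rows][cols][embedding_dim]


def _source_coord(i: int, in_size: int, out_size: int) -> float:
    # Map output index i to a fractional input coordinate using pixel-centre
    # alignment (the align_corners=False convention), clamped to the grid.
    src = (i + 0.5) * in_size / out_size - 0.5
    return min(max(src, 0.0), in_size - 1.0)


def resize_pos_embed(grid: Grid, out_rows: int, out_cols: int) -> Grid:
    """Bilinearly resize a 2-D grid of position-embedding vectors."""
    in_rows, in_cols = len(grid), len(grid[0])
    out: Grid = []
    for r in range(out_rows):
        y = _source_coord(r, in_rows, out_rows)
        y0 = math.floor(y)
        y1 = min(y0 + 1, in_rows - 1)
        wy = y - y0
        row = []
        for c in range(out_cols):
            x = _source_coord(c, in_cols, out_cols)
            x0 = math.floor(x)
            x1 = min(x0 + 1, in_cols - 1)
            wx = x - x0
            # Weighted mix of the four neighbouring embedding vectors.
            vec = [
                (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c_ + wx * d)
                for a, b, c_, d in zip(grid[y0][x0], grid[y0][x1], grid[y1][x0], grid[y1][x1])
            ]
            row.append(vec)
        out.append(row)
    return out


# Toy "pre-trained" 24x24 grid whose 2-dim embedding is just (row, col).
pretrained = [[[float(r), float(c)] for c in range(24)] for r in range(24)]
resized = resize_pos_embed(pretrained, 2, 24)  # grid for a 32x384 input
```

Bilinear interpolation keeps neighbouring patches with similar embeddings, which gives fine-tuning a better starting point than randomly re-initialising the embeddings; whether it is enough for a 12x downsampling in height is something only an experiment can tell.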

As for other ways to speed things up, we also plan to pre-train TrOCR with smaller models, for example DeiT/BEiT small/tiny encoders with BERT-tiny/MiniLM decoders. This will lead to much lower inference latency and make the model easier to deploy.
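
A back-of-the-envelope estimate shows how much a narrower encoder could save. This sketch assumes the standard DeiT widths (768/384/192 for base/small/tiny), 12 layers, an MLP ratio of 4, and 576 tokens for a 384x384 input; it counts only the main weight matrices and attention products, ignores the decoder (BERT-tiny/MiniLM sizes vary), and the function names are hypothetical.

```python
from typing import Dict

# Width/depth of the standard DeiT encoder variants (patch size 16).
ENCODERS: Dict[str, Dict[str, int]] = {
    "base": {"dim": 768, "layers": 12},
    "small": {"dim": 384, "layers": 12},
    "tiny": {"dim": 192, "layers": 12},
}


def encoder_layer_params(dim: int, layers: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of a Transformer encoder stack.

    Per layer: 4 * dim**2 for the Q, K, V and output projections plus
    2 * mlp_ratio * dim**2 for the two MLP matrices. Biases, LayerNorms,
    patch and position embeddings are ignored.
    """
    return layers * (4 + 2 * mlp_ratio) * dim * dim


def encoder_macs(dim: int, layers: int, tokens: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one forward pass over `tokens` tokens."""
    linear = tokens * (4 + 2 * mlp_ratio) * dim * dim  # projections + MLP
    attention = 2 * tokens * tokens * dim  # Q @ K^T scores and scores @ V
    return layers * (linear + attention)


for name, cfg in ENCODERS.items():
    params = encoder_layer_params(**cfg)
    macs = encoder_macs(**cfg, tokens=576)
    print(f"{name:5s} ~{params / 1e6:.1f}M weights, ~{macs / 1e9:.1f} GMACs at 384x384")
```

Halving the width cuts the linear-layer cost by about 4x, which is why small/tiny encoders are the natural first step; measured latency also depends on the decoder and on how well the hardware is utilised at small widths.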

aideeptech commented 3 years ago

@wolfshow Thanks for your quick reply and guidance. I will try fine-tuning the model with different input image sizes.

Akshaysharma29 commented 3 years ago

@aideeptech have you figured out how to train with different input image sizes?

abdullahakmal commented 2 years ago

Is there any update on fast inference?

Anas-Alshaghouri commented 2 years ago

Hello, any update on this?

rohitpharande commented 1 year ago

@aideeptech Have you fine-tuned the model using a different image size? How is the performance?

Ajithbalakrishnan commented 1 year ago

Hi all, is anyone aware of a lightweight TrOCR model?

abhinav-TB commented 1 year ago

Did you get any leads on this?