[TrOCR] Image aspect ratio

riteshKumarUMass commented 2 years ago

Hi, I have the following three questions and would be really grateful if anyone could provide some insights:

1. While pretraining the model on the text lines extracted from PDFs and on synthetic data, do you maintain the aspect ratio of the image when resizing it to 384x384? Using HuggingFace's TrOCR preprocessor, I noticed that it does not maintain the aspect ratio, so I would like to understand whether this affects the model's performance.
2. Did a "textline" contain multiple words in a single image, or did you split the image further at the word level before feeding it to the model?
3. Did you try training the model at the word level instead of the line level, and did you notice any difference?

riteshKumarUMass commented 2 years ago

Could someone respond to this?

henryle97 commented 2 years ago

1. They use the 384x384 setting for both printed (word-level) and handwritten (line-level) text. I think they use a square image to fit the DeiT model.
2. A text line contains multiple words in a single image.
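
For anyone who wants to compare against the default squashing, here is a minimal sketch of an aspect-ratio-preserving resize: scale the line image so its longer side fits 384, then pad the rest of the square. This is not the method used by the TrOCR authors. The function names, the white padding colour, and the centred placement are all my own assumptions, and nothing in this thread shows whether it helps or hurts a model that was pretrained on squashed images.

```python
def letterbox_geometry(width, height, target=384):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Hypothetical helper: scales the image so it fits inside a
    target x target square, then centres it.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The smaller of the two ratios guarantees both sides fit in the square.
    scale = min(target / width, target / height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def letterbox(image, target=384, fill=(255, 255, 255)):
    """Resize a PIL image without distortion and pad it to a square.

    Assumes Pillow is installed; the white fill is an assumption,
    chosen only because scanned text usually has a light background.
    """
    from PIL import Image  # imported here so the geometry helper has no dependency

    new_w, new_h, pad_left, pad_top = letterbox_geometry(*image.size, target=target)
    resized = image.convert("RGB").resize((new_w, new_h))
    canvas = Image.new("RGB", (target, target), fill)
    canvas.paste(resized, (pad_left, pad_top))
    return canvas


# A 1536x128 text line keeps its 12:1 shape and gets padded above and below.
print(letterbox_geometry(1536, 128))  # (384, 32, 0, 176)
```

The padded image is already 384x384, so the preprocessor's own resize step should no longer change its shape. Note that a long text line ends up only 32 pixels tall this way, which is the trade-off against squashing it to fill the whole square.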

microsoft / unilm

[TrOCR] Image aspect ratio #867