lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.
https://lukas-blecher.github.io/LaTeX-OCR/
MIT License
12.61k stars 1.03k forks

What are the benefits of training image resizer models? #310

Open berooo opened 1 year ago

berooo commented 1 year ago

Thanks for sharing your code. After some experiments, I found that not using the image resizer model leads to poor performance. I would like to know the reason. Could you please explain it?

lukas-blecher commented 1 year ago

The model was trained on images of equations with a very specific range of resolutions. When snipping a screen area, it is very likely that the resulting image will have a different, usually larger, resolution than the training images. This mismatch leads to poor performance.

In response to that, I trained a small image classification model that predicts, for a given image, the resolution that best matches the training domain.

This helped a lot with real-world images.
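
For readers who want a concrete picture, here is a minimal sketch of how such a resizing step could work. It is not taken from the repository. The names `stub_predict_bucket` and `fit_to_training_resolution`, the 32 px width granularity, and the 64 px "training height" are all assumptions made for illustration, and the stub only stands in for the trained classifier. The sketch tracks image sizes only; a real pipeline would resample the pixels at each step.

```python
from typing import Callable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels

BUCKET = 32  # assumed: each classifier class corresponds to a multiple of 32 px


def stub_predict_bucket(size: Size) -> int:
    """Hypothetical stand-in for the trained resolution classifier.

    Pretends the training images were about 64 px tall and returns the
    width class that would bring this image closest to that height.
    """
    width, height = size
    ideal_width = width * 64 / height
    return max(0, round(ideal_width / BUCKET) - 1)


def fit_to_training_resolution(
    size: Size,
    predict: Callable[[Size], int] = stub_predict_bucket,
    max_steps: int = 10,
) -> Size:
    """Rescale until the predicted width class agrees with the current width."""
    width, height = size
    for _ in range(max_steps):
        # Map the predicted class index back to a width in pixels.
        target_width = (predict((width, height)) + 1) * BUCKET
        if target_width == width:
            break  # the classifier considers this resolution in-domain
        # Rescale both dimensions so the aspect ratio is preserved.
        scale = target_width / width
        width, height = target_width, max(1, round(height * scale))
    return width, height


if __name__ == "__main__":
    # A large screen snip gets shrunk toward the assumed training resolution.
    print(fit_to_training_resolution((1600, 400)))
```

With the stub above, a 1600x400 snip is scaled down to 256x64, and an image that is already 256x64 is left unchanged. Repeating the prediction after each resize is one possible design choice, since the classifier may answer differently once the image is closer to the training domain.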