microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

What kind of detector would you recommend for trocr? #1074

Open wendlerc opened 1 year ago

wendlerc commented 1 year ago

The TrOCR models do recognition. Thus, in order to apply them to arbitrary images, one needs a bounding-box detector.

Which one would you recommend?

I am currently using the one from paddleocr.

Cheers, Chris

Mohammed20201991 commented 1 year ago

+1

bit-scientist commented 1 year ago

Hi, @wendlerc and @Mohammed20201991. Any recommendations for a text detector (for handwritten text in images) to combine with TrOCR? Your insights would help immensely.

wendlerc commented 1 year ago

In this repo, https://github.com/LAION-AI/OCR-ensemble, we mainly used the detector from paddleocr. We also started looking into https://github.com/open-mmlab/mmocr, which seems to have a 'complementary' text detector: complementary in the sense that its strengths and weaknesses are quite different from those of the paddleocr detector.
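For anyone wiring a detector to TrOCR, here is a minimal sketch of the detect-then-recognize glue. It assumes the detector returns four-point polygons in pixel coordinates (check what your detector version actually outputs). The names `quad_to_box`, `reading_order`, `detect_then_recognize`, and the `detect`, `crop`, `recognize` callables are hypothetical placeholders, not part of paddleocr, mmocr, or TrOCR.

```python
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def quad_to_box(quad: Sequence[Point], width: int, height: int, pad: int = 2) -> Box:
    """Axis-aligned rectangle around a polygon, padded and clipped to the image."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    left = max(0, int(min(xs)) - pad)
    top = max(0, int(min(ys)) - pad)
    right = min(width, int(max(xs)) + pad)
    bottom = min(height, int(max(ys)) + pad)
    return left, top, right, bottom


def reading_order(boxes: List[Box], line_tol: float = 0.5) -> List[Box]:
    """Group boxes into rough text lines, then sort left-to-right within each line."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        h = box[3] - box[1]
        if lines:
            ref = lines[-1][0]  # first box of the current line is the reference
            ref_cy = (ref[1] + ref[3]) / 2
            ref_h = ref[3] - ref[1]
            if abs(cy - ref_cy) <= line_tol * max(h, ref_h):
                lines[-1].append(box)
                continue
        lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


def detect_then_recognize(
    image: Any,
    size: Tuple[int, int],
    detect: Callable[[Any], List[Sequence[Point]]],
    crop: Callable[[Any, Box], Any],
    recognize: Callable[[Any], str],
) -> List[Tuple[Box, str]]:
    """Run the detector, crop each region, and pass each crop to the recognizer."""
    width, height = size
    boxes = [quad_to_box(q, width, height) for q in detect(image)]
    return [(box, recognize(crop(image, box))) for box in reading_order(boxes)]
```

With Pillow images, `crop` could be `lambda img, box: img.crop(box)`, and `recognize` could wrap a TrOCR processor and model. The padding and line tolerance are arbitrary defaults that will likely need tuning for handwriting.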

Mohammed20201991 commented 1 year ago

Hi @bit-scientist, in addition to what @wendlerc mentioned, integrating with other approaches such as PyLaia and Transkribus might help.

bit-scientist commented 1 year ago

> In this repo, https://github.com/LAION-AI/OCR-ensemble, we mainly used the detector from paddleocr. We also started looking into https://github.com/open-mmlab/mmocr, which seems to have a 'complementary' text detector: complementary in the sense that its strengths and weaknesses are quite different from those of the paddleocr detector.

Thank you, @wendlerc. Could you share your handwritten samples for comparison? It turns out some algorithms work well only on clean backgrounds. My images, however, have somewhat different backgrounds.

bit-scientist commented 1 year ago

> Hi @bit-scientist, in addition to what @wendlerc mentioned, integrating with other approaches such as PyLaia and Transkribus might help.

Thank you, @Mohammed20201991. I think PyLaia can be of help, but is Transkribus available for free? It looks like it isn't.