manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

Apply adaptive Otsu thresholding instead of normal Otsu. #547

Closed (eighttails closed this 3 years ago)

eighttails commented 3 years ago

A new adaptive Otsu thresholding method is available in Tesseract 5.0.0 beta. (See http://www.leptonica.org/binarization.html)

It produces better recognition results than normal Otsu.
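
To illustrate why a locally computed threshold can help, here is a minimal pure-Python sketch that compares one global Otsu threshold with a per-tile Otsu threshold. It is not the Leptonica or Tesseract implementation: the function names (`otsu_threshold`, `binarize_global`, `binarize_tiled`), the tile size, and the toy image are all assumptions made for this example, and Leptonica's actual routine does more (for instance, it can smooth thresholds between tiles), which this sketch omits.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray values in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_global(img):
    """Binarize with a single Otsu threshold for the whole image."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p > t else 0 for p in row] for row in img]


def binarize_tiled(img, tile):
    """Binarize with a separate Otsu threshold per tile (hypothetical helper)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            t = otsu_threshold([img[y][x] for y in ys for x in xs])
            for y in ys:
                for x in xs:
                    out[y][x] = 1 if img[y][x] > t else 0
    return out


# Toy image with uneven lighting: the left half is bright (background 200,
# "text" 120), the right half is dark (background 100, "text" 20).
row = [200, 120, 200, 200, 100, 20, 100, 100]
img = [row[:] for _ in range(4)]

global_result = binarize_global(img)
tiled_result = binarize_tiled(img, tile=4)
```

On this toy image the single global threshold turns the whole dark half black, so the "text" pixel there can no longer be told apart from its background, while the per-tile thresholds keep text and background separate in both halves.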

manisandro commented 3 years ago

Thanks - in your view, does it make sense to have this configurable, or is this a good all-round option?

AvtechScientific commented 3 years ago

> Thanks - in your view, does it make sense to have this configurable, or is this a good all-round option?

Whatever you decide - don't forget to replicate it in the GTK frontend... Thank you!

eighttails commented 3 years ago

> Thanks - in your view, does it make sense to have this configurable, or is this a good all-round option?

Adaptive Otsu works well in most cases, so I don't need it to be configurable. However, another binarization method, Sauvola, is also available in the latest Tesseract: https://github.com/tesseract-ocr/tesseract/blob/60fd2b4abaa9c5c5c42d32db57576bc95d28a78a/src/ccmain/tesseractclass.cpp#L80 If somebody needs Sauvola, making it configurable is a good idea.

manisandro commented 3 years ago

Ok thanks!