juanj / KantanManga

KantanManga is an application that helps you read raw manga
https://testflight.apple.com/join/2hvkRQRO
GNU General Public License v3.0
40 stars 4 forks

Image preprocessing before tesseract #37

Open juanj opened 3 years ago

juanj commented 3 years ago

Adding a preprocessing step can noticeably improve the OCR results.

Right now, the tesseract thresholding algorithm sometimes eats all the strokes of a kanji, leaving only its outline.

From: [image: In]

To: [image: Out]

It may be worth using a different thresholding algorithm and letting the user tweak it.
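To make the idea concrete, here is a minimal sketch of one possible alternative: a local (adaptive) mean threshold with two knobs the user could tweak. It is written in plain Python for illustration only and is not what the app or tesseract does. The names `adaptive_threshold`, `radius` and `offset` are hypothetical, and the image is assumed to be a list of grayscale rows with values 0..255.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255 (assumed layout)


def adaptive_threshold(gray: Image, radius: int = 7, offset: int = 10) -> Image:
    """Binarize with a local mean threshold.

    A pixel becomes ink (0) when it is darker than the mean of its
    (2 * radius + 1) square neighbourhood minus `offset`; otherwise it
    becomes background (255). Both parameters are meant to be user-tunable.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    # Integral image with an extra zero row and column, so any window sum
    # can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = [[255] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Same test as gray < mean - offset, multiplied by area to
            # stay in integer arithmetic.
            if gray[y][x] * area < total - offset * area:
                out[y][x] = 0
    return out
```

A larger `radius` averages over more of the page, while a larger `offset` makes the threshold stricter, so fewer pixels are classified as ink. Exposing both as sliders would let the user tune the result per page.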

Removing furigana, the speech bubble border, and anything in the background gives results without junk.
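As a rough illustration of the border part only, the sketch below erases every ink region that touches the edge of an already binarized crop. It assumes the crop cuts through the speech bubble outline, so the outline reaches the crop edge while the text does not. It does not handle furigana, which would need a separate heuristic. The function name `remove_border_components` and the 0 = ink, 255 = background convention are assumptions made for this example.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # binary rows: 0 = ink, 255 = background (assumed)


def remove_border_components(binary: Image) -> Image:
    """Return a copy with all ink connected to the crop edge erased."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    out = [row[:] for row in binary]

    # Seed the flood fill with every ink pixel lying on the crop edge.
    queue = deque()
    for y in range(height):
        for x in range(width):
            on_edge = y in (0, height - 1) or x in (0, width - 1)
            if on_edge and out[y][x] == 0:
                out[y][x] = 255
                queue.append((y, x))

    # Breadth-first fill over 8-connected ink neighbours.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and out[ny][nx] == 0:
                    out[ny][nx] = 255
                    queue.append((ny, nx))
    return out
```

Text that touches the bubble outline would be erased along with it, so this only makes sense when there is a clear gap between the text and the border.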