Image preprocessing before tesseract

juanj / KantanManga

KantanManga is an application that helps you read raw manga

https://testflight.apple.com/join/2hvkRQRO

GNU General Public License v3.0


Open juanj opened 3 years ago

juanj commented 3 years ago

Adding a preprocessing step can really improve the results.

Right now, Tesseract's thresholding algorithm sometimes eats all the strokes of a kanji, leaving only the shape.

[Images: "From" (input crop) and "To" (output)]

It may be worth using a different thresholding algorithm and letting the user tweak it.
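As a rough illustration of what a tweakable alternative could look like, here is a minimal sketch of adaptive (local mean) thresholding in plain Python. It is not what KantanManga or Tesseract does; the function name `adaptive_threshold` and the `window` and `offset` parameters are assumptions made for this example, and the image is assumed to be a list of rows of grayscale values (0-255).

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes ink (0) when it is darker than the mean of its
    window x window neighbourhood minus `offset`; otherwise paper (255).
    `window` and `offset` are the knobs a user could tweak (hypothetical names).
    """
    height, width = len(gray), len(gray[0])

    # Integral image: integral[y][x] holds the sum of gray[0:y][0:x],
    # so any rectangular sum can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if gray[y][x] < mean - offset else 255)
        out.append(row)
    return out
```

A smaller `window` follows local contrast more closely, which may help preserve thin strokes; a larger `offset` makes the filter more reluctant to mark a pixel as ink. Good values would have to be found by experiment on real pages.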

Removing furigana, speech bubble borders, and anything in the background gives results without junk.
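One possible way to approach this kind of cleanup is to drop small connected blobs of ink from an already binarized image. The sketch below is only a heuristic under stated assumptions: the name `remove_small_components` and the `min_size` parameter are hypothetical, the image is assumed to be rows of 0 (ink) and 255 (paper), and it assumes the unwanted marks are smaller than the main text. Since a kanji can consist of several disconnected strokes, a real implementation would likely need to group strokes first.

```python
from collections import deque


def remove_small_components(binary, min_size=3):
    """Erase ink blobs whose bounding box is smaller than `min_size`
    in both width and height. Returns a new image; the input is untouched.
    """
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]

    for sy in range(height):
        for sx in range(width):
            if binary[sy][sx] != 0 or seen[sy][sx]:
                continue

            # Breadth-first search over 8-connected ink pixels.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx]
                                and binary[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))

            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            box_h = max(ys) - min(ys) + 1
            box_w = max(xs) - min(xs) + 1
            if box_h < min_size and box_w < min_size:
                for y, x in pixels:
                    out[y][x] = 255  # paint the blob as paper
    return out
```

The same traversal could be extended to discard very large components (for example, a speech bubble outline), but the right size limits depend on the page resolution and font size.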