kha-white / manga-ocr

Optical character recognition for Japanese text, with the main focus being Japanese manga
Apache License 2.0

Bad recognition? #80

Open Enerccio opened 3 months ago

Enerccio commented 3 months ago

Am I doing something wrong?

>>> mocr('/src/self/rensu/rensu-core/src/test/resources/test1.png')
'結菜君は旅行部屋、買い取り出されたのですが、'

for image: test1

or

>>> mocr('/src/self/rensu/rensu-core/src/test/resources/test3.png')
'男子...落着者は、そういう施術などをお'

for image: test3

Both results seem pretty bad...

MoeMonsuta commented 2 months ago

If the image is low quality or small, running OCR chunk by chunk (roughly 5-10 characters at a time) works better. Your second image worked fine when I scanned it with Manga OCR, though; I did it all in one box selection. Make sure you are using the CUDA version (it says so when it first loads); in my experience it is "smarter".

過去に着用した衣装、描いた絵画、写真など彼女の思い出を辿るイベントとなります。

男「...落ち着けよ、そういう能力なんだろ」

It seemed 100% accurate when I tried it. If needed, you can try upscaling your images with something like Topaz Gigapixel AI. I prefer the selection-to-clipboard method myself, though.
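For anyone without an AI upscaler, here is a rough sketch of enlarging an image before OCR using plain Lanczos resampling in Pillow. This is a swapped-in technique, not what Topaz Gigapixel AI does, and it may well help less. The helper names `upscaled_size` and `upscale_for_ocr` and the default factor of 2 are made up for illustration.

```python
def upscaled_size(size, factor=2.0):
    """Return (width, height) scaled by `factor`, at least 1 pixel per side."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    width, height = size
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def upscale_for_ocr(path, factor=2.0):
    """Open an image and enlarge it with Lanczos resampling (needs Pillow).

    Hypothetical helper: plain interpolation, not an AI upscaler.
    """
    from PIL import Image  # lazy import so upscaled_size works without Pillow

    img = Image.open(path)
    return img.resize(upscaled_size(img.size, factor), Image.LANCZOS)
```

Since `mocr` accepts a PIL image as well as a path, the result could then be passed along as `mocr(upscale_for_ocr('test1.png'))`.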

Have you tried it on actual manga? It works extremely well.
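The chunk-by-chunk suggestion above could be approximated in code roughly like this. It is only a sketch under some assumptions: the text runs horizontally in a single line, and fixed-width slices are a crude stand-in for selecting boxes by hand (a slice boundary can cut a character in half). `chunk_boxes`, `ocr_in_chunks`, `demo`, and the `chunk_width` value are hypothetical names and settings, not part of manga-ocr.

```python
def chunk_boxes(width, height, chunk_width):
    """Return crop boxes (left, upper, right, lower) covering the image left to right."""
    if chunk_width <= 0:
        raise ValueError("chunk_width must be positive")
    boxes = []
    left = 0
    while left < width:
        right = min(left + chunk_width, width)  # last chunk may be narrower
        boxes.append((left, 0, right, height))
        left = right
    return boxes


def ocr_in_chunks(image, ocr, chunk_width):
    """Run `ocr` on each slice of `image` and return the per-chunk results.

    `image` needs a `.size` attribute and a `.crop(box)` method, as PIL images have;
    `ocr` is any callable taking a cropped image, e.g. a MangaOcr instance.
    """
    width, height = image.size
    return [ocr(image.crop(box)) for box in chunk_boxes(width, height, chunk_width)]


def demo(path, chunk_width=200):
    """Hypothetical end-to-end use; needs manga-ocr and Pillow installed."""
    from manga_ocr import MangaOcr
    from PIL import Image

    mocr = MangaOcr()
    return "".join(ocr_in_chunks(Image.open(path), mocr, chunk_width))
```

A sensible `chunk_width` depends on the font size in the image, so expect to tune it per image; picking the boxes by hand, as described above, avoids that problem entirely.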