zyddnys / manga-image-translator

Translate manga/image: one-click translation of text in all kinds of images
https://cotrans.touhou.ai/
GNU General Public License v3.0

[Bug]: OCR fails to recognise Japanese text in horizontal orientation #628

Closed itsdapolice closed 3 weeks ago

itsdapolice commented 1 month ago

Issue

It's pretty common nowadays to have Japanese text in horizontal orientation mixed with other text in the usual vertical orientation. The most common occurrence of this is when a phone with text messages is displayed.

Currently, the OCR (whichever it might be, be it ctd or manga_ocr) fails to correctly detect the horizontal text. Maybe "fails" is too strong a word: it detects the characters, but it doesn't detect them as a sentence. The detected horizontal text ends up being translated to gibberish, degrading the whole experience.

I'm not sure how easy it would be to fix this. Maybe add a two-pass OCR detection, with one pass detecting normal vertical text and a second pass detecting horizontal text?

Command Line Arguments

No response

Console logs

No response

zyddnys commented 1 month ago

please provide a failure example and your arguments

itsdapolice commented 1 month ago

Hi,

This is the command I'm using:

python -m manga_translator -l ENG --translator=offline --font-size-minimum=15 --font-path _fontpath_/KOMIKASL.ttf --mask-dilation-offset=5 -f jpg --detector=ctd --inpainter=lama_mpe -i "_inputFolder_" --manga2eng

_fontpath_ and _inputFolder_ stand in for real paths, of course. Below are the input and output images. You can see that the text in the lower-left panel is badly translated, with the OCR failing to recognise the text as being horizontal.

[Images: 00148 (input), 00148 (output)]

Hopefully this helps.

itsdapolice commented 3 weeks ago

From what I can see in the several cases I came across, the OCR seems to treat horizontal, multi-line phrases as if each line were a column of vertical text, including trying to interpret the kanji as rotated, which is why the translation is complete rubbish.

I say this because the same doesn't happen if there's just a single line. A single horizontal line is properly detected as horizontal, and its characters are recognised correctly.

I understand that there's a need to cater for vertically aligned text (which is the majority, of course), but maybe the OCR code can be improved so that, if the text rotation is more than 45 degrees (with zero degrees meaning perfectly vertical text), it is detected as horizontal.
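
To make the suggested rule concrete, here is a minimal sketch of it. The function name and the angle convention (degrees away from perfectly vertical, sign ignored) are assumptions for illustration only; this is not code from manga-image-translator.

```python
def classify_orientation(angle_from_vertical_deg: float) -> str:
    """Hypothetical helper: classify a text box from its rotation.

    0 degrees means perfectly vertical text. Anything rotated more than
    45 degrees away from vertical is treated as horizontal.
    """
    # A text line looks the same after a 180-degree turn, so fold the
    # angle into the range [0, 90] before comparing it to the threshold.
    folded = abs(angle_from_vertical_deg) % 180.0
    if folded > 90.0:
        folded = 180.0 - folded
    return "horizontal" if folded > 45.0 else "vertical"


if __name__ == "__main__":
    for angle in (0.0, 10.0, 74.26, 170.0):
        print(angle, classify_orientation(angle))
```

With this rule, a box reported as "vertical text rotated by 74.26 degrees" would be reclassified as horizontal text rather than being passed on as a heavily rotated vertical column.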

itsdapolice commented 3 weeks ago

Maybe this can help. I've run it in demo mode to obtain the intermediates. You can see that the bbox in question (5) is wrongly detected as being rotated 74.26 degrees (since it was detected as vertically aligned text that is heavily rotated), but another bbox with the same orientation (3) is correctly detected as being rotated only -9.93 degrees (i.e. properly detected as horizontal text with a small rotation). [Image: bboxes]

itsdapolice commented 3 weeks ago

@dmMaze I think this may be more in your area? I think the code for the Comic Text Detector is yours, isn't it?

Checking the code, the issue doesn't seem to be in the OCR step, but in the text box detection. The text box doesn't have the right orientation, so the detected text is passed to the OCR as vertical.
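
One way to picture how a multi-line horizontal block could be mistaken for vertical text: three short horizontal lines stacked on top of each other form a block that is taller than it is wide. The sketch below decides the orientation by a vote over the individual lines instead of the outer shape of the block. It is only an illustration under that assumption; `line_is_horizontal` and `block_orientation` are hypothetical names, not functions from manga-image-translator or Comic Text Detector.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Quad = Sequence[Point]  # four corner points of one detected text line


def line_is_horizontal(quad: Quad) -> bool:
    """A line counts as horizontal if its longest edge runs mostly along x."""
    best_len, best_dx, best_dy = 0.0, 0.0, 0.0
    for i in range(4):
        (x0, y0), (x1, y1) = quad[i], quad[(i + 1) % 4]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length > best_len:
            best_len, best_dx, best_dy = length, dx, dy
    return abs(best_dx) >= abs(best_dy)


def block_orientation(lines: Sequence[Quad]) -> str:
    """Majority vote over the lines; ties and empty blocks fall back to vertical."""
    votes = sum(1 for quad in lines if line_is_horizontal(quad))
    return "horizontal" if votes * 2 > len(lines) else "vertical"


if __name__ == "__main__":
    # Three stacked lines, each 40 wide and 20 tall: the block is 40 x 60,
    # taller than wide, yet every individual line is horizontal.
    lines = [[(0, y), (40, y), (40, y + 20), (0, y + 20)] for y in (0, 20, 40)]
    print(block_orientation(lines))
```

A vote like this is unreliable for very short lines (one or two characters), where the longest edge says little about the reading direction, so treat it as a rough heuristic.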

dmMaze commented 3 weeks ago

@itsdapolice It's some text-detection postprocessing or OCR preprocessing that I didn't write. BallonsTranslator handles it correctly: [image]

I'll take some time to investigate why it goes wrong in manga-image-translator.

itsdapolice commented 3 weeks ago

Thanks, mate. I did try to wrap my head around the code, but I'm way too unfamiliar with it to be of any help.

itsdapolice commented 3 weeks ago

I've tested the new commit and it works like a charm. Thanks, @dmMaze.

dmMaze commented 3 weeks ago

@itsdapolice Just a reminder: 8925d36 fails on some vertical cases. Please update to 391d799.

itsdapolice commented 3 weeks ago

Thanks, will do.