Failure to recognize differently-coloured text within larger chunk of text

ngoomie commented 5 months ago

Hi! I'm using manga-ocr to help me play through the PC release of the Ace Attorney trilogy. The games often highlight certain keywords in a different colour for emphasis, and when this happens, manga-ocr will usually fail to recognize the differently-coloured text. If the entire block of text is in a colour other than black or white, it is recognized fine, and the highlighted word is also recognized fine if you select just that word after manga-ocr failed to process it on the prior attempt.

An example (screenshot: 5MHET6DyIQ). On the first pass, this gets OCR'd like this:

裁判が進むと、こんなふうに《．．．》が提出されていくの。

On a second pass over just the red word, it gets OCR'd correctly as 証拠品.

This has happened rather consistently with any instances of text like this that I've found. I can probably provide more examples if needed.

I also understand that this is somewhat out of the scope of manga-ocr since, well, it's not manga! And it technically works fine enough to be usable anyways. So I understand if this issue doesn't get touched on at all, but I figured it might be worth reporting anyways just in case.

ngoomie commented 5 months ago

Oh actually, another thing worth mentioning is that sometimes it just stops at the differently-coloured word and doesn't finish the rest of the chunk of text, which was white. An example here too (screenshot: X2IcLwSr0M). My OCR result was the following:

殺人現場から逃げていく被告人・矢張くんを

kha-white commented 5 months ago

Thank you, I'm not sure if I can do anything about it soon, but it's an interesting insight.

HighLiuk commented 2 months ago

@ngoomie MangaOcr preprocesses the image by first converting it to grayscale. This is what your image looks like when converted to grayscale.

As you can see, the red word is barely readable. In your case of white text with some red on a black background, it's probably better to preprocess the image so that red turns into white, and then run the OCR on the result.

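import numpy as np
from PIL import Image
from manga_ocr import MangaOcr

# Load the OCR model once and reuse it
mocr = MangaOcr()

img = Image.open('aceattorney.jpg')
# Drop any alpha channel so the array has exactly three channels
img = img.convert('RGB')
# Convert the image to a NumPy array of shape (height, width, 3)
img = np.array(img)
# Take the maximum across the RGB channels for each pixel,
# so that saturated colours such as red become as bright as white
img = np.max(img, axis=2)
# Create a new grayscale image from the maximum values
img = Image.fromarray(img)

print(mocr(img))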

And this is the recognized text: 裁判が進むと、こんなふうに《証拠品》が提出されていくの。
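To illustrate why this helps, here is a small sketch comparing a standard luma-weighted grayscale conversion with the per-pixel channel maximum. It assumes the ITU-R 601 weights that Pillow documents for `convert('L')`, and it uses pure primary colours as stand-ins; the actual highlight colour in the game is not pure red, so the real numbers will differ somewhat.

```python
def luma(r, g, b):
    # Weighted grayscale using ITU-R 601 luma weights (integer approximation)
    return (r * 299 + g * 587 + b * 114) // 1000


def max_channel(r, g, b):
    # The alternative used above: keep the brightest of the three channels
    return max(r, g, b)


# Illustrative colours only; not sampled from the actual screenshot
colours = {
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}

for name, rgb in colours.items():
    print(f"{name:>5}: luma={luma(*rgb):3d}  max={max_channel(*rgb):3d}")
```

With luma weighting, pure red and especially pure blue end up much darker than white, so on a black background they lose most of their contrast. With the channel maximum, all of them come out at full brightness.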

This way of preprocessing the image also works if the highlighted text is green, blue, or any other colour. It does not carry over directly to dark text on a white background, though: there the maximum would push coloured text towards white, so you should probably take the minimum across the channels instead.
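A minimal sketch of handling both cases with one helper follows. The function names `collapse_channels` and `looks_dark`, and the brightness threshold of 128, are made up for illustration and are not part of manga-ocr; the background check is a crude heuristic that assumes most pixels in the crop belong to the background.

```python
import numpy as np


def collapse_channels(rgb, dark_background=True):
    """Collapse an (H, W, 3) uint8 array to an (H, W) grayscale array.

    dark_background=True  -> per-pixel maximum (coloured text becomes bright)
    dark_background=False -> per-pixel minimum (coloured text becomes dark)
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    reduce = np.max if dark_background else np.min
    return reduce(rgb, axis=2)


def looks_dark(rgb, threshold=128):
    # Heuristic (assumption): the median pixel brightness approximates
    # the background, since text usually covers a minority of the crop
    return float(np.median(rgb.mean(axis=2))) < threshold
```

The result can be passed to `Image.fromarray(...)` as in the snippet above, for example `Image.fromarray(collapse_channels(arr, looks_dark(arr)))`, where `arr` is the RGB array of the cropped text box.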

Hope it helps.

kha-white / manga-ocr

Failure to recognize differently-coloured text within larger chunk of text #69