manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

Feature request: "Comparascope" view to show the detected text superimposed on original image #449

Closed raindropsfromsky closed 2 years ago

raindropsfromsky commented 4 years ago

In several industries, an instrument called a "comparascope" is used to visually compare two items on a screen. Both items have intricate details, and they may differ in a few places.

The comparascope superimposes the images and then shows the two items alternately, so the viewer can immediately spot any difference.

Can we use the same idea to compare the original image with the detected text?

  1. Just superimpose the detected text on the original image.
  2. Switch between the two periodically (let the user select this time interval).
  3. Once the user identifies the error, let him stop the auto-switching, and edit the word in the detected text. Once the correction is made, let him restart the auto-switching.

In step 2, manual switching can also be offered. For example, if the user presses the < or > key, gImageReader shows him the image layer or the detected text layer, respectively. If he presses both keys together, gImageReader shows him both layers simultaneously (text layer with 50% transparency, superimposed on the image layer).
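The manual switching described above can be sketched as a small key-to-view mapping. This is only an illustration in Python, not gImageReader code: the names `View`, `view_for_keys` and `layer_opacities` are hypothetical, and the assignment of `<` to the image and `>` to the text is an assumption.

```python
from enum import Enum


class View(Enum):
    IMAGE = "image"      # original scan only
    TEXT = "text"        # detected text only
    OVERLAY = "overlay"  # text at 50% transparency over the image


def view_for_keys(pressed):
    """Map the set of currently pressed keys to a view (None = keep current)."""
    left, right = "<" in pressed, ">" in pressed
    if left and right:
        return View.OVERLAY
    if left:
        return View.IMAGE   # assumption: "<" selects the image layer
    if right:
        return View.TEXT    # assumption: ">" selects the text layer
    return None


def layer_opacities(view):
    """Return (image_opacity, text_opacity) for the given view."""
    return {
        View.IMAGE: (1.0, 0.0),
        View.TEXT: (0.0, 1.0),
        View.OVERLAY: (1.0, 0.5),
    }[view]
```

An auto-switching mode would simply alternate between `View.IMAGE` and `View.TEXT` on a timer with the user-selected interval.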

manisandro commented 4 years ago

There is already the preview feature (last button in the hOCR output pane toolbar), which kind of does this. In that case, what could be done is adding a keyboard shortcut to toggle it.

raindropsfromsky commented 4 years ago

I missed that button because it is attached to the wrong panel (the tree pane). It should have been the last button in the main toolbar, above the main panel, because the output preview appears only in the main panel.

Anyhow, I tried it, and it can be used for very fast error-checking.

The only thing is, the detected output does not exactly superimpose on the original.

Compare the original: [image]

with the superimposed preview: [image]

The font is off by at least one point. Also notice that the words telescope into each other at the bottom, probably because each word extends into the subsequent word (again, because its font is too large). Also, can the font kerning be adjusted?

Once the text is made to match the image, the combination can easily be used like a comparascope. A simple manual arrangement would be fine (just three shortcuts: <, > and <+>).

manisandro commented 4 years ago

gImageReader takes whatever font information is returned by tesseract in the hOCR document, but it is pretty much expected that you'll have to manually tweak the font to get a good result. A possible enhancement would be some logic to find a matching font: the user picks a font family and whether to match the font, say, line-wise or paragraph-wise, and then the best font size is computed from the font metrics.
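As a rough illustration of computing a font size from font metrics, here is a minimal Python sketch. It is not gImageReader's code. It assumes that text dimensions scale linearly with font size, and it replaces real font metrics with a hypothetical stand-in, `unit_text_size`, that uses a fixed advance of 0.5 em per character.

```python
def unit_text_size(text, advance_em=0.5, height_em=1.0):
    """Hypothetical metric: (width, height) of `text` at font size 1.

    A real implementation would query the chosen font family instead.
    """
    return len(text) * advance_em, height_em


def best_font_size(text, box_w, box_h, measure=unit_text_size):
    """Largest font size at which `text` fits inside a box_w x box_h box."""
    unit_w, unit_h = measure(text)
    if unit_w <= 0 or unit_h <= 0:
        raise ValueError("text has no measurable extent")
    # Width and height each impose an upper bound; the smaller one wins.
    return min(box_w / unit_w, box_h / unit_h)
```

With these assumed metrics, a four-letter word in a 40 x 24 box is limited by its width rather than its height.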

raindropsfromsky commented 4 years ago

That's a nice workflow!

Also, please add a kerning slider, so that despite a larger font size, every word can be accommodated within its given length without running into (=overlapping with) the next word.

[Edit] Wait: I have another idea! I observed just now that gImageReader creates the bounding boxes for each individual word perfectly. In fact, it boxes the risers and descenders of the letters perfectly.

Can this property be used to fit the chosen font in the box? The only variables are the font size and the kerning.
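One way to read this idea: let the box height fix the font size, and let the box width fix the spacing between letters. The Python sketch below illustrates that under stated assumptions. The fixed 0.5 em advance per character is a made-up stand-in for real font metrics, and what the thread calls "kerning" is treated here as uniform letter spacing, which is a simplification.

```python
def letter_spacing_to_fit(text, font_size, box_w, advance_em=0.5):
    """Extra space per gap between glyphs so `text` spans exactly box_w.

    A negative result means the glyphs must be squeezed together.
    `advance_em` is an assumed average advance, not a real font metric.
    """
    gaps = len(text) - 1
    if gaps < 1:
        return 0.0  # a single glyph has no gaps to adjust
    natural_w = len(text) * advance_em * font_size
    return (box_w - natural_w) / gaps
```

For example, a four-letter word at size 24 has an assumed natural width of 48, so fitting it into a box of width 40 needs a negative spacing in each of its three gaps.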

manisandro commented 4 years ago

Well yeah, that would be how the above is done. What I meant by line-wise and paragraph-wise is to, say, average the result over entire lines or paragraphs, to avoid each word having a slightly different font size.
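The averaging step could look roughly like the following Python sketch. The word records and their `line`, `para` and `size` keys are hypothetical; a plain mean is used, although a median would be less sensitive to a single badly fitted word.

```python
from statistics import mean


def smooth_font_sizes(words, key="line"):
    """Replace each word's size with the mean size of its group.

    `key` selects the grouping: "line" for line-wise, "para" for
    paragraph-wise. The input list is not modified.
    """
    groups = {}
    for word in words:
        groups.setdefault(word[key], []).append(word["size"])
    common = {group: mean(sizes) for group, sizes in groups.items()}
    return [{**word, "size": common[word[key]]} for word in words]
```

Passing `key="para"` instead of the default averages over whole paragraphs.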

raindropsfromsky commented 4 years ago

Caution: I came across many texts that have headings (or a line with larger/bold text). Can gImageReader detect such a line and treat it separately?

Then it makes sense to calculate a common font for the paragraph (rather than for each line separately).
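A simple heuristic for separating heading lines, sketched in Python: compare each line's fitted font size with the median over the paragraph and flag lines that are clearly larger. The function name and the 1.3 threshold are arbitrary assumptions, and this only catches larger text, not bold text of the same size.

```python
from statistics import median


def split_headings(line_sizes, ratio=1.3):
    """Return (heading_indices, body_indices) for per-line font sizes.

    A line counts as a heading when its size exceeds `ratio` times the
    median size of all lines. The threshold is an arbitrary assumption.
    """
    if not line_sizes:
        return [], []
    limit = ratio * median(line_sizes)
    headings = [i for i, size in enumerate(line_sizes) if size > limit]
    body = [i for i, size in enumerate(line_sizes) if size <= limit]
    return headings, body
```

The common paragraph font would then be computed over the body lines only.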

BTW the properties box shows x_fsize and x_wconf as parameters, and also bbox (=bounding box coordinates??).

But the properties do not include the kerning amount. Is it really available to gImageReader? Even if it does not come from Tesseract, can gImageReader manipulate it?
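For reference, hOCR stores these properties in the `title` attribute of each element as semicolon-separated entries, with `bbox` carrying the four box coordinates. The Python sketch below parses such a string. The function name is hypothetical and the sample value in the usage note is made up for illustration, not taken from a real document.

```python
def parse_hocr_title(title):
    """Parse an hOCR title attribute into a dict of property -> value(s)."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # skip empty entries, e.g. after a trailing ";"
        name, values = tokens[0], tokens[1:]
        if name == "bbox" and len(values) == 4:
            # bbox is x0 y0 x1 y1 in pixel coordinates
            props[name] = tuple(int(v) for v in values)
        elif name in ("x_wconf", "x_fsize") and len(values) == 1:
            props[name] = float(values[0])
        else:
            props[name] = values  # keep anything else as raw tokens
    return props
```

Calling `parse_hocr_title("bbox 36 92 96 116; x_wconf 95; x_fsize 12")` yields the box as a tuple of four integers plus the confidence and font size as floats. The box width and height are what a font-fitting step would work from.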

Secondly, the properties box has a dropdown list of fonts, but none of them is selected by default. Why is that so?

Finally, if you would like to debug this text-fitting issue, I am ready to help by downloading the product several times a day and providing feedback.

manisandro commented 4 years ago

The properties are those returned by tesseract. The Tesseract 4.x LSTM engine does not report font families.

raindropsfromsky commented 4 years ago

TogglePreview.zip