Closed raindropsfromsky closed 2 years ago
There is already the preview feature (last button in the hOCR output pane toolbar) which kinda does this? In case what could be done is adding a keyboard shortcut to toggle it.
I missed that button because it is attached to the wrong panel (the tree pane). It should have been the last button in the main toolbar; above the main panel. (because the output preview appears in the main panel only).
Anyhow, I tried it, and it can be used for a very fast error-checking.
The only thing is, the detected output does not exactly superimpose on the original.
Compare the original-
-with the superimposed preview:
The font is off by at least one point. Also notice that the words run into each other at the bottom, probably because each word extends into the subsequent word (again, because its font is too large). Also, can the font kerning be adjusted?
Once the text is made to match the image, the combo can easily be used like a comparascope. A simple manual arrangement would be fine (just three shortcuts: <, > and <+>).
gImageReader takes whatever font information is returned by tesseract in the hOCR document, but it is pretty much expected that you'll have to manually tweak the font to get a good result. A possible enhancement would be some logic to find a matching font: the user picks a font family and chooses whether to match the font line-wise or paragraph-wise, and then the best font size is computed from the font metrics.
That's a nice workflow!
Also, please add a kerning slider, so that despite a larger font size, every word fits within its given length without running into (i.e. overlapping) the next word.
[Edit] Wait: I have another idea! I observed just now that gImageReader creates the bounding box for each individual word perfectly. In fact, the boxes enclose the ascenders and descenders of the letters perfectly.
Can this property be used to fit the chosen font in the box? The only variables are the font size and the kerning.
Well yeah, that would be how the above is done. What I meant by line-wise and paragraph-wise is to, say, average the result over entire lines or paragraphs, to avoid each word having a slightly different font size.
Caution: I came across many texts that have headings (or a line with larger/bold text). Can gImageReader detect such a line and treat it separately?
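To make the line-wise idea concrete, here is a minimal Python sketch. It is not gImageReader's code. It assumes word boxes in pixels, and it replaces real font metrics with a hypothetical `approx_width_per_unit` helper that pretends every glyph advances 0.5 em. A real implementation would query the chosen font instead. The median is used rather than the mean so that one badly boxed word does not skew the line.

```python
from statistics import median
from typing import Callable, Iterable, Tuple

# (text, x0, y0, x1, y1): a word and its bounding box in pixels.
Word = Tuple[str, int, int, int, int]


def approx_width_per_unit(text: str) -> float:
    """Hypothetical stand-in for font metrics: width of `text` at font size 1.

    Assumes every glyph advances 0.5 em, which is only a rough guess.
    """
    return 0.5 * len(text)


def fit_word_size(word: Word,
                  width_per_unit: Callable[[str], float] = approx_width_per_unit) -> float:
    """Font size (in pixels) at which the word is exactly as wide as its box."""
    text, x0, _y0, x1, _y1 = word
    return (x1 - x0) / width_per_unit(text)


def fit_line_size(words: Iterable[Word],
                  width_per_unit: Callable[[str], float] = approx_width_per_unit) -> float:
    """One common font size for a whole line: the median of the per-word fits."""
    return median(fit_word_size(w, width_per_unit) for w in words if w[0])


line = [("abcd", 0, 0, 40, 10), ("ab", 50, 0, 61, 10), ("abcdef", 70, 0, 130, 10)]
line_size = fit_line_size(line)
```

The same `fit_line_size` call works paragraph-wise if it is given all words of a paragraph. Converting the pixel size to points depends on the scan resolution and is left out here.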
Then it makes sense to calculate a common font for the paragraph (rather than for each line separately).
BTW the properties box shows x_fsize and x_wconf as parameters, and also bbox (=bounding box coordinates??).
But the properties do not include the kerning amount. Is it really available to gImageReader? Even if it does not come from Tesseract, can gImageReader manipulate it?
Secondly, the properties box has a dropdown list of fonts, but none of them is selected by default. Why is that so?
Finally, if you would like to debug this text-fitting issue, I am ready to support by downloading the product several times a day and providing feedback.
Properties are those as returned by tesseract. Tesseract 4.x LSTM does not report font families.
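For context, hOCR stores these properties in the `title` attribute of each element, as semicolon-separated entries such as `bbox 36 92 96 116; x_wconf 95; x_fsize 12`. The following Python sketch shows how such a string could be split into named values. The function name `parse_hocr_title` and the sample values are made up for illustration, and this is not how gImageReader itself reads the document.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def _convert(token: str) -> Value:
    """Turn a token into int or float when possible, else keep it as text."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_hocr_title(title: str) -> Dict[str, List[Value]]:
    """Split an hOCR title attribute into {property name: list of values}."""
    props: Dict[str, List[Value]] = {}
    for entry in title.split(";"):
        tokens = entry.split()
        if not tokens:
            continue  # skip empty entries, e.g. after a trailing semicolon
        props[tokens[0]] = [_convert(t) for t in tokens[1:]]
    return props


# Sample values are invented for this example.
props = parse_hocr_title("bbox 36 92 96 116; x_wconf 95; x_fsize 12")
x0, y0, x1, y1 = props["bbox"]
```

Note that nothing resembling a kerning value appears in the three properties named in this thread, so any kerning adjustment would have to be computed on the application side.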
In several industries, an instrument called "comparascope" is used that visually compares two items on a screen. Both items are supposed to have intricate details, and they may have a few differences.
The comparascope superimposes the images and then shows both items alternately, so the viewer can immediately spot any difference.
Can we use the same idea to compare the original image with the detected text?
In step-2, manual switching can also be given. For example, if the user presses the < or > key, gImageReader shows the image or the detected text layer, respectively. If the user presses both the < and > keys, gImageReader shows both layers simultaneously (text layer with 50% transparency, superimposed on the image layer).
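The switching logic described here could be sketched in Python roughly as follows. This is only an illustration of the proposal, not existing gImageReader behavior. The names `View`, `select_view` and `blend` are invented, and the mapping of `<` to the image and `>` to the text layer is an assumption.

```python
from enum import Enum
from typing import Optional, Tuple

RGB = Tuple[int, int, int]


class View(Enum):
    IMAGE = "image"  # original scan only
    TEXT = "text"    # detected text layer only
    BOTH = "both"    # text layer blended over the scan


def select_view(lt_pressed: bool, gt_pressed: bool) -> Optional[View]:
    """Map the state of the < and > keys to a view (assumed key mapping)."""
    if lt_pressed and gt_pressed:
        return View.BOTH
    if lt_pressed:
        return View.IMAGE
    if gt_pressed:
        return View.TEXT
    return None  # no key pressed: keep whatever is currently shown


def blend(image_px: RGB, text_px: RGB, alpha: float = 0.5) -> RGB:
    """Alpha-blend one text-layer pixel over one image pixel.

    alpha is the opacity of the text layer; 0.5 matches the 50% proposal.
    """
    return tuple(round(alpha * t + (1 - alpha) * i)
                 for i, t in zip(image_px, text_px))


mode = select_view(lt_pressed=True, gt_pressed=True)
mixed = blend((200, 100, 0), (0, 100, 200))
```

In a real viewer the blending would normally be done by the toolkit's compositing (for example by setting the opacity of the text layer) rather than pixel by pixel.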