manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

'Autodetect Layout' doesn't have an option to increase margins of selections #580

Closed chrsrns closed 2 years ago

chrsrns commented 2 years ago

Hi. The program keeps running into situations where it can't recognize text properly because the selections don't leave enough margin around the text. For testing, please refer to the image attached below.

Image used for Testing

[Image: image used for testing]

Initial Result using Autodetect Layout

[Image: result without adjustments]

Notice that without selection adjustments, the 'Question 1' text is recognized as only 'Question'. After increasing its selection size, the text is recognized properly.

Result with size adjustments

[Image: result with adjustments]

AgostinoSturaro commented 2 years ago

I am experiencing a similar issue. For instance, in an all-caps text, an "O" was turned into a "C" because the border at the margin of the page was too narrow. I've also seen "q" and "g" confused with each other.

manisandro commented 2 years ago

Should be fixed by https://github.com/manisandro/gImageReader/commit/c285fe5da11e189dd0c08e555ed137ab70c890a9.

AgostinoSturaro commented 2 years ago

@manisandro Can you provide a build with this fix? Also, looking at the code, could the +/- 2px adjustment go out of bounds if the box is very close to the borders? Thank you.
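
To illustrate the out-of-bounds concern, here is a minimal sketch of padding a selection and clipping it to the image. It is not the code from the commit. `Rect` and `padAndClamp` are hypothetical names made up for this example, and the image is assumed to span x in [0, imgW) and y in [0, imgH).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type for illustration; not gImageReader's own class.
struct Rect {
    int x;
    int y;
    int width;
    int height;
};

// Grow `box` by `pad` pixels on every side, then clip the result to the image
// area so a box that touches a border never extends past it.
Rect padAndClamp(const Rect& box, int pad, int imgW, int imgH) {
    assert(pad >= 0 && imgW > 0 && imgH > 0);
    const int left   = std::max(box.x - pad, 0);
    const int top    = std::max(box.y - pad, 0);
    const int right  = std::min(box.x + box.width + pad, imgW);
    const int bottom = std::min(box.y + box.height + pad, imgH);
    // Guard against boxes that lie entirely outside the image.
    return Rect{left, top, std::max(right - left, 0), std::max(bottom - top, 0)};
}
```

With a 2px pad on a 100x100 image, a box at the top-left corner only grows right and down, while a box in the middle of the image grows by 2px on all four sides.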