manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0
1.6k stars, 188 forks

worse results than previous version #588

Closed u6aab closed 1 year ago

u6aab commented 2 years ago

I'm getting a bunch of random letters and symbols in my results after upgrading from 3.3.1 to 3.4.0, both times using default options on Windows 10. It seems to have something to do with image quality (maybe sensitivity to grain/noise?), since running a screenshot of a PDF through doesn't produce the same issues.

[Screenshots: results with 3.4.0 and with 3.3.1]

Jossi2 commented 2 years ago

I am encountering exactly the same problem. I use gImageReader for OCR of old German (Fraktur) texts. I expected some improvement with version 3.4.0, but while I get almost 100% correct results with version 3.3.1, version 3.4.0 delivers lots of mistakes, with the same image to read, the same *.traineddata file and the same program options. Is there a possible reason for this, e.g. the change from tessdata 4 to tessdata 5?

Result with 3.3.1: [Screenshot: OCR 3.3.1]

Result with 3.4.0: [Screenshot: OCR 3.4.0]

ToxicSmurf commented 1 year ago

Same issue for me. About 90% of all output is random letters. File, file type, and DPI make no difference. It occurs every time.

hendrack commented 1 year ago

I can't tell from the screenshots: are you using the Windows version? I use version 3.4 on Linux and have no issues, but the portable Windows version produces the bad OCR results with the same source material.

Jossi2 commented 1 year ago

Issue seems to be solved with version 3.4.1. Almost faultless results again. Thank you for taking the trouble to further improve this excellent program!

manisandro commented 1 year ago

Thanks for your feedback!