manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

Crash (without warning) when reading certain sequences of characters #634

Closed rgreen5 closed 1 year ago

rgreen5 commented 1 year ago

gImageReader 3.3.1 / Linux Mint 20.1 / Cinnamon 4.8.6

This is reproducible every time with PDF pages that contain certain sequences of characters.

To reproduce

Note: page references are to the gImageReader pages, not the page numbers at the bottom of the PDF.

  1. Go to ANNA'S ARCHIVE and download a PDF of "The Secret Power of Music" (David Tame, 5.8 MB).
  2. Try to scan a section of the PDF containing either page 200 or page 214.

RESULT: The app works normally until it has finished scanning the page in question, then it closes/crashes without warning. What the pages have in common is a sequence of em/en (?) dashes connecting words.

SantosSi commented 1 year ago

Cannot reproduce this with my setup: Debian Linux Testing, gImageReader commit a4820e. Tested with language en, OCR mode 'hOCR, PDF' on pages 200 and 214, 192-205, 200-215. Tested with language en, OCR mode 'plain text' on pages 200-215.

rgreen5 commented 1 year ago

I've slightly edited my OP to make clear that the app only crashes after scanning the page in question.

manisandro commented 1 year ago

Can you please post a stack trace of the crash?

rgreen5 commented 1 year ago

> Can you please post a stack trace of the crash?

If you can supply instructions I'll give it a try.

manisandro commented 1 year ago

Actually, the first step would be to try the latest version, 3.4.1.

Then to get a stack trace, install gdb and:

$ gdb ./gimagereader-qt5 # (or gimagereader-qt6 or gimagereader-gtk depending on which version you are using)
(gdb) run
# Trigger crash
(gdb) bt

and post the output of the gdb bt command.
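For anyone who prefers not to type into the gdb prompt, the same steps can probably be run non-interactively. The following is only a sketch: the helper name `collect_backtrace`, the `GIR_BIN` and `BT_LOG` variables, and the log file name are illustrative assumptions and are not part of gImageReader. It relies on gdb's `-batch`, `-ex` and `--args` options.

```shell
#!/bin/sh
# Assumed binary name; use gimagereader-qt6 or gimagereader-gtk if that is
# the variant you have installed.
GIR_BIN="${GIR_BIN:-gimagereader-qt5}"
# Hypothetical file name for the saved backtrace.
BT_LOG="${BT_LOG:-gimagereader-bt.txt}"

# Start the program under gdb, let it run until it crashes or exits,
# then print the backtrace and save everything to $BT_LOG.
# -batch makes gdb exit once the -ex commands have finished.
collect_backtrace() {
    gdb -batch -ex run -ex bt --args "$GIR_BIN" "$@" 2>&1 | tee "$BT_LOG"
}
```

After sourcing the snippet, call `collect_backtrace`, trigger the crash in the application window, and post the contents of the log file. If the program exits normally instead of crashing, gdb should report that there is no stack to print.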

rgreen5 commented 1 year ago

gImageReader 3.4.1 / Linux Mint 20.1 / Cinnamon 4.8.6

Yes. Works fine with the latest version. Thanks.

manisandro commented 1 year ago

Ok thanks.