manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

suddenly extremely slow #602

Closed: texwidowsandorphans closed this issue 1 year ago

texwidowsandorphans commented 1 year ago

After updating to Linux kernel 5.15.0-48-generic, text recognition has slowed to a snail's pace. A one-page A4 PDF document with simple English text set in Times took eight (!) minutes. I then tried it on Linux Mint 20.3 after installing the new kernel, then on Linux Mint 21, and also on KDE neon. The result was the same each time: a single page took about 10 minutes to be recognized. (I also tried gImageReader-qt, and it works as quickly as before.)

manisandro commented 1 year ago

Sorry, but I don't think there is anything I can do about this in gImageReader, also considering that the actual recognition is done by tesseract.

scheunengeist commented 1 year ago

I hope adding information to a closed issue isn't inappropriate; I didn't want to open a new one.

I just wanted to add that I'm observing the same behaviour as the original reporter in gImageReader 3.4.1: gimagereader-qt5 works as expected, while gimagereader-gtk takes unusually long to process an input image. A two-sentence example is handled almost instantaneously by the Qt version but takes about 6 or 7 seconds with the GTK version. My CPU is a quad-core Intel i5-4460 @ 3.20GHz.

My system (Devuan 5 / Debian Bookworm) provides only one version of tesseract (5.3.0) and its language files, so both front-ends should be using the same backend. Running tesseract directly from the CLI is quite fast, as expected.

Since the Qt version of gImageReader works fine, this isn't a problem for me right now. And since, judging from the issue tracker, not many people experience this behaviour, it may not be worth the effort. I just wanted to confirm the observation in case something is amiss in the GTK version.

And also: Thanks for gImageReader, it's a really great and useful piece of software! :)