manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

Setting to ignore pictures #644

Open natrius opened 1 year ago

natrius commented 1 year ago

I have an old PDF that contains quite a few pictures. In this case only the text is needed, but gImageReader also detects a lot of lines and similar artifacts inside the pictures. I think that contributes to the PDF growing from 50-80 MB to about 340 MB.

If I decide that I need a picture, I can still take a screenshot :)

manisandro commented 1 year ago

This is more a tesseract training issue than something gImageReader can handle.

natrius commented 1 year ago

Interesting, good to know, thanks. What about a setting for exporting to PDF that ignores graphics/pictures? It could go in the Image settings section, maybe as a checkbox on the first line; if it is checked, the remaining options would be disabled because they are no longer relevant for that export.