jwilk-archive / ocrodjvu

OCR for DjVu
GNU General Public License v2.0

error msg "No image suitable for OCR" is too vague #21

Open ghost opened 7 years ago

ghost commented 7 years ago

Every document I receive from a particular source is deemed "unsuitable" by ocrodjvu and results in a session that looks like this:

$ ocrodjvu --debug --engine=tesseract -l eng --in-place document.djvu
Processing 'document.djvu':

The same error results if cuneiform is the engine, so apparently the error is not coming from the engine. Is ocrodjvu enforcing a certain image property, such as DPI? I see no image requirements in the manpage, so it would certainly be useful if the error message listed the requirements and, ideally, indicated the unmet ones.

jwilk commented 7 years ago

Thanks for the bug report.

Yes, the warning comes from ocrodjvu itself. I agree that the message is rather obscure.

By default, ocrodjvu passes only the page's mask to the OCR engine. (See the --render option in the manpage.) The warning is emitted if there is no mask at all for the page.
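
To make the behaviour described above concrete, here is a minimal Python sketch of the check and of what a more descriptive warning could look like. It is an illustration only, not ocrodjvu's actual code: the `Page` class, the `check_page` function, the message wording, and the render mode names are all assumptions made for this example, so check the manpage for the values --render really accepts.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mode names, for illustration; check the manpage for the real values.
RENDER_MODES = ("mask", "foreground", "all")


@dataclass
class Page:
    """Hypothetical stand-in for a DjVu page; a layer is None when absent."""
    number: int
    mask: Optional[bytes] = None
    foreground: Optional[bytes] = None
    background: Optional[bytes] = None


def check_page(page: Page, render: str = "mask") -> Optional[str]:
    """Return None if there is something to OCR, else a descriptive warning."""
    if render not in RENDER_MODES:
        raise ValueError("unknown render mode: %r" % render)
    if render == "mask":
        # The case from this thread: only the mask is used, and it is missing.
        if page.mask is None:
            return (
                "page %d: no mask layer to pass to the OCR engine "
                "(render mode is 'mask'); try a different --render setting"
                % page.number
            )
        return None
    # Assumption: other modes only need at least one layer to be present.
    layers = (page.mask, page.foreground, page.background)
    if all(layer is None for layer in layers):
        return "page %d: page has no image layers at all" % page.number
    return None


# A page stored purely as a background image has no mask.
scan = Page(number=1, background=b"...")
print(check_page(scan))         # a warning that names the missing mask
print(check_page(scan, "all"))  # None: there is a layer to render
```

The point of the sketch is that the warning names the unmet requirement (a missing mask layer) and the option that affects it, which is what the original report asks for.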