ocropus / hocr-tools

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

a proposition to help hocr-tools become ZE best #174

Open evanescente-ondine opened 2 years ago

evanescente-ondine commented 2 years ago

It would simplify people's lives A LOT if you could write a version of hocr-pdf that does everything on its own: create the hOCR for all of a PDF's pages, merge them, then merge the resulting file with the PDF. And VOILÀ, no loss in the conversion, no mess, no fuss... Perhaps it could allow changing the engine, too.
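
For illustration, the workflow requested above can be approximated today by chaining existing tools. The following is only a rough sketch, not an hocr-tools feature: it assumes `pdftoppm` (Poppler), `tesseract`, and `hocr-pdf` are installed and on the PATH, and the function names and the `page` file prefix are made up for this example. Note that it rebuilds the PDF from rasterized JPEG pages rather than merging the text layer into the original PDF, so it does not deliver the "no loss" part of the request.

```python
import subprocess
from pathlib import Path
from typing import List


def rasterize_command(pdf: Path, workdir: Path, dpi: int = 300) -> List[str]:
    # pdftoppm writes one JPEG per page, named <prefix>-<n>.jpg, into workdir.
    return ["pdftoppm", "-r", str(dpi), "-jpeg", str(pdf), str(workdir / "page")]


def ocr_command(image: Path, lang: str = "eng") -> List[str]:
    # Tesseract takes an output base name; the "hocr" config makes it
    # write <base>.hocr next to the image (page-1.jpg -> page-1.hocr).
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "hocr"]


def combine_command(workdir: Path) -> List[str]:
    # hocr-pdf reads matching .jpg/.hocr pairs from one directory
    # and writes the resulting PDF to standard output.
    return ["hocr-pdf", str(workdir)]


def pdf_to_searchable_pdf(pdf: Path, out_pdf: Path, workdir: Path,
                          lang: str = "eng", dpi: int = 300) -> None:
    """Hypothetical all-in-one wrapper: rasterize, OCR each page, combine."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, workdir, dpi), check=True)
    for image in sorted(workdir.glob("page-*.jpg")):
        subprocess.run(ocr_command(image, lang), check=True)
    with open(out_pdf, "wb") as fh:
        subprocess.run(combine_command(workdir), stdout=fh, check=True)
```

Swapping the engine would then mean replacing `ocr_command` with a command for any other tool that emits hOCR with the same base name as the page image.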

stweil commented 2 years ago

Why not simply use ocrmypdf?

isspid commented 1 year ago

@stweil One reason could be that, out of the box, ocrmypdf only supports Tesseract as an engine.