OCR/textify: use hOCR and allow manual edit process for corrections

jimmejardine / qiqqa-open-source

The open-sourced version of the award-winning Qiqqa research management tool for Windows

GNU General Public License v3.0


OCR/textify: use hOCR and allow manual edit process for corrections #179

Open GerHobbelt opened 4 years ago

GerHobbelt commented 4 years ago

Spinoff from the notes in #159, lest I forget about the idea: a user-driven manual edit/correct process for the OCR-ed text. This would improve the text layer of the documents and thus the search index. It would also improve any exported PDF, if we incorporate that hOCR text layer into the output PDFs. (Currently we copy the original PDFs to the export directory, as those are the only PDFs we have.)
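
To make the idea a bit more concrete, here is a minimal sketch of what applying a user's correction to an hOCR text layer could look like. It is written in Python purely for illustration and is not Qiqqa code: the sample hOCR fragment, the word ids, and the helper names `apply_corrections` and `page_text` are all hypothetical. It assumes the hOCR is well-formed XHTML using the standard `ocr_line` / `ocrx_word` classes with `bbox` in the `title` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical hOCR fragment: one line, two words, the first one misread by OCR.
HOCR = """<div class="ocr_page" title="bbox 0 0 600 800">
  <span class="ocr_line" title="bbox 10 10 300 40">
    <span class="ocrx_word" id="word_1_1" title="bbox 10 10 90 40; x_wconf 61">Qiqga</span>
    <span class="ocrx_word" id="word_1_2" title="bbox 100 10 200 40; x_wconf 93">research</span>
  </span>
</div>"""


def apply_corrections(hocr_xml, corrections):
    """Replace the text of ocrx_word spans by id; bounding boxes stay untouched."""
    root = ET.fromstring(hocr_xml)
    for el in root.iter("span"):
        if el.get("class") == "ocrx_word" and el.get("id") in corrections:
            el.text = corrections[el.get("id")]
    return root


def page_text(root):
    """Rebuild the plain text (one output line per ocr_line), e.g. for a search index."""
    lines = []
    for line in root.iter("span"):
        if line.get("class") == "ocr_line":
            words = [w.text or "" for w in line.iter("span")
                     if w.get("class") == "ocrx_word"]
            lines.append(" ".join(words))
    return "\n".join(lines)


# A user's manual fix, keyed by the (hypothetical) word id.
corrected = apply_corrections(HOCR, {"word_1_1": "Qiqqa"})
print(page_text(corrected))
```

Because only the word text changes and the `bbox` values are left alone, the corrected words could in principle still be placed at the right position when the text layer is written into an exported PDF.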