ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu_ files.

.. _DjVu: http://djvu.org/

.. code:: console

   $ wget -q 'https://sources.debian.org/data/main/o/ocropus/0.3.1-3/data/pages/alice_1.png'
   $ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
   $ cjb2 'alice.pbm' 'alice.djvu'
   $ ocrodjvu --in-place 'alice.djvu'
   Processing 'alice.djvu':


The following software is needed to run ocrodjvu:

* Python 2.7

* an OCR engine:

  + OCRopus_
  + Cuneiform_
  + Ocrad_
  + GOCR_
  + Tesseract_

* DjVuLibre_ ≥ 3.5.21

* python-djvulibre_

* subprocess32_

* lxml_ ≥ 2.0


Additionally, some optional features require the following software:

* PyICU_ ≥ 1.0.1, required for the ``--word-segmentation=uax29`` option

* html5lib_, required for the ``--html5`` option


The following software is needed to rebuild the manual pages from source:

* xsltproc_

* `DocBook XSL stylesheets`_

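
As a quick sanity check before running ocrodjvu, a small shell function
can report which command-line tools are missing from ``PATH``. This is
only a sketch and not part of ocrodjvu: ``missing_tools`` is a
hypothetical helper, and the executable names are assumptions (``cjb2``
and ``djvused`` ship with DjVuLibre; ``tesseract`` stands in for
whichever OCR engine is installed). It checks neither the Python modules
nor the minimum versions listed above.

.. code:: sh

   #!/bin/sh
   # Hypothetical helper: print every command name given as an argument
   # that cannot be found on PATH.
   # Returns 0 if all commands are available, 1 otherwise.
   missing_tools() {
       rc=0
       for tool in "$@"; do
           if ! command -v "$tool" >/dev/null 2>&1; then
               printf '%s\n' "$tool"
               rc=1
           fi
       done
       return "$rc"
   }

   # Assumed executable names; adjust them to the OCR engine in use.
   missing_tools ocrodjvu cjb2 djvused tesseract xsltproc \
       || echo 'The tools listed above were not found on PATH.'

The function prints nothing when every tool is present, so it can also
be used as a guard at the top of a batch-conversion script.
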

.. _OCRopus: https://code.google.com/p/ocropus/
.. _Cuneiform: https://launchpad.net/cuneiform-linux
.. _Ocrad: https://www.gnu.org/software/ocrad/
.. _GOCR: http://www-e.uni-magdeburg.de/jschulen/ocr/
.. _Tesseract: https://github.com/tesseract-ocr/tesseract
.. _DjVuLibre: http://djvu.sourceforge.net/
.. _python-djvulibre: https://jwilk.net/software/python-djvulibre
.. _lxml: https://lxml.de/
.. _subprocess32: https://pypi.org/project/subprocess32/
.. _PyICU: https://pypi.org/project/PyICU/
.. _html5lib: https://github.com/html5lib/html5lib-python
.. _xsltproc: http://xmlsoft.org/XSLT/xsltproc2.html
.. _DocBook XSL stylesheets: https://github.com/docbook/xslt10-stylesheets

ocrodjvu development was supported by the Polish Ministry of Science and Higher Education's grant no. N N519 384036 (2009–2012, https://bitbucket.org/jsbien/ndt).

.. vim:ft=rst ts=3 sts=3 sw=3 et tw=72