jwilk-archive / ocrodjvu

OCR for DjVu
GNU General Public License v2.0

Overview
========

ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu_ files.

.. _DjVu: http://djvu.org/

Example
=======

.. code:: console

   $ wget -q 'https://sources.debian.org/data/main/o/ocropus/0.3.1-3/data/pages/alice_1.png'
   $ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
   $ cjb2 'alice.pbm' 'alice.djvu'
   $ ocrodjvu --in-place 'alice.djvu'
   Processing 'alice.djvu':
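
The example above processes a single file. As a minimal sketch (not part
of ocrodjvu itself), the same ``--in-place`` invocation could be applied
to every DjVu file in a directory. The function name ``ocr_all`` and the
``OCRODJVU`` override variable are hypothetical names introduced here for
illustration only:

.. code:: sh

   # Run OCR in place on every *.djvu file in the given directory
   # (default: the current directory).
   # OCRODJVU is a hypothetical override for the command name; it is an
   # assumption of this sketch and defaults to plain "ocrodjvu".
   ocr_all() {
       dir=${1:-.}
       for f in "$dir"/*.djvu; do
           # Skip the unexpanded pattern when no file matches.
           [ -e "$f" ] || continue
           # Stop at the first file that fails.
           "${OCRODJVU:-ocrodjvu}" --in-place "$f" || return 1
       done
   }

For instance, ``ocr_all ./scans`` would process each file under
``./scans`` in turn. Because ``--in-place`` modifies the input files,
it may be safer to try this on copies first.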

Requisites
==========

The following software is needed to run ocrodjvu:

Additionally, some optional features require the following software:

The following software is needed to rebuild the manual pages from source:

.. _OCRopus: https://code.google.com/p/ocropus/
.. _Cuneiform: https://launchpad.net/cuneiform-linux
.. _Ocrad: https://www.gnu.org/software/ocrad/
.. _GOCR: http://www-e.uni-magdeburg.de/jschulen/ocr/
.. _Tesseract: https://github.com/tesseract-ocr/tesseract
.. _DjVuLibre: http://djvu.sourceforge.net/
.. _python-djvulibre: https://jwilk.net/software/python-djvulibre
.. _lxml: https://lxml.de/
.. _subprocess32: https://pypi.org/project/subprocess32/
.. _PyICU: https://pypi.org/project/PyICU/
.. _html5lib: https://github.com/html5lib/html5lib-python
.. _xsltproc: http://xmlsoft.org/XSLT/xsltproc2.html
.. _DocBook XSL stylesheets: https://github.com/docbook/xslt10-stylesheets

Acknowledgment
==============

ocrodjvu development was supported by the Polish Ministry of Science and Higher Education's grant no. N N519 384036 (2009–2012, https://bitbucket.org/jsbien/ndt).

.. vim:ft=rst ts=3 sts=3 sw=3 et tw=72