Closed by jwilk 12 years ago
The patch doesn't look crazy, but at least the documentation would have to be updated (lib/cli/hocr2djvused.py:31 and doc/hocr2djvused.xml).
Some nitpicking: lst += x to lst.extend(x).
Implemented in f9922a64007475af87494464804cfb7155e80ccc.
Fixed in 0.7.11.
Issue reported by @thkoch2001:
Hi,
I ran tesseract manually on multiple image files (try GNU Parallel!) and ended up with one HTML (hOCR) file for every page. To combine those HTML pages into one djvused script, I hacked your hocr2djvused a bit.
My version now optionally also accepts input file parameters and processes those as consecutive pages.
You can find my changes here: https://github.com/thkoch2001/ocrodjvu/commit/318657e4a45bb8c8002e06382b73d49e984c0f30
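For readers who want a picture of the idea without opening the commit, here is a minimal sketch of "optional input files, treated as consecutive pages". It is not the code from the linked patch or from hocr2djvused itself: the names parse_args, iter_pages and main are hypothetical, the conversion step is replaced by a placeholder print, and it assumes Python 3 with the standard argparse module.

```python
import argparse
import sys

def parse_args(argv):
    # Hypothetical command line: zero or more hOCR files as positional arguments.
    parser = argparse.ArgumentParser(
        description='sketch: several hOCR files in, one script out')
    parser.add_argument('paths', metavar='FILE', nargs='*',
                        help='hOCR files, one per page; read stdin if omitted')
    return parser.parse_args(argv)

def iter_pages(paths, stdin=None):
    """Yield (page_number, text) pairs, numbering pages from 1 in argument order."""
    if not paths:
        # No file arguments: fall back to a single document on standard input.
        stream = sys.stdin if stdin is None else stdin
        yield 1, stream.read()
        return
    for page_number, path in enumerate(paths, start=1):
        with open(path, encoding='utf-8') as stream:
            yield page_number, stream.read()

def main(argv):
    options = parse_args(argv)
    for page_number, text in iter_pages(options.paths):
        # Placeholder for the real hOCR-to-djvused conversion, which is not shown here.
        print('# page %d: %d characters of hOCR' % (page_number, len(text)))
```

Calling main(['page1.html', 'page2.html']) would walk the two files in the order given, so the order of the arguments decides the page numbers. Falling back to standard input when no files are named keeps the single-document behaviour available.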