Open vstepaniuk opened 4 years ago
Only JPEG images are supported, not PNG.
I get a similar error using the supplied sample.zip, even after converting Tesseract's PNGs to JPG.
Before (with just PNGs):
% hocr-pdf . > new.pdf
Traceback (most recent call last):
  File "/usr/local/bin/hocr-pdf", line 143, in <module>
    export_pdf(args.imgdir, 300)
  File "/usr/local/bin/hocr-pdf", line 51, in export_pdf
    load_invisible_font()
  File "/usr/local/bin/hocr-pdf", line 134, in load_invisible_font
    uncompressed = bytearray(zlib.decompress(base64.decodestring(font)))
AttributeError: module 'base64' has no attribute 'decodestring'
After converting the PNGs to JPG (using ImageMagick, i.e., convert new.png new.jpg):
% hocr-pdf . > new.pdf
Traceback (most recent call last):
  File "/usr/local/bin/hocr-pdf", line 143, in <module>
    export_pdf(args.imgdir, 300)
  File "/usr/local/bin/hocr-pdf", line 51, in export_pdf
    load_invisible_font()
  File "/usr/local/bin/hocr-pdf", line 134, in load_invisible_font
    uncompressed = bytearray(zlib.decompress(base64.decodestring(font)))
AttributeError: module 'base64' has no attribute 'decodestring'
Tesseract v5.0.0-alpha-20210401 with Leptonica (via macOS Homebrew: brew install tesseract --HEAD)
Python 3.9.2
hocr-tools 1.1.1
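Both tracebacks above end in the same call, base64.decodestring, which had been deprecated since Python 3.1 and was removed in Python 3.9, so the failure does not depend on whether the images are PNG or JPG. Below is a minimal sketch of how that decoding step could be written on current Python versions. The helper name decode_embedded_blob and the stand-in payload are hypothetical, and this is not necessarily how hocr-tools addresses it upstream.

```python
import base64
import zlib


def decode_embedded_blob(encoded):
    """Decode base64 text (str or bytes) that wraps zlib-compressed data.

    Hypothetical helper, not part of hocr-tools.
    """
    # base64.b64decode accepts both str and bytes and, with the default
    # validate=False, skips newlines embedded in the encoded text.
    raw = base64.b64decode(encoded)
    return bytearray(zlib.decompress(raw))


# Round trip with stand-in data in place of the real embedded font.
payload = b"stand-in for the embedded font bytes"
encoded = base64.encodebytes(zlib.compress(payload)).decode("ascii")
assert decode_embedded_blob(encoded) == payload
```

base64.decodebytes is the direct successor to decodestring, but it only accepts bytes-like input, so b64decode is the more forgiving choice if the encoded data is held as a str.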
When I execute this:
I get a corrupt PDF file, and Evince says "The document contains no pages". See sample.zip for the files themselves.
Tesseract 4.1.1 Python 3.8.2 hocr-tools-1.1.1