ocropus / hocr-tools

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

decodestring() deprecated in hocr-pdf, use decodebytes() #170

Open UBISOFT-1 opened 2 years ago

UBISOFT-1 commented 2 years ago
/home/muneeb/.local/bin/hocr-pdf:134: DeprecationWarning: decodestring() is a deprecated alias since Python 3.1, use decodebytes()
  uncompressed = bytearray(zlib.decompress(base64.decodestring(font)))

In the file, we should use the decodebytes() function instead.

UBISOFT-1 commented 2 years ago

Also, what would happen if we changed the encoding from 'latin-1' to 'utf-8'? Would that help when dealing with, say, Arabic typescript?

kba commented 2 years ago
> /home/muneeb/.local/bin/hocr-pdf:134: DeprecationWarning: decodestring() is a deprecated alias since Python 3.1, use decodebytes()
>   uncompressed = bytearray(zlib.decompress(base64.decodestring(font)))
>
> In the file, we should use the decodebytes() function instead.

True, cf. https://github.com/ocropus/hocr-tools/issues/169

> Also, what would happen if we changed the encoding from 'latin-1' to 'utf-8'? Would that help when dealing with, say, Arabic typescript?

Possibly. I have never used hocr-pdf with non-Latin texts; what happens when you do? Let's discuss this separately in #171.
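To make the encoding question concrete, here is a short illustration. It assumes the 'latin-1' in question is a str.encode() call; the thread does not say which line of hocr-pdf is meant. Latin-1 cannot represent Arabic characters at all, while UTF-8 can. For pure-ASCII data such as base64 text, both encodings produce identical bytes, so switching would change nothing there. The helper try_encode is hypothetical.

```python
import base64


def try_encode(text, encoding):
    """Return `text` encoded with `encoding`, or None if it cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


arabic = "مرحبا"  # "hello" in Arabic
b64 = base64.b64encode(b"any binary data").decode("ascii")

print(try_encode(arabic, "latin-1"))  # None: Arabic is outside Latin-1
print(try_encode(arabic, "utf-8") is not None)  # True
print(try_encode(b64, "latin-1") == try_encode(b64, "utf-8"))  # True
```

Whether UTF-8 actually helps therefore depends on whether the Latin-1 step touches the recognized text itself or only ASCII data such as the base64 font.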

FriedrichFroebel commented 1 year ago

This has apparently already been fixed, but the fix has not been released yet: https://github.com/ocropus/hocr-tools/commit/d756f75ce8cf1224a78c9eab5db4952be17b9d70

jrochkind commented 1 year ago

Is this perhaps the same problem, now turned into an outright error instead of a deprecation warning?

I am not very familiar with Python and am just trying out hocr-pdf, but I get this error:

Traceback (most recent call last):
  File "/opt/homebrew/bin/hocr-pdf", line 143, in <module>
    export_pdf(args.imgdir, 300)
  File "/opt/homebrew/bin/hocr-pdf", line 51, in export_pdf
    load_invisible_font()
  File "/opt/homebrew/bin/hocr-pdf", line 134, in load_invisible_font
    uncompressed = bytearray(zlib.decompress(base64.decodestring(font)))
                                             ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'base64' has no attribute 'decodestring'

If this is the same issue and it has been fixed but not yet released, are there any plans for a release?
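The AttributeError is consistent with running under Python 3.9 or later: base64.decodestring() was a deprecated alias since 3.1 and was removed in 3.9, so the deprecation warning from earlier in this thread became a hard failure. A quick way to check which functions the interpreter running hocr-pdf provides (a diagnostic sketch only, not part of hocr-tools):

```python
import base64
import sys

# decodestring() was removed in Python 3.9; decodebytes() exists since 3.1.
has_decodestring = hasattr(base64, "decodestring")
has_decodebytes = hasattr(base64, "decodebytes")

print("Python %d.%d" % sys.version_info[:2])
print("base64.decodestring available:", has_decodestring)
print("base64.decodebytes available:", has_decodebytes)
```

If decodestring is reported as unavailable, any copy of hocr-pdf that still calls it will fail exactly as in the traceback above until a version using decodebytes() is installed.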