ocropus / hocr-tools

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

Japanese support, again #144

Closed dinosauria123 closed 5 years ago

dinosauria123 commented 5 years ago

Thank you for making hocr-pdf. I have been able to convert many old scanned Japanese documents to searchable PDFs.

After a recent update of hocr-pdf, the Japanese text in the PDF file is completely broken. It looks like €•‚ƒ „... ( †‡ˆ‰ Š‹Œ•Ž• (the original Japanese text is 光学文字認識(こうがくもじにんしき)). Alphabetic characters are not broken.

The hOCR file was made with my gcv2hocr, and the Japanese characters in it are readable.

If you know anything about this issue, please let me know.

Attachments: page001.jpg, page001.hocr.txt, out0.pdf

zuphilip commented 5 years ago

How exactly do you call hocr-pdf? Your example works correctly for me (see out.pdf) in Python 3 with

hocr-pdf directory/ --savefile out.pdf

but I see some problems when redirecting the output to a file in the terminal with the > operator.
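One general reason shell redirection can behave differently in Python 3 is that sys.stdout is a text stream, so binary data such as a PDF has to go through its underlying binary buffer. The sketch below illustrates only that general mechanism; it is not taken from hocr-pdf, and whether this is what happened in this report is an assumption. The names write_pdf_bytes and make_fake_stdout are hypothetical, and an in-memory stream stands in for a redirected stdout.

```python
import io


def write_pdf_bytes(data: bytes, text_stream) -> None:
    """Send raw bytes through the binary layer beneath a text stream."""
    text_stream.flush()              # keep ordering with any earlier text output
    text_stream.buffer.write(data)   # bypass the str encoder entirely
    text_stream.buffer.flush()


def make_fake_stdout():
    """Build an in-memory stand-in for a redirected sys.stdout (hypothetical helper)."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding="utf-8")


# Arbitrary payload with high bytes, similar to what a real PDF contains.
payload = b"%PDF-1.4\n" + bytes(range(0x80, 0x90))

raw, fake_stdout = make_fake_stdout()
write_pdf_bytes(payload, fake_stdout)

# The bytes arrive unchanged when written through the binary buffer.
print(raw.getvalue() == payload)
```

Calling fake_stdout.write(payload) directly would raise a TypeError, because a text stream accepts only str. Writing to a named file opened in binary mode avoids the question altogether.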

dinosauria123 commented 5 years ago

Thank you for your comment. I think you are right: I used the > operator to make the PDF file.

However, I could not run hocr-pdf directory/ --savefile out.pdf

It says:

usage: hocr-pdf [-h] imgdir
hocr-pdf: error: unrecognized arguments: --savefile out.pdf

So I am confused. I installed hocr-tools with the pip3 command.
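The usage line in that error message already shows which options the installed script accepts. As a rough sketch, the check can be automated by looking for an option in the usage text; supports_option is a hypothetical helper written for illustration, and the usage string is the one quoted in this comment.

```python
def supports_option(usage_text: str, option: str) -> bool:
    """Return True if the option appears as a token in an argparse usage line."""
    # Strip the square brackets argparse puts around optional arguments.
    tokens = usage_text.replace("[", " ").replace("]", " ").split()
    return option in tokens


# Usage line printed by the pip3-installed script, as quoted in the thread.
usage = "usage: hocr-pdf [-h] imgdir"

print(supports_option(usage, "--savefile"))  # False
print(supports_option(usage, "-h"))          # True
```

A missing --savefile in the usage line suggests the installed script predates the option, which points to a version mismatch rather than a typing mistake.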

zuphilip commented 5 years ago

Okay, I understand. What do you see when you type pip3 freeze?

dinosauria123 commented 5 years ago

This is the output of pip3 freeze:

asn1crypto==0.24.0
attrs==17.4.0
Automat==0.6.0
blinker==1.4
certifi==2018.1.18
chardet==3.0.4
click==6.7
cloud-init==18.4
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
distro-info==0.18
hocr-tools==1.1.1
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
Jinja2==2.10
jsonpatch==1.16
jsonpointer==1.10
jsonschema==2.6.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
lxml==4.3.2
MarkupSafe==1.0
netifaces==0.10.4
oauthlib==2.0.6
PAM==0.4.2
Pillow==5.4.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
pygobject==3.26.1
PyJWT==1.5.3
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.3+ubuntu1
python-debian==0.1.32
pyxdg==0.25
PyYAML==3.12
reportlab==3.5.13
requests==2.18.4
requests-unixsocket==0.1.5
SecretStorage==2.3.1
service-identity==16.0.0
six==1.11.0
ssh-import-id==5.7
systemd-python==234
Twisted==17.9.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
zope.interface==4.3.2

I found that the git version works perfectly! The issue was caused by the version installed from the pip3 repository.
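For anyone hitting the same mismatch, the installed release can also be checked from Python instead of reading through the whole pip3 freeze listing. This is a minimal sketch using the standard library importlib.metadata module (Python 3.8 or later); installed_version and parse_version are hypothetical helpers, and 1.1.1 is the hocr-tools release reported by pip3 freeze in this thread. The simple parser assumes purely numeric version strings.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_version(text: str):
    """Turn a purely numeric version such as '1.1.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


pip_release = "1.1.1"  # release shown by pip3 freeze in this thread
found = installed_version("hocr-tools")

if found is None:
    print("hocr-tools is not installed in this environment")
elif parse_version(found) <= parse_version(pip_release):
    print(f"hocr-tools {found}: same as or older than the pip3 release from this thread")
else:
    print(f"hocr-tools {found}: newer than the pip3 release from this thread")
```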

zuphilip commented 5 years ago

Okay, good to know that the issue is resolved then.

I will create a new issue for making a new release and package.

dinosauria123 commented 5 years ago

Thank you for your help! I am closing this issue.