Closed: dinosauria123 closed this issue 5 years ago
How exactly do you call hocr-pdf? Your example works correctly for me (see out.pdf) in Python 3 with
hocr-pdf directory/ --savefile out.pdf
but I see some problems when I redirect the output to a file in the terminal with the > operator.
Thank you for your comment. I think you are right. I used the > operator to create the PDF file.
However, I could not run
hocr-pdf directory/ --savefile out.pdf
It fails with:
usage: hocr-pdf [-h] imgdir
hocr-pdf: error: unrecognized arguments: --savefile out.pdf
So I am confused. I installed hocr-tools with the pip3 command.
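One way to confirm which options an installed copy of the script accepts is to inspect its usage line. The sketch below is only an illustration: `accepts_option` is a hypothetical helper, not part of hocr-tools, and the sample string is the usage line quoted in the error message above.

```python
def accepts_option(help_text: str, option: str) -> bool:
    """Return True if `option` appears as a whole token in argparse-style help text."""
    # Split on whitespace and strip the brackets and commas argparse puts around options.
    tokens = {tok.strip("[],") for tok in help_text.split()}
    return option in tokens


# Usage line reported by the pip3-installed hocr-pdf in this thread.
usage = "usage: hocr-pdf [-h] imgdir"

print(accepts_option(usage, "--savefile"))  # False
print(accepts_option(usage, "-h"))          # True
```

To check a live installation, the help text could be captured with `subprocess.run(["hocr-pdf", "-h"], capture_output=True, text=True).stdout` and passed to the same helper, assuming the script is on the PATH.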
Okay, I understand. What do you see when you type pip3 freeze?
This is the output of pip3 freeze:
asn1crypto==0.24.0
attrs==17.4.0
Automat==0.6.0
blinker==1.4
certifi==2018.1.18
chardet==3.0.4
click==6.7
cloud-init==18.4
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
distro-info==0.18
hocr-tools==1.1.1
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
Jinja2==2.10
jsonpatch==1.16
jsonpointer==1.10
jsonschema==2.6.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
lxml==4.3.2
MarkupSafe==1.0
netifaces==0.10.4
oauthlib==2.0.6
PAM==0.4.2
Pillow==5.4.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
pygobject==3.26.1
PyJWT==1.5.3
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.3+ubuntu1
python-debian==0.1.32
pyxdg==0.25
PyYAML==3.12
reportlab==3.5.13
requests==2.18.4
requests-unixsocket==0.1.5
SecretStorage==2.3.1
service-identity==16.0.0
six==1.11.0
ssh-import-id==5.7
systemd-python==234
Twisted==17.9.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
zope.interface==4.3.2
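When a listing like this is long, the relevant pin can be picked out programmatically. The following is a minimal sketch, assuming the usual `name==version` format of `pip freeze`; `pinned_version` is a hypothetical helper written for illustration, and the sample string is a short excerpt of the output above.

```python
def pinned_version(freeze_output: str, package: str):
    """Return the version pinned for `package` in `pip freeze` output, or None."""
    # Compare names case-insensitively and treat "_" and "-" as equivalent.
    wanted = package.lower().replace("_", "-")
    for entry in freeze_output.split():
        name, sep, version = entry.partition("==")
        if sep and name.lower().replace("_", "-") == wanted:
            return version
    return None


# Excerpt of the pip3 freeze output posted in this thread.
freeze_output = "hocr-tools==1.1.1 lxml==4.3.2 Pillow==5.4.1 reportlab==3.5.13"

print(pinned_version(freeze_output, "hocr-tools"))  # 1.1.1
```

Here the pip3 installation reports hocr-tools 1.1.1, which is the copy whose `hocr-pdf` rejected `--savefile`.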
I found that the git version works perfectly! The issue was caused by the version in the pip3 repository.
Okay, good to know that the issue is resolved.
I will create a new issue for doing a new release and package.
Thank you for your help! I am closing this issue.
Thank you for making hocr-pdf. I have been able to convert a lot of old scanned Japanese material to searchable PDFs.
After a recent update of hocr-pdf, the Japanese text in the PDF file is completely broken. It looks like €•‚ƒ „... ( †‡ˆ‰ Š‹Œ•Ž• (the original Japanese text is 光学文字認識(こうがくもじにんしき)). Alphabetic characters are not broken.
The hOCR file was made with my gcv2hocr, and the Japanese characters in it are correctly readable.
If you know anything about this issue, please let me know. Attachments: page001.jpg page001.hocr.txt out0.pdf
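To rule out the hOCR input as the source of the broken text, the recognized words can be extracted and checked for Japanese characters before running hocr-pdf. This is a rough sketch using only the Python standard library. It assumes words are marked with the `ocrx_word` class and that the file is UTF-8. `WordCollector`, `has_japanese`, and the sample markup are hypothetical and written for illustration, not taken from gcv2hocr or hocr-tools.

```python
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collect the text inside elements whose class includes ocrx_word."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside an ocrx_word element
        self.words = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "ocrx_word" in classes:
            self.depth += 1
            if self.depth == 1:
                self.words.append("")  # start a new word

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.words[-1] += data


def has_japanese(text: str) -> bool:
    # Hiragana and katakana (U+3040-U+30FF) plus the main CJK ideograph block.
    return any("\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff" for ch in text)


# Hypothetical one-word hOCR fragment using the text quoted in this issue.
sample = "<span class='ocrx_word' title='bbox 10 10 90 30'>光学文字認識</span>"

collector = WordCollector()
collector.feed(sample)
print(collector.words)                         # ['光学文字認識']
print(has_japanese("".join(collector.words)))  # True
```

For a real page, the content of a file such as page001.hocr.txt could be read with `open(path, encoding="utf-8")` and fed to the collector in the same way. If the words come out readable here but not in the PDF, the problem lies in the PDF generation step rather than in the hOCR file.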