Open sarangtc opened 4 years ago
The message refers to line 48:
https://github.com/tmbdev/hocr-tools/blob/b3e380779e5c88ad99dca2a6b8b292c0f375fd68/hocr-cut#L48
What is the exact hocr-cut command you are running? Can you share the hocr file here?
Hi,
Sorry, I missed your message.
These are the full details:
I installed hocr-tools on Ubuntu 16.04 using: sudo pip install hocr-tools
Although hocr-pdf works, the hocr-cut command gave: hocr-cut: command not found
So I copied the code from GitHub to /usr/local/bin/hocr-cut and made it executable.
In my home folder (where hocr-pdf works), I ran the command: hocr-cut test_0012.hocr
The file test_0012.hocr is attached for reference. The output was:
Traceback (most recent call last):
  File "/usr/local/bin/hocr-cut", line 48, in
I tried various two-column hocr files, but all gave the same error message.
The pip package is not up to date, which is why hocr-cut was not found in the first place. Try instead
pip install git+https://github.com/tmbdev/hocr-tools.git
However, I am not sure this will solve your problems...
Your example file is not attached to this issue on GitHub (I guess attachments do not come through when you attach them to the email only). Can you upload it directly to this issue on GitHub, or upload it somewhere like https://pastebin.com/ and post the link here?
here is the file test_0012.txt
Okay, I see that you have not specified the image in your hocr file on line 13. Try changing this line to something like
<div class='ocr_page' lang='unknown' title='image IMAGENAME.PNG; bbox 0 0 6169 4648'>
where you should replace IMAGENAME.PNG with the name of your image file. Does that work?
(We can try to make a better error message for this.)
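A minimal sketch of what such a check could look like, assuming the page title follows the `image NAME; bbox ...` layout shown above. The helper names `get_title_prop` and `resolve_image_path` are hypothetical and are not taken from hocr-cut. The idea is to fail with a readable message instead of passing None to os.path.join, which is what the AttributeError in the traceback in this thread points to.

```python
import os


def get_title_prop(title, name):
    """Return the value of property `name` from an hOCR title string, or None."""
    # Properties are separated by ';', each one is "key value...".
    for part in title.split(";"):
        fields = part.strip().split(None, 1)
        if fields and fields[0] == name:
            # Strip optional quotes around the value (e.g. image "page.png").
            return fields[1].strip().strip("\"'") if len(fields) > 1 else ""
    return None


def resolve_image_path(hocr_path, page_title):
    """Resolve the page image relative to the hOCR file (hypothetical helper)."""
    image = get_title_prop(page_title, "image")
    if not image:
        # Clear message instead of a failure inside os.path.join.
        raise ValueError(
            "ocr_page in %s has no 'image' property in its title attribute"
            % hocr_path
        )
    return os.path.join(os.path.dirname(hocr_path), image)
```

With the corrected line from above, `resolve_image_path("test_0012.hocr", "image IMAGENAME.PNG; bbox 0 0 6169 4648")` yields the image name next to the hocr file, while a title that only contains `bbox ...` raises the ValueError.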
OK, that worked: it gave me myimage.left.jpg and myimage.right.jpg. I was primarily expecting two hocr files, one for each half (later to be merged with the images using hocr-pdf).
I assumed this from the description: "Cut a page (horizontally) into two pages in the middle such that the most of the bounding boxes are separated nicely, e.g. cutting double pages or double columns"
I guess you meant the image itself and not the hocr file!
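For anyone who wants the two halves of the hocr data as well, here is a rough sketch of the idea in the quoted description. It is not hocr-cut's actual code: `best_cut` and `split_boxes` are hypothetical helpers, and the boxes are assumed to be already extracted from the hocr file as `(x0, y0, x1, y1)` tuples.

```python
def best_cut(bboxes, page_width, margin=0.1):
    """Pick an x position near the middle that crosses the fewest boxes."""
    middle = page_width // 2
    lo = int(page_width * (0.5 - margin))
    hi = int(page_width * (0.5 + margin))
    best_x, best_key = middle, None
    for x in range(lo, hi + 1):
        # A box is "crossed" when the cut falls strictly inside it.
        crossed = sum(1 for (x0, _, x1, _) in bboxes if x0 < x < x1)
        # Prefer fewer crossings, then the position closest to the middle.
        key = (crossed, abs(x - middle))
        if best_key is None or key < best_key:
            best_x, best_key = x, key
    return best_x


def split_boxes(bboxes, cut_x):
    """Split boxes into left and right halves; right boxes are shifted to x=0."""
    left = [b for b in bboxes if b[2] <= cut_x]
    right = [
        (x0 - cut_x, y0, x1 - cut_x, y1)
        for (x0, y0, x1, y1) in bboxes
        if x0 >= cut_x
    ]
    # Boxes that straddle the cut are dropped in this sketch.
    return left, right
```

Writing the two box lists back out as separate hocr files would still need an HTML parser and is left out here.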
hocr-cut.py gives the following error:
Traceback (most recent call last):
  File "../hocr-cut.py", line 48, in
    filename = os.path.join(os.path.dirname(args.file), filename)
  File "/usr/lib/python2.7/posixpath.py", line 68, in join
    if b.startswith('/'):
AttributeError: 'NoneType' object has no attribute 'startswith'