bicolino34 closed this issue 10 months ago.
What does the tag that defines the page and its image look like in your file? Here is an example from the tests of how it should look:
<div class='ocr_page' id='page_1' title='image "alice_1.png"; bbox 0 0 2488 3507; ppageno 0'>
@FriedrichFroebel It looks like this:
<div title="bbox 0 0 3468 4624; image './20240128_105503.jpg'; ppageno 1; res 100; rot 90; scan_res 100 100" class="ocr_page" id="page_1">
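This is related to https://github.com/ocropus/hocr-tools/blob/2867727ae986dd1e1727d98300da053caaffdb9b/hocr-extract-images#L28, which expects single quotation marks at the outer (attribute) level and only double quotation marks around nested values. Your file uses the opposite convention, so the single quotes around the image path are not stripped and become part of the file name. Using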
args = args.strip('"\'')
there instead seems to fix it.
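A minimal sketch of why the one-line change matters, assuming the script reads the page's `title` attribute, splits it on `;`, and strips quotes from each property value. The `get_prop` helper below is hypothetical, written only for illustration, and is not the hocr-tools source:

```python
# Hypothetical helper modelled on the property lookup discussed above;
# it is NOT the actual hocr-tools implementation.
def get_prop(title, name):
    """Return the value of property `name` from an hOCR title string, or None."""
    for prop in title.split(";"):
        prop = prop.strip()
        if not prop:
            continue
        key, _, args = prop.partition(" ")
        # Strip both double and single quotes so either quoting style works.
        args = args.strip().strip("\"'")
        if key == name:
            return args
    return None


old_style = 'image "alice_1.png"; bbox 0 0 2488 3507; ppageno 0'
new_style = "bbox 0 0 3468 4624; image './20240128_105503.jpg'; ppageno 1; res 100; rot 90; scan_res 100 100"

print(get_prop(old_style, "image"))  # alice_1.png
print(get_prop(new_style, "image"))  # ./20240128_105503.jpg

# For comparison: stripping only double quotes keeps the single quotes,
# so they end up in the file name that is looked up on disk.
print("'./20240128_105503.jpg'".strip('"'))  # './20240128_105503.jpg'
```

The last line shows the symptom from the report: the path still carries its quotes, which matches the quoted name in the "not found" message.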
Thank you! This solved the issue.
I have one .jpg image and created an hOCR file for it with gImageReader. Both files have the same base name and are in the same directory. I ran the script in a terminal to see how it works:
hocr-extract-images ./20240128_105503.html
It produces the error: not found: './20240128_105503.jpg'
even though an image file with exactly that name is in the same directory. Specifying the image directory with -b doesn't help. I also tried converting the image to .png.