shekarnode closed this issue 6 years ago.
Can you provide a hOCR file which causes this error? How did you create it?
I used Tesseract 4.0.0 to generate the hOCR: Hocr File
This is the image for the hOCR generated above.
Is there any other way to extract a table from hOCR data?
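One possible starting point, if the hOCR only marks lines and words, is to read the word bounding boxes and split each line into cells wherever there is a large horizontal gap. The sketch below assumes Tesseract-style hOCR (ocr_line and ocrx_word spans whose title starts with "bbox x0 y0 x1 y1"); HocrWordParser, split_cells and the 30-pixel min_gap are hypothetical choices for illustration, not part of hocr-tools.

```python
import re
from html.parser import HTMLParser

# Assumed Tesseract-style hOCR: words carry title="bbox x0 y0 x1 y1; ...".
_BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Line-level classes; newer Tesseract versions may also emit the non-"ocr_line" ones.
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


class HocrWordParser(HTMLParser):
    """Collect words per line as (text, (x0, y0, x1, y1)) tuples in self.lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0          # nesting depth of <span> elements
        self._word_depth = None  # depth of the ocrx_word span currently being read
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return  # e.g. <strong>/<em> inside a word: its text is still collected
        self._depth += 1
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if classes & LINE_CLASSES:
            self.lines.append([])
        elif "ocrx_word" in classes:
            match = _BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(int(v) for v in match.groups()) if match else None
            self._word_depth = self._depth
            self._text = []

    def handle_data(self, data):
        if self._word_depth is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span":
            return
        if self._depth == self._word_depth:  # closing the ocrx_word span itself
            text = "".join(self._text).strip()
            if text and self._bbox is not None:
                if not self.lines:
                    self.lines.append([])  # word outside any line element
                self.lines[-1].append((text, self._bbox))
            self._word_depth = None
        self._depth -= 1


def split_cells(words, min_gap=30):
    """Join a line's words into cells, starting a new cell at gaps wider than min_gap px."""
    cells = []
    prev_x1 = None
    for text, (x0, _y0, x1, _y1) in sorted(words, key=lambda w: w[1][0]):
        if prev_x1 is None or x0 - prev_x1 > min_gap:
            cells.append([text])
        else:
            cells[-1].append(text)
        prev_x1 = x1
    return [" ".join(cell) for cell in cells]
```

Feeding the hOCR text to the parser and calling split_cells on each entry of parser.lines gives one list of cell strings per line. A suitable min_gap depends on the image resolution and the table layout, and real tables with wrapped cells or missing rulings will need more than this.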
This works for me as well after I renamed the image and converted it to a jpg file.
Is the jpg file also in your directory? And what does python -V print?
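Renaming and converting the image helps if hocr-pdf picks up the *.jpg images in the given directory and expects a .hocr file with the same base name next to each one; that pairing rule is an assumption based on the fix described in this thread. The hypothetical helpers below list which jpg files have a matching .hocr and which do not, so a mismatch can be spotted before running hocr-pdf.

```python
import os


def pair_images_with_hocr(filenames):
    """Split *.jpg names into (jpg, hocr) pairs and jpgs lacking a same-named .hocr."""
    names = set(filenames)
    pairs, missing = [], []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext != ".jpg":  # case-sensitive, like a "*.jpg" glob on Linux
            continue
        hocr = stem + ".hocr"
        if hocr in names:
            pairs.append((name, hocr))
        else:
            missing.append(name)
    return pairs, missing


def check_directory(directory):
    """Print jpg files without a matching .hocr in one directory; return the pairs."""
    pairs, missing = pair_images_with_hocr(os.listdir(directory))
    for name in missing:
        print(f"no matching .hocr for {name}")
    return pairs
```

A png with a matching .hocr (for example scan.png and scan.hocr) is not reported as a pair here, which mirrors why converting it to jpg made a difference.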
@zuphilip
Are you able to generate a searchable PDF?
Tesseract has an option to output a PDF. Did you try it?
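Tesseract can write the hOCR and a searchable PDF in one run by listing several output configs after the output base name, e.g. tesseract page.jpg page hocr pdf, which produces page.hocr and page.pdf. A minimal Python wrapper for that call, assuming the tesseract binary is on PATH (tesseract_command and run_tesseract are hypothetical names):

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, formats=("hocr", "pdf")):
    """Build the argument list, e.g. ['tesseract', 'page.jpg', 'page', 'hocr', 'pdf']."""
    return ["tesseract", image_path, output_base, *formats]


def run_tesseract(image_path, output_base, formats=("hocr", "pdf")):
    """Run Tesseract; with the default formats it writes <output_base>.hocr and .pdf."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, formats), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which is common on Windows.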
Are you able to generate a searchable PDF?
Yes, I see a searchable PDF, but I am working on Linux.
In the Windows terminal, the encoding can be a problem. You can check Python's output encoding there by starting python
and then typing:
>>> import sys
>>> sys.stdout.encoding
If that is not UTF-8, then you can try running the command with PYTHONIOENCODING=UTF-8
in front, i.e.
PYTHONIOENCODING=UTF-8 hocr-pdf . > out.pdf
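The VAR=value prefix is POSIX-shell syntax, so it works in Git Bash but not in cmd.exe, where one would run set PYTHONIOENCODING=UTF-8 first. As a shell-independent sketch, the hypothetical helpers below check whether an encoding name resolves to UTF-8 and run hocr-pdf with PYTHONIOENCODING set only for the child process. They assume hocr-pdf can be launched directly from PATH; on Windows it may instead need to be started as python plus the script path.

```python
import codecs
import os
import subprocess
import sys


def is_utf8(encoding_name):
    """True if a codec name such as 'UTF-8', 'utf8' or 'cp1252' resolves to UTF-8."""
    if not encoding_name:
        return False
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:  # unknown codec name
        return False


def utf8_env(base_env=None):
    """Copy of the environment with PYTHONIOENCODING forced to UTF-8."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONIOENCODING"] = "UTF-8"
    return env


def run_hocr_pdf(directory, output_path):
    """Rough equivalent of: PYTHONIOENCODING=UTF-8 hocr-pdf <directory> > <output_path>"""
    with open(output_path, "wb") as out:
        subprocess.run(["hocr-pdf", directory], stdout=out, env=utf8_env(), check=True)


if __name__ == "__main__":
    enc = sys.stdout.encoding
    print("stdout encoding:", enc, "is UTF-8:", is_utf8(enc))
```

Writing the child's stdout straight into a file opened in binary mode also avoids the terminal's code page entirely, since the PDF bytes never pass through the console.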
I got a PDF as output, but it was just a normal PDF, i.e. not searchable.
This is with Git Bash on Windows, right? Can you upload your result here?
@shekarnode There is text in your generated PDF and I can search for text as well.
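Whether a PDF is searchable depends on whether the file contains a text layer, not on the viewer. A rough, stdlib-only heuristic is to look inside the PDF's content streams, decompressing Flate-encoded ones, for a text object (BT) followed by a text-showing operator (Tj or TJ). has_text_layer is a hypothetical helper, not a real validator; a tool such as pdftotext gives a more reliable answer.

```python
import re
import zlib

# Heuristic only: bodies between "stream" and "endstream" (not "endstream" itself).
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.S)
# A text object (BT) followed later by a text-showing operator (Tj or TJ).
_TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.S)


def _candidate_contents(raw):
    """Yield the raw stream bytes and, if they inflate, the decompressed bytes."""
    yield raw
    try:
        inflated = zlib.decompressobj().decompress(raw)
    except zlib.error:
        return  # not Flate-encoded (or damaged): only the raw bytes are checked
    yield inflated


def has_text_layer(pdf_bytes):
    """Rough check for text operators in any stream of the PDF file's bytes."""
    for match in _STREAM_RE.finditer(pdf_bytes):
        for content in _candidate_contents(match.group(1)):
            if _TEXT_OP_RE.search(content):
                return True
    return False
```

Usage would be has_text_layer(data) where data is the result of reading out.pdf in binary mode. A True result means text operators are present (hocr-pdf and Tesseract typically draw that text invisibly over the image); it does not guarantee that a given viewer indexes it.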
I was using Adobe Reader and was never able to search, but when I opened the PDF in a browser I found that it was searchable.
Thanks @zuphilip for helping out.
The pdf produced by Tesseract is also searchable.
While using the command below, I am getting an error related to characters. Please help.