smijo149 opened this issue 3 years ago.
I am afraid that hocr-pdf was never tested with RTL text. Using bidi like in https://github.com/tesseract-ocr/tesstrain/blob/master/generate_wordstr_box.py might fix that.
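For illustration, here is a minimal, stdlib-only sketch of the kind of reordering involved. This is not what generate_wordstr_box.py does: that script relies on the python-bidi package (`get_display` in `bidi.algorithm`), which implements the full Unicode Bidirectional Algorithm. The sketch assumes that, as far as I can tell from the hocr-pdf source, each hOCR word is drawn separately at its own bbox, so only the character order inside a word needs fixing. The helper names `is_rtl`, `clusters` and `to_visual` are hypothetical.

```python
import unicodedata

# Strong right-to-left bidi classes (Hebrew is "R", Arabic is "AL").
RTL_CLASSES = ("R", "AL")


def is_rtl(word: str) -> bool:
    """True if the word contains at least one strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in word)


def clusters(word: str) -> list:
    """Split a word into base characters, each with its trailing combining marks (e.g. niqqud)."""
    groups = []
    for ch in word:
        is_mark = unicodedata.combining(ch) or unicodedata.bidirectional(ch) == "NSM"
        if groups and is_mark:
            groups[-1] += ch  # keep the mark attached to the letter it modifies
        else:
            groups.append(ch)
    return groups


def to_visual(word: str) -> str:
    """Simplified logical-to-visual reordering of a single OCR word.

    Hypothetical helper, NOT the full Unicode Bidirectional Algorithm:
    digits or Latin letters mixed into an RTL word and bracket mirroring
    are not handled. Use python-bidi's get_display for real documents.
    """
    if not is_rtl(word):
        return word  # LTR words (Latin text, numbers) are left unchanged
    return "".join(reversed(clusters(word)))
```

Applied to each word just before it is handed to reportlab for drawing, `to_visual("שלום")` returns `םולש`, which reads correctly when drawn left to right. One design trade-off to keep in mind: storing visual order in the PDF may change what some viewers return for copy/paste and search.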
Thanks! I will try it out and see if that works for me.
@smijo149 Looks like you solved this. I wonder whether the maintainers of hocr-tools would be interested in your PR.
@joewiz Yeah, I was able to solve the issue based on @stweil's suggestion. I have opened PR #165 if anyone is interested. Thanks!
The PDF file generated using hocr-pdf has the Hebrew text printed in the opposite direction. Steps I followed:
hocr-pdf --savefile output.pdf actual-file.jpg
to generate the PDF file. The PDF file has the Hebrew text inserted in it, but in reverse order.
Actual image:
This is what my hOCR file looks like:
Text in the PDF file (I set the text render mode to 0 so that the inserted text is visible):
Hebrew is a right-to-left language, so I am not sure whether I have to pass any language or direction parameters to get this right.
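Direction can in principle be detected from the text itself rather than passed as a parameter. A minimal stdlib-only sketch for inspecting an hOCR file: it lists each `ocrx_word` with its bbox and whether it contains strong right-to-left characters. The `hocr_words` helper and the `SAMPLE` markup are hypothetical; the sample just mimics Tesseract-style hOCR (`bbox x0 y0 x1 y1; x_wconf N` titles) for two Hebrew words.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

# Hypothetical hOCR snippet: two Hebrew words, listed in logical (reading) order.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<span class="ocr_line" title="bbox 10 10 300 50">
<span class="ocrx_word" title="bbox 200 10 300 50; x_wconf 96">שלום</span>
<span class="ocrx_word" title="bbox 10 10 150 50; x_wconf 93">עולם</span>
</span></body></html>"""


def hocr_words(hocr: str):
    """Yield (text, bbox, rtl) for every ocrx_word element, in document order."""
    root = ET.fromstring(hocr)
    for el in root.iter():
        if el.get("class") != "ocrx_word":
            continue
        text = "".join(el.itertext()).strip()
        match = BBOX.search(el.get("title", ""))
        if not text or match is None:
            continue  # skip empty words or words without a bbox
        bbox = tuple(int(v) for v in match.groups())
        # A word counts as RTL if any character has a strong RTL bidi class.
        rtl = any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
        yield text, bbox, rtl


for text, bbox, rtl in hocr_words(SAMPLE):
    print(text, bbox, "RTL" if rtl else "LTR")
```

In this sample the first word in reading order has the rightmost bbox, so the placement of whole words is already handled by the coordinates; what remains is the character order inside each RTL word.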