I am facing an issue with hocr-pdf conversion when mixed English and Kannada text is encoded into the text layer of the PDF file.
I have an image containing Kannada text: https://drive.google.com/file/d/11P2XMFWjmc0S6rzfOX58UtZZJkG2StNI/view?usp=sharing
The corresponding hOCR output for that image is here: https://drive.google.com/file/d/1wm-40rCN_rSE4cqT499kZAjAs5y6A3xl/view?usp=sharing
The following is the GCV OCR output for the same file, in JSON: OCR Output in JSON
The output of the hocr-pdf conversion is as follows: Hocr-PDF output
As you can see, searching for English words highlights them correctly, but searching for Kannada text gives gibberish results in the output file generated by the hocr-pdf conversion.
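One check that might help narrow this down is whether the Kannada text is still intact in the hOCR file before conversion. Below is a minimal sketch using only the Python standard library, assuming the hOCR marks words with the usual `ocrx_word` class. `HocrWordParser`, `hocr_words` and `has_kannada` are hypothetical helper names, and the sample string is made up for illustration, not taken from the actual file.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collects the text of every element whose class includes 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._tag = None   # tag name of the word element currently open
        self._depth = 0    # nesting depth of that tag name inside the word
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Only count nested tags of the same name, so void tags are harmless.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocrx_word" in classes:
            self._tag = tag
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.words.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def hocr_words(hocr_text):
    """Return the list of word strings found in an hOCR document."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def has_kannada(word):
    """True if any character falls in the Kannada Unicode block (U+0C80..U+0CFF)."""
    return any("\u0c80" <= ch <= "\u0cff" for ch in word)


# Illustrative sample only; a real file would be read with encoding="utf-8".
sample = (
    '<span class="ocrx_word" title="bbox 10 10 80 30">ಕನ್ನಡ</span> '
    '<span class="ocrx_word" title="bbox 90 10 160 30">English</span>'
)
words = hocr_words(sample)
for w in words:
    print(w, has_kannada(w))
```

If the Kannada words come out intact from the real hOCR file, the problem is more likely in how the PDF text layer is written (for example the font or encoding used) than in the OCR step itself.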
Any guidance in this regard is appreciated.