Closed prabhu92m closed 6 years ago
It seems that the PDF was not attached. I'm also not clear on whether you are using --rotate-pages and that is causing incorrect rotation, or rotation was changed unexpectedly. Please provide your command line.
I probably won't be able to address this for over a week. Please remember this is a volunteer open source project. However, since you seem to require priority support, perhaps a commercial support contract would be of interest to you. If you wish to discuss that, please reach out to me: [EMAIL]. There are many ways I may be able to help you with your projects.
The delay is not a problem, brother; I just want to find the root cause of this issue. Here is the attached file: Orientated pdf.pdf. And I am using --rotate-pages to fix incorrect rotation.
I think I have found the issue: it may be due to negative values when the correction variable is computed in the orient_page method in the _pipeline.py script.
I fixed it by adding an if condition, shown below.
    if pdfinfo[pageno].rotation > orient_conf.angle:
        correction = pdfinfo[pageno].rotation - orient_conf.angle
    else:
        correction = (orient_conf.angle - pdfinfo[pageno].rotation) % 360
Thank you for this report. The fix was a little more involved, and your change probably does not cover all cases. I've also added more and better cases for rotation.
Despite this, the confidence is quite low on the file you submitted, and --rotate-pages (at least for me) will still misrotate some pages because Tesseract guesses the text orientation incorrectly. Therefore the wrong correction is applied. When Tesseract gets the orientation right, the final orientation is now correct.
Fixed in 6ef2651.
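For anyone following the arithmetic in this thread, here is a minimal sketch of why an if/else on the two angles may not cover all cases. It is not the code from 6ef2651; the function names branch_correction and modular_correction are hypothetical, and which sign convention is right depends on how the library defines its angles. It only compares the proposed branching form with a single modular expression.

```python
def branch_correction(page_rotation: int, detected_angle: int) -> int:
    """The if/else form proposed earlier in the thread, reproduced for comparison."""
    if page_rotation > detected_angle:
        return page_rotation - detected_angle
    return (detected_angle - page_rotation) % 360


def modular_correction(page_rotation: int, detected_angle: int) -> int:
    """Hypothetical single-expression form.

    In Python, % with a positive modulus never returns a negative value,
    so the result always lies in the range 0..359.
    """
    return (page_rotation - detected_angle) % 360


# Print both results side by side for a few (rotation, angle) pairs.
for rotation, angle in [(90, 0), (0, 90), (270, 90), (90, 270)]:
    print(rotation, angle,
          branch_correction(rotation, angle),
          modular_correction(rotation, angle))
```

The branching form returns the same value for (90, 0) and (0, 90), so it loses the direction of the difference, while the modular form keeps the two cases apart. Neither form helps when Tesseract reports the wrong angle in the first place.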
Hi Team,
I am facing an orientation issue (pages wrongly oriented) while running OCR on the attached PDF file. I have changed the orientation threshold value to 0.002, but the page is still wrongly oriented with your latest package.
Kindly do the needful ASAP.