openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

tesseract.detect_orientation returning wrong rotation direction after tesseract update #35

Closed andyschmid closed 8 years ago

andyschmid commented 8 years ago

Due to this change in tesseract

https://github.com/tesseract-ocr/tesseract/commit/6bbcb50dd9bd19b7bc348b066a501930ca3a4e29#diff-8f75e5c5721b655480127da396bd5caa

The output of "psm 0" has changed to:

Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 19.30
Script: Latin
Script confidence: 18.28

From previously:

Orientation: 1
Orientation in degrees: 270
Orientation confidence: 19.30
Script: 1
Script confidence: 18.28

This in turn causes the image to be flipped upside down instead of right side up.

jflesch commented 8 years ago

Thanks for the information. I will see what I can do. Do you know in which version of Tesseract the change was made?

andyschmid commented 8 years ago

I can't say which version introduced this change, but I ran into it while testing the latest Ubuntu 16.04 Docker image. Here is the tesseract version:

tesseract 3.04.01
leptonica-1.73
libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

andyschmid commented 8 years ago

Here is one solution to the change in the output fields of the tesseract command. I'm currently using this as a workaround:

issue-35-patch.txt

Note: this solution assumes that the Rotate field appears only in the new output format, and that Rotate now carries the value that Orientation in degrees used to carry (in the samples above, 270), while the new Orientation in degrees reports the opposite direction (90). Hopefully the tesseract developers stick to one output format moving forward.
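For readers without access to the attachment, the idea can be sketched roughly as follows. This is not the content of issue-35-patch.txt and not pyocr's actual code; the function name `parse_osd` and the returned dictionary keys are made up for illustration. It relies on the same assumption as the note above: a Rotate field exists only in the new output format, and when it is absent, Orientation in degrees already holds the rotation to apply.

```python
# Sample outputs copied from this ticket (new format first, then old).
NEW_OUTPUT = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 19.30
Script: Latin
Script confidence: 18.28"""

OLD_OUTPUT = """Orientation: 1
Orientation in degrees: 270
Orientation confidence: 19.30
Script: 1
Script confidence: 18.28"""


def parse_osd(output):
    """Hypothetical helper: extract rotation angle and confidence
    from the text printed by tesseract in "psm 0" mode."""
    fields = {}
    for line in output.splitlines():
        # Each useful line looks like "Key: value"; skip anything else.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    # Assumption: "Rotate" is only present in the new format, where it
    # holds what "Orientation in degrees" used to hold.
    if "Rotate" in fields:
        angle = int(fields["Rotate"])
    else:
        angle = int(fields["Orientation in degrees"])

    return {
        "angle": angle,
        "confidence": float(fields["Orientation confidence"]),
    }
```

With this approach both sample outputs in this ticket yield the same angle, so callers do not need to know which tesseract version produced the text. If a later tesseract release changes the fields again, the assumption breaks and the helper would need revisiting.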

jflesch commented 8 years ago

OK, I think I will use your patch as is. However, next time, please send a pull request so I can (easily) give you proper credit.

jflesch commented 8 years ago

Fixed: 899872bf529b77325390108aebc4b24319448317

The next planned version is 0.4.0, but I'm having problems with Libtesseract support, so I may have to release a 0.3.2 first instead.

Thanks for this very detailed ticket and the corresponding fix :-)