openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Leptonica causes unnecessary TesseractError #39

Closed TeisD closed 8 years ago

TeisD commented 8 years ago

TestOrientation throws the following error on Tesseract 3.04.01 (installed via Homebrew on OS X 10.10.5):

TesseractError: (-1, u'No script found in image (Warning in pixReadMemBmp: work-around: writing to a temp file\nPage number: 0\nOrientation in degrees: 0\nRotate: 0\nOrientation confidence: 15.38\nScript: Latin\nScript confidence: 466.67)')

The error is encountered when executing output = {x: y for (x, y) in output} on line 172.

This is caused by the pixReadMemBmp warning line, which contains an extra colon. Splitting with [line.split(": ",1) for line in output if (": " in line)] gives an array of 3 elements for that line, which then raises a ValueError at {x: y for (x, y) in output}.
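The failure can be reproduced outside pyocr with the output quoted in the error message. This is only a sketch, not pyocr's actual code. The snippet as quoted already passes a limit of 1 to split, which would give two elements per line, so the sketch assumes the failing version split on ": " without a limit. The variable names (raw_output, unlimited, limited, parsed) are made up for illustration.

```python
# Tesseract output as quoted in the TesseractError message above.
raw_output = (
    "Warning in pixReadMemBmp: work-around: writing to a temp file\n"
    "Page number: 0\n"
    "Orientation in degrees: 0\n"
    "Rotate: 0\n"
    "Orientation confidence: 15.38\n"
    "Script: Latin\n"
    "Script confidence: 466.67"
)
lines = raw_output.split("\n")

# Assumption: the failing code split each line without a limit.
unlimited = [line.split(": ") for line in lines if ": " in line]

# The warning line contains ": " twice, so it splits into three parts.
warning_parts = unlimited[0]

# Unpacking a three-element list into (x, y) raises ValueError.
try:
    {x: y for (x, y) in unlimited}
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

# Limiting the split to one separator always yields exactly two parts,
# so the dict comprehension succeeds even with the warning present.
limited = [line.split(": ", 1) for line in lines if ": " in line]
parsed = {x: y for (x, y) in limited}
```

With the limited split, the warning line simply becomes one more key in parsed, and the values that matter ("Orientation in degrees", "Orientation confidence", "Script") can still be read.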

More on the cause of the pixReadMemBmp warning can be found here and here.

As the orientation and confidence are calculated correctly, I think the error is not critical and should not cause the test to fail?

jflesch commented 8 years ago

Forgot to close this ticket, sorry. Fixed by @TeisD ( https://github.com/jflesch/pyocr/pull/40 )