rpflugfelder / python-tesseract

Automatically exported from code.google.com/p/python-tesseract

test-slim doesn't work #35

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I installed python-tesseract on my CentOS 5.4 server following the wiki:
http://code.google.com/p/python-tesseract/wiki/HowToCompilePythonTesseractForCentos

When I tried to run 'test.py', the first two tests succeeded. However, the 
third test, 'ProcessPagesRaw', failed and printed this message:

Test ProcessPagesRaw
Error in findFileFormatStream: truncated file
Error in pixReadStream: Unknown format: no pix returned
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
Please call SetImage before attempting recognition.

When I called 'ProcessPagesBuffer' in my own Python project, it gave the 
same error. I think it's related to leptonica, but I couldn't find a solution 
after almost a full day of googling.
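
The 'truncated file' and 'Unknown format' lines suggest that leptonica did not recognize the bytes it was handed. As a rough diagnostic, a buffer can be checked against well-known image signatures before it goes to the OCR call. This is only a sketch under stated assumptions: `sniff_image_format`, `MAGIC_NUMBERS`, and `MIN_HEADER_BYTES` are hypothetical names, not part of python-tesseract or leptonica, and the code is written for Python 3 rather than the Python 2.6 used in this report.

```python
# Hypothetical diagnostic helper; not part of python-tesseract or leptonica.
# Well-known leading byte signatures for a few common image formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),
    (b"MM\x00*", "tiff"),
]

# Assumption: anything shorter than this cannot be a usable image file.
MIN_HEADER_BYTES = 12


def sniff_image_format(data):
    """Return a format name guessed from the leading bytes, or None."""
    if len(data) < MIN_HEADER_BYTES:
        return None  # buffer is empty or cut off
    for magic, name in MAGIC_NUMBERS:
        if data.startswith(magic):
            return name
    return None  # no known signature matched


if __name__ == "__main__":
    # A fake BMP header padded with zeros, used only for demonstration.
    print(sniff_image_format(b"BM" + b"\x00" * 30))
```

If this returns None for the buffer that triggers the error, the data was probably empty, cut off, or not an image at all before it ever reached leptonica.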

Now I am totally stuck. Any help would be greatly appreciated. Alternatively, is 
there a way to work around this using 'ProcessPagesWrapper' or 
'ProcessPagesFileStream', the first two functions called by 'test.py', which 
succeeded? Basically, I need to parse an image from a URL without saving it to the 
local drive, and the image format varies.
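
For the "image from a URL without saving it" part, the download itself can stay entirely in memory. A minimal sketch, assuming Python 3's standard library (this report uses Python 2.6, where the module layout differs); `fetch_image_bytes` is a hypothetical helper name, and how the resulting bytes are handed to python-tesseract depends on the installed version, so that step is deliberately left out.

```python
import urllib.request


def fetch_image_bytes(url, timeout=10):
    """Download `url` and return the whole response body as bytes.

    Nothing is written to disk; the caller receives an in-memory buffer.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the demonstration offline; a real http(s) URL
    # is fetched the same way.
    demo_url = "data:application/octet-stream;base64,Qk0AAAAAAAAAAAAAAAA="
    buffer = fetch_image_bytes(demo_url)
    print(len(buffer), buffer[:2])
```

Checking that the returned buffer is non-empty and starts with a plausible image signature is a cheap way to rule out a failed or partial download before blaming the OCR layer.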

My setup:
CentOS 5.4, Python 2.6, leptonica 1.69, OpenCV 2.4.2.

Original issue reported on code.google.com by swol...@gmail.com on 10 Feb 2013 at 7:11

GoogleCodeExporter commented 9 years ago
Also, the attached bmp file can be parsed using this code:

print tesseract.ProcessPagesWrapper("p.bmp", _tessApi)

However, it fails if a cv image is used:

image=cv.LoadImage("p.bmp", cv.CV_LOAD_IMAGE_GRAYSCALE)
tesseract.SetCvImage(image,_tessApi)
text=_tessApi.GetUTF8Text()       # Prints 'Empty page!!' error message
conf=_tessApi.MeanTextConf()
print text, conf

What's the reason for that? Did I do something wrong?

Original comment by swol...@gmail.com on 10 Feb 2013 at 10:51

GoogleCodeExporter commented 9 years ago
https://code.google.com/p/tesseract-ocr/issues/detail?id=852&thanks=852&ts=1360582829

Original comment by FreeT...@gmail.com on 11 Feb 2013 at 11:41

GoogleCodeExporter commented 9 years ago
The problem arose because the image does not have a sufficient border for 
tesseract. The following Python program should work:

import cv2.cv as cv
import tesseract

image0=cv.LoadImage("p.bmp", cv.CV_LOAD_IMAGE_UNCHANGED)
print image0
offset=15
IPL_BORDER_REPLICATE=1
IPL_BORDER_CONSTANT=0
image=cv.CreateImage((image0.width+offset*2, image0.height+offset*2), cv.IPL_DEPTH_8U, 3)
cv.CopyMakeBorder(image0, image, (offset,offset), IPL_BORDER_CONSTANT, (255,255,255))
cv.NamedWindow("Red Eye Test")
#cv.ShowImage("Red Eye Test", image)
#cv.WaitKey(0)
cv.DestroyWindow("Red Eye Test")
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
#api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)
api.SetPageSegMode(tesseract.PSM_AUTO)
tesseract.SetCvImage(image,api)
text=api.GetUTF8Text()
conf=api.MeanTextConf()
image=None
print text
print conf
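
To make the padding step itself concrete without OpenCV, here is a minimal pure-Python sketch of the same idea: surround the image with a constant-colored margin on every side. It assumes a grayscale image stored as a list of rows of integers; `add_border` is a hypothetical helper, not an OpenCV or python-tesseract function, and the code is written for Python 3.

```python
def add_border(rows, offset, fill=255):
    """Return a padded copy of a grayscale image given as a list of rows.

    `offset` pixels of value `fill` are added on all four sides, so the
    result is 2 * offset wider and 2 * offset taller than the input.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    width = len(rows[0]) if rows else 0
    blank = [fill] * (width + 2 * offset)
    # Top margin.
    padded = [list(blank) for _ in range(offset)]
    # Original rows with left and right margins.
    for row in rows:
        padded.append([fill] * offset + list(row) + [fill] * offset)
    # Bottom margin.
    padded.extend(list(blank) for _ in range(offset))
    return padded


if __name__ == "__main__":
    tiny = [[0, 0, 0], [0, 0, 0]]  # 3 x 2 all-black image
    bordered = add_border(tiny, 1)
    for row in bordered:
        print(row)
```

The white default (255) mirrors the (255,255,255) fill used in the program above; whether white is the right choice for a given image is a separate question.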

Original comment by FreeT...@gmail.com on 15 Feb 2013 at 10:34

GoogleCodeExporter commented 9 years ago
Yeah, I kinda figured that problem out on my own. I added a border to the 
picture and most of the 'Empty page' errors are gone, although a few images 
still can't be parsed no matter how wide a border I add. Maybe 
it's a matter of the color of the border?
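
One way to test the border-color guess is to fill the border with whatever value dominates the outer edge of the image, on the assumption that the edge is mostly background. This is only an experiment worth trying, not a confirmed fix; `edge_pixels` and `guess_border_fill` are hypothetical helpers, the image is assumed to be grayscale and stored as a list of rows, and the code is written for Python 3.

```python
from collections import Counter


def edge_pixels(rows):
    """Collect the pixel values lying on the outer edge of the image."""
    if not rows:
        return []
    top = list(rows[0])
    bottom = list(rows[-1]) if len(rows) > 1 else []
    sides = []
    for row in rows[1:-1]:
        if row:
            sides.append(row[0])
            if len(row) > 1:
                sides.append(row[-1])
    return top + bottom + sides


def guess_border_fill(rows, default=255):
    """Return the most common edge value, or `default` for an empty image."""
    pixels = edge_pixels(rows)
    if not pixels:
        return default
    return Counter(pixels).most_common(1)[0][0]


if __name__ == "__main__":
    # Light text (255) on a dark background (0): the edge is all background.
    dark = [
        [0, 0, 0, 0],
        [0, 255, 255, 0],
        [0, 0, 0, 0],
    ]
    print(guess_border_fill(dark))
```

If the images that still fail turn out to have dark backgrounds, a white border would be a mismatch, which would be consistent with the guess above.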

Anyway, what really bothers me is the first problem. Has anyone seen that 
error before?

Original comment by swol...@gmail.com on 25 Feb 2013 at 8:49

GoogleCodeExporter commented 9 years ago
What is "the first problem" you are referring to?

Original comment by FreeT...@gmail.com on 25 Feb 2013 at 9:05

GoogleCodeExporter commented 9 years ago
The one in the original post: running test-slim fails at 'ProcessPagesRaw'.

Original comment by swol...@gmail.com on 28 Feb 2013 at 2:59

GoogleCodeExporter commented 9 years ago
Send me your ssh account and password then

Original comment by FreeT...@gmail.com on 28 Feb 2013 at 11:57

GoogleCodeExporter commented 9 years ago

Original comment by FreeT...@gmail.com on 25 Apr 2014 at 4:34