yxm4109 / python-tesseract

Automatically exported from code.google.com/p/python-tesseract

Segmentation fault on "Empty page!!" errors #37

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Process a bunch of images.
2. Suddenly:
> Empty page!!
> Segmentation fault (core dumped)

It happens fairly randomly; I would guess with about a 1-in-200 probability. The same images are processed just fine on retry.

What is the expected output? What do you see instead?
It would be great if I just got a Python exception and could retry processing. Instead, the whole program crashes and I have to start from the beginning.
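The crash happens inside the C extension (`_tesseract.so`). Python therefore cannot turn it into an exception in the same process. Below is a minimal sketch of one workaround. It assumes Python 3 on a POSIX system, since it uses the `fork` start method. Each OCR call runs in a forked child via `multiprocessing`, and the parent raises an ordinary exception when the child dies. `run_isolated`, `run_with_retries` and `OCRCrashed` are hypothetical names, not part of python-tesseract. The `func` you pass in would be whatever code calls tesseract on one image.

```python
import multiprocessing


class OCRCrashed(Exception):
    """Raised when the OCR worker process dies instead of returning a result."""


def _worker(send_conn, func, args):
    # Runs in the child: call func and send ("ok", result) or ("error", text) back.
    try:
        send_conn.send(("ok", func(*args)))
    except Exception as exc:
        send_conn.send(("error", repr(exc)))
    finally:
        send_conn.close()


def run_isolated(func, *args):
    """Call func(*args) in a forked child so a segfault cannot kill this process."""
    ctx = multiprocessing.get_context("fork")  # POSIX only
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_conn, func, args))
    proc.start()
    send_conn.close()  # the parent keeps only the receiving end
    try:
        status, payload = recv_conn.recv()
    except EOFError:  # the child exited without sending anything
        status, payload = None, None
    finally:
        recv_conn.close()
    proc.join()
    if status is None:
        code = proc.exitcode
        if code is not None and code < 0:
            raise OCRCrashed(f"OCR worker killed by signal {-code}")
        raise OCRCrashed(f"OCR worker exited with code {code}")
    if status == "error":
        raise RuntimeError(f"OCR worker raised {payload}")
    return payload


def run_with_retries(func, *args, attempts=3):
    """Retry func(*args) in a fresh child each time; re-raise the last crash."""
    for attempt in range(attempts):
        try:
            return run_isolated(func, *args)
        except OCRCrashed:
            if attempt == attempts - 1:
                raise
```

A negative `exitcode` is how `multiprocessing` reports that the child was terminated by a signal; SIGSEGV is 11 on Linux. The parent survives, so it can retry the image or skip it. The price is one fork per image.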

What version of the product are you using? On what operating system?
I'm using: python-tesseract_0.8-1.6_amd64.deb

Ubuntu 12.10 64-bit
tesseract-ocr 3.02.01-6
liblept3 1.69-3.1ubuntu1
libtesseract3 3.02.01-6 

Please provide any additional information below.

GDB backtrace:

Empty page!!

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7046c01 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007ffff7046c01 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff61d072b in retParser(char const*) () from /usr/lib/pymodules/python2.7/_tesseract.so
#2  0x00007ffff61d0a35 in ProcessPagesBuffer(char*, int, tesseract::TessBaseAPI*) () from /usr/lib/pymodules/python2.7/_tesseract.so
#3  0x00007ffff61cf29d in ?? () from /usr/lib/pymodules/python2.7/_tesseract.so
#4  0x000000000045f912 in PyEval_EvalFrameEx ()
#5  0x0000000000467209 in PyEval_EvalCodeEx ()
#6  0x00000000004d0242 in PyEval_EvalCode ()
#7  0x00000000005102bb in ?? ()
#8  0x000000000044a466 in PyRun_FileExFlags ()
#9  0x000000000044a97a in PyRun_SimpleFileExFlags ()
#10 0x000000000044b6bc in Py_Main ()
#11 0x00007ffff6f0576d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00000000004ce0ad in _start ()
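GDB shows only the C frames. To see which Python line was running when the crash hit, use the standard-library `faulthandler` module (Python 3.3+; Python 2.7 as in this report would need a backport). It dumps the Python stack on SIGSEGV. Below is a minimal sketch that simulates the crash in a child interpreter. It assumes Python 3.8+ for `signal.raise_signal`, and `process_image` is a hypothetical stand-in for the OCR call.

```python
import subprocess
import sys

# Hypothetical OCR script: enable faulthandler first, then "crash" in process_image.
CHILD = """
import faulthandler
import signal

faulthandler.enable()  # dump Python frames on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL

def process_image():
    signal.raise_signal(signal.SIGSEGV)  # stands in for the crash in _tesseract.so

process_image()
"""

proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
print(proc.returncode)  # negative: the child is still killed by the signal
print(proc.stderr)      # "Fatal Python error: Segmentation fault" plus the Python stack
```

For a real script, running it as `python -X faulthandler script.py` (or setting `PYTHONFAULTHANDLER=1`) enables the same dump without editing the code.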

Original issue reported on code.google.com by seppo.er...@gmail.com on 22 Mar 2013 at 3:58

GoogleCodeExporter commented 9 years ago
Either a memory leak, or tesseract failed to parse the image.

A memory leak is very likely, since you said that retrying just works.

If so, send me the source code and/or the images that caused the problem.

Original comment by FreeT...@gmail.com on 22 Mar 2013 at 4:53

GoogleCodeExporter commented 9 years ago
Apparently tesseract failed to parse the image.

While saving some images for testing, I found that the script always failed on the same images. The images come from screenshots, so they seem to differ slightly every time even though the script crops them exactly the same way each time.

I raised the contrast and now all the images pass. Still, if tesseract fails to parse an image, the process dies with a segmentation fault, which isn't very nice.
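The comment doesn't say how the contrast was raised. One common approach is a linear (min-max) contrast stretch. Below is a minimal pure-Python sketch of it on a grayscale image stored as a list of rows of 0-255 values. `stretch_contrast` is a hypothetical helper. On real images, Pillow's `ImageOps.autocontrast` performs a similar stretch.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [list(row) for row in pixels]  # uniform image: nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in pixels]


print(stretch_contrast([[100, 110], [120, 130]]))  # [[0, 85], [170, 255]]
```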

I attached one image that fails.

Original comment by seppo.er...@gmail.com on 22 Mar 2013 at 5:53


GoogleCodeExporter commented 9 years ago
As stated in Example 3 on the front page:
#### you may need to thicken the border in order to make tesseract feel happy to ocr your image #####
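The sketch below assumes "thicken the border" means adding a blank margin around the text. It is a minimal pure-Python version on a grayscale list-of-rows image, with a white (`255`) fill that assumes dark text on a light background. `add_border` is hypothetical. For real images, Pillow's `ImageOps.expand(image, border, fill)` and OpenCV's `cv2.copyMakeBorder` do the same job.

```python
def add_border(pixels, width, fill=255):
    """Surround a grayscale image (list of rows) with `width` pixels of `fill` on every side."""
    cols = len(pixels[0]) if pixels else 0
    blank = [fill] * (cols + 2 * width)
    body = [[fill] * width + list(row) + [fill] * width for row in pixels]
    top = [list(blank) for _ in range(width)]
    bottom = [list(blank) for _ in range(width)]
    return top + body + bottom


print(add_border([[0]], 1))  # [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
```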

Otherwise, you may need to use

try:
    doSomething()
except:  # bare except also swallows KeyboardInterrupt and SystemExit
    pass

or

try:
    doSomething()
except Exception:  # catches ordinary exceptions only
    pass
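Note that `try`/`except` only catches Python exceptions. A segmentation fault inside `_tesseract.so` kills the interpreter before any `except` clause can run. Below is a small demonstration that runs a deliberately segfaulting snippet in a child interpreter. It assumes Python 3.8+ (for `signal.raise_signal`) on Linux.

```python
import subprocess
import sys

# The child wraps a simulated segfault in try/except Exception.
CHILD = """
import signal
try:
    signal.raise_signal(signal.SIGSEGV)
except Exception:
    print("caught")
"""

result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
print(result.returncode)    # -11 on Linux: the child was killed by SIGSEGV
print(repr(result.stdout))  # '' because the except clause never ran
```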

Original comment by FreeT...@gmail.com on 22 Mar 2013 at 6:08

GoogleCodeExporter commented 9 years ago
Thanks for your help!

Original comment by seppo.er...@gmail.com on 22 Mar 2013 at 6:18

GoogleCodeExporter commented 9 years ago
I have made some changes to the code. Hopefully they will prevent the segmentation fault from happening.

https://python-tesseract.googlecode.com/files/python-tesseract_0.8-1.7_amd64.deb

Original comment by FreeT...@gmail.com on 25 Mar 2013 at 2:13

GoogleCodeExporter commented 9 years ago
It still segfaults when tesseract cannot find anything in the image, with the same backtrace. It doesn't bother me much any more, since I found that using

api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)

instead of tesseract.PSM_AUTO greatly improves the detection rate. It can even find the '0.25' in the attached image, even though the contrast is really poor.

BTW, 0.8-1.7 added opencv-2.4 as a dependency, which is only available from PPAs/experimental for Ubuntu/Debian.

Original comment by seppo.er...@gmail.com on 25 Mar 2013 at 8:47


GoogleCodeExporter commented 9 years ago
Any news on this? This is really annoying, since it crashes the Python interpreter. Any ideas how to stop this?

Original comment by daniil...@gmail.com on 9 Jun 2013 at 11:51

GoogleCodeExporter commented 9 years ago
I continue to have this segmentation fault as well. Is this project dead, or is there still some life here?

Original comment by dcfair...@gmail.com on 25 Mar 2014 at 11:25

GoogleCodeExporter commented 9 years ago
Can you send me sample code that triggers the crash? I will look into it.

Original comment by FreeT...@gmail.com on 26 Mar 2014 at 12:50

GoogleCodeExporter commented 9 years ago
Please try this version:
https://drive.google.com/file/d/0BxR8J6QWLdsVSVplcl8xQWZIUjg/edit?usp=sharing

Original comment by FreeT...@gmail.com on 23 Apr 2014 at 2:02

GoogleCodeExporter commented 9 years ago
The problem should be fixed.

Original comment by FreeT...@gmail.com on 25 Apr 2014 at 4:36