minni / pytesseracttrainer

Automatically exported from code.google.com/p/pytesseracttrainer
GNU General Public License v3.0

Program runs very slow, high cpu usage #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run the program. CPU usage reaches 80% of one core right after start.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
1.02

Please provide any additional information below.
The program runs very slowly under Windows 7 and Ubuntu. Scrolling is almost 
impossible.

Original issue reported on code.google.com by sjuc...@gmail.com on 29 Oct 2010 at 4:08

GoogleCodeExporter commented 8 years ago
How many boxes (lines) are in the box file?

Original comment by zde...@gmail.com on 29 Oct 2010 at 4:39

GoogleCodeExporter commented 8 years ago
2008 boxes

Original comment by sjuc...@gmail.com on 29 Oct 2010 at 4:48

GoogleCodeExporter commented 8 years ago
Then this is a known problem: the program creates a gtk widget for each box, so 
each additional box takes more system resources. At the moment the only 
solution is to use a smaller image (fewer boxes).

Original comment by zde...@gmail.com on 29 Oct 2010 at 5:03

GoogleCodeExporter commented 8 years ago
Hi, I'm running pytesseracttrainer on Ubuntu 10.04, and when I start it, it 
causes such a heavy load that my PC freezes. And I hadn't even loaded a box 
file at that point. I can't provide any debug or load stats, because my PC 
freezes so fast.

What can be the problem? (this is my system: http://www.sysprofile.de/id75572 )

Original comment by maxe.ludwig on 22 Dec 2010 at 1:37

GoogleCodeExporter commented 8 years ago
CPU usage is very high for me too, but not just proportional to the number of 
boxes - in fact upon invoking pyTesseractTrainer-1.02.py usage is 100% on two 
cores even before opening an image file. CPU load actually decreases to some 
60-70% (cumulative of both cpus) when opening an image-box file containing some 
hundreds of boxes. (ubuntu 10.04)

Original comment by ohelshai@gmail.com on 1 Jan 2011 at 11:35

GoogleCodeExporter commented 8 years ago
Does your system have direct rendering (DRI) enabled? When I tested it on a 
server running CentOS 5.x 64-bit with 4 GB RAM and DRI disabled, 
pytesseracttrainer was faster, IMHO, than on my Mandriva Linux 2010.1 (64-bit 
with 6 GB RAM)... I do not know whether this is a gtk, pygtk or Python problem... 

Original comment by zde...@gmail.com on 2 Jan 2011 at 1:10

GoogleCodeExporter commented 8 years ago
I don't think I have DRI (at least I cannot find any mention of it in 
/etc/X11/xorg.conf or in nvidia-settings).
Now, I don't know much about Python and gtk, but I know something about GUIs, 
which raises an obvious suspicion: is there perhaps an event loop constantly 
polling at full throttle for user actions? In such cases, inserting even a 1 ms 
no-op is usually the quickest solution...
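
To illustrate the suspicion above, here is a minimal, GTK-free sketch in plain Python. It is not taken from pyTesseractTrainer, and it does not show that the program contains such a loop; the function names are made up for the example. It only compares the CPU time used by a tight polling loop with the same loop pausing 1 ms per iteration.

```python
import time


def busy_poll(duration_s):
    """Poll in a tight loop for duration_s seconds; return CPU seconds used."""
    start_cpu = time.process_time()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pass  # stand-in for "check whether the user did something"
    return time.process_time() - start_cpu


def sleepy_poll(duration_s, pause_s=0.001):
    """Same loop, but yield the CPU for pause_s seconds on every iteration."""
    start_cpu = time.process_time()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(pause_s)  # the "1 ms no-op"
    return time.process_time() - start_cpu
```

Calling `busy_poll(0.2)` and `sleepy_poll(0.2)` and comparing the two return values should show the sleeping variant using far less CPU time. A GTK main loop normally blocks while idle, so a pause like this is a workaround rather than a fix when the real cause is elsewhere.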

Original comment by enrico.s...@weizmann.ac.il on 2 Jan 2011 at 8:06

GoogleCodeExporter commented 8 years ago
v1.03 seems a step in the right direction. When pytesseracttrainer is invoked 
without arguments, the blank window stays open with negligible CPU usage. The 
CPU is hogged only once the first image-box is loaded. Btw, load is high for 
both the Xorg and pytesseracttrainer processes, as if the program were 
flooding X with events... maybe updates?

Original comment by enrico.s...@weizmann.ac.il on 4 Jan 2011 at 10:43

GoogleCodeExporter commented 8 years ago
There is a function "redrawArea" that is run on the gtk 'expose-event'. This 
function is responsible for redrawing the image area (gtk.DrawingArea) and 
drawing the "red rectangle". It looks to me that part of the code (responsible 
for choosing the red color) ran even if there was no image... I cannot believe 
it took some CPU ;-).

I will look more closely at this function to see whether it can be improved...
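
A rough, GTK-free sketch of the kind of guard described above: skip all drawing work when no image is loaded, and set up the red color once instead of on every redraw. This is not the actual "redrawArea" code; `AreaRedrawer`, `RecordingCanvas` and their methods are hypothetical stand-ins, not pygtk API.

```python
class RecordingCanvas:
    """Hypothetical drawing backend that only records what it is asked to do."""

    def __init__(self):
        self.calls = []

    def alloc_color(self, name):
        self.calls.append(("alloc_color", name))
        return name

    def draw_image(self, image):
        self.calls.append(("draw_image", image))

    def draw_rectangle(self, color, box):
        self.calls.append(("draw_rectangle", color, box))


class AreaRedrawer:
    """Toy stand-in for an expose handler."""

    def __init__(self, canvas):
        self.canvas = canvas
        self.image = None         # nothing loaded yet
        self.selected_box = None  # (left, bottom, right, top) or None
        self._red = None          # cached color, allocated on first use

    def redraw(self):
        if self.image is None:
            return False  # no image: do no work at all
        self.canvas.draw_image(self.image)
        if self.selected_box is not None:
            if self._red is None:
                self._red = self.canvas.alloc_color("red")
            self.canvas.draw_rectangle(self._red, self.selected_box)
        return True
```

With no image set, `redraw()` returns immediately and the canvas records nothing; after an image and a box are set, repeated redraws allocate the color only once.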

Original comment by zde...@gmail.com on 5 Jan 2011 at 7:51

GoogleCodeExporter commented 8 years ago
Just for the record, v1.03 solved another problem for me, which I didn't 
bother to report: on CentOS 5.4, the file open dialog was non-functional (the 
browser didn't show folders and hung). I thought it was a quirk of my Python 
installation, but maybe the high CPU load had an impact on it.

Original comment by enrico.s...@weizmann.ac.il on 5 Jan 2011 at 9:34

GoogleCodeExporter commented 8 years ago
As far as I remember, the problem with the "freezing" open dialog could be 
solved by upgrading pygtk and gtk.

Original comment by zde...@gmail.com on 5 Jan 2011 at 12:13

GoogleCodeExporter commented 8 years ago
I have an idea for drastically reducing the resources used by this program. 
Instead of having a set of boxes, one for each character in the boxfile, each 
invoking a Gtk widget, why not replace the bottom pane with a text editor 
showing the boxfile's contents? Then, when the cursor is on a line with a 
character, that character is outlined in the top pane (just like now). 
Obviously, this is a little less user-friendly, but then you should be able to 
run the software without maxing out the CPU every time.

Since the boxfile lines are simple (e.g. "r 779 494 796 518 0"), it would still 
be pretty accessible, and you'd still be able to see the correspondence between 
characters and the image, and you'd also be able to merge boxes by editing the 
text.
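
A minimal sketch of the lookup this idea would need: turn the boxfile line under the cursor into coordinates that can be outlined on the image. The field names (symbol, left, bottom, right, top, page) are an assumption based on the usual Tesseract box-file layout, and `parse_box_line` and `box_for_cursor_line` are hypothetical helpers, not part of pyTesseractTrainer.

```python
from collections import namedtuple

# Assumed field order of a box-file line: symbol left bottom right top page
Box = namedtuple("Box", "symbol left bottom right top page")


def parse_box_line(line):
    """Parse one line such as 'r 779 494 796 518 0' into a Box."""
    # Split from the right so the symbol field is whatever precedes
    # the last five numeric fields.
    parts = line.strip().rsplit(None, 5)
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(parts), line))
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(parts[0], left, bottom, right, top, page)


def box_for_cursor_line(boxfile_text, cursor_line):
    """Return the Box on the 0-based cursor_line, or None if there is none."""
    lines = boxfile_text.splitlines()
    if not 0 <= cursor_line < len(lines) or not lines[cursor_line].strip():
        return None
    return parse_box_line(lines[cursor_line])
```

For example, `parse_box_line("r 779 494 796 518 0")` gives `Box(symbol='r', left=779, bottom=494, right=796, top=518, page=0)`, which is all the top pane would need to draw the outline.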

Original comment by john.j.b...@gmail.com on 17 Jan 2011 at 12:15

GoogleCodeExporter commented 8 years ago
@john.j.b: I had this idea too, but it will cause other problems if you need 
to delete a box, combine two boxes, add a box, etc. Also, this approach will 
work only when 1 box = 1 char, so you cannot train a character combination 
such as "fi"; see: http://groups.google.com/group/tesseract-ocr/msg/f835b55d9eaf7e9c.
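
For reference, one of the operations mentioned above, combining two boxes, could look roughly like this if done on raw boxfile lines. This is only a sketch: `merge_box_lines` is a hypothetical helper, the field order (symbol left bottom right top page) is assumed from the usual box-file layout, and the coordinates in the usage note are invented. Whether a multi-character entry is then usable for training is a separate question, covered in the linked thread.

```python
def merge_box_lines(first, second):
    """Merge two boxfile lines into one line covering both boxes."""
    sym_a, *nums_a = first.split()
    sym_b, *nums_b = second.split()
    left_a, bottom_a, right_a, top_a, page_a = map(int, nums_a)
    left_b, bottom_b, right_b, top_b, page_b = map(int, nums_b)
    if page_a != page_b:
        raise ValueError("cannot merge boxes from different pages")
    # The merged box is the smallest rectangle containing both inputs.
    return "%s %d %d %d %d %d" % (
        sym_a + sym_b,
        min(left_a, left_b),
        min(bottom_a, bottom_b),
        max(right_a, right_b),
        max(top_a, top_b),
        page_a,
    )
```

With the made-up lines `"f 10 20 18 40 0"` and `"i 19 20 23 38 0"`, the result is `"fi 10 20 23 40 0"`.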

Original comment by zde...@gmail.com on 17 Jan 2011 at 8:24