Open GoogleCodeExporter opened 9 years ago
How many boxes (lines) are in the box file?
Original comment by zde...@gmail.com
on 29 Oct 2010 at 4:39
2008 boxes
Original comment by sjuc...@gmail.com
on 29 Oct 2010 at 4:48
Then this is a known problem: the program creates a gtk widget for each box,
so each additional box takes more system resources. At the moment the only
solution is to use a smaller image (with fewer boxes).
Original comment by zde...@gmail.com
on 29 Oct 2010 at 5:03
Hi, I'm running pytesseracttrainer on Ubuntu 10.04 and when I start it, it
causes such heavy load that my PC freezes. And I hadn't even loaded a box file
at that point. I can't provide any debug or load stats, because my PC freezes
so fast.
What can be the problem? (this is my system: http://www.sysprofile.de/id75572 )
Original comment by maxe.ludwig
on 22 Dec 2010 at 1:37
CPU usage is very high for me too, but not just proportional to the number of
boxes - in fact upon invoking pyTesseractTrainer-1.02.py usage is 100% on two
cores even before opening an image file. CPU load actually decreases to some
60-70% (cumulative of both cpus) when opening an image-box file containing some
hundreds of boxes. (ubuntu 10.04)
Original comment by ohelshai@gmail.com
on 1 Jan 2011 at 11:35
Does your system have direct rendering (DRI) enabled? Because when I tested it
on a server with CentOS 5.x 64-bit, 4 GB RAM, and DRI disabled,
pytesseracttrainer was faster IMHO than on my Mandriva Linux 2010.1 (64-bit
with 6 GB RAM)... I do not know if this is a gtk, pygtk or python problem...
Original comment by zde...@gmail.com
on 2 Jan 2011 at 1:10
I don't think I have DRI (at least I cannot find any mention of it in
/etc/X11/xorg.conf or in nvidia-settings).
Now, I don't know much about python and gtk, but I do know something about
GUIs, which raises an obvious suspicion: is there perhaps an event loop
constantly polling full throttle for user actions? In such cases, inserting
even a 1ms noop is usually the quickest solution...
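To illustrate the suspicion above, here is a minimal sketch of a polling loop
that sleeps for about 1ms between checks instead of spinning. Whether
pyTesseractTrainer actually contains such a loop is not established in this
thread; `poll_until` and its parameters are hypothetical names used only for
illustration.

```python
import time


def poll_until(condition, interval=0.001, timeout=1.0):
    """Call condition() repeatedly until it returns True or timeout expires.

    Sleeping for `interval` seconds between checks keeps the loop from
    running at full throttle. Returns (succeeded, number_of_checks).
    """
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if condition():
            return True, checks
        if time.monotonic() >= deadline:
            return False, checks
        # Without this sleep the loop would keep one core at 100%.
        time.sleep(interval)
```

With the sleep in place, waiting 50ms costs a few dozen checks rather than
many thousands. In a GTK program the main loop normally blocks while waiting
for events, so a fix like this would only matter if the application adds its
own polling on top of it.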
Original comment by enrico.s...@weizmann.ac.il
on 2 Jan 2011 at 8:06
v1.03 seems a step in the right direction. When invoking pytesseracttrainer
without arguments, the blank window stays open with negligible cpu usage. The
cpu is hogged only once the first image-box is loaded. Btw, load is high for
both the Xorg and pytesseracttrainer processes. As if the program was flooding
X with events... maybe updates?
Original comment by enrico.s...@weizmann.ac.il
on 4 Jan 2011 at 10:43
There is a function "redrawArea" that is run on the gtk 'expose-event'. This
function is responsible for redrawing the image area (gtk.DrawingArea) and
drawing the "red rectangle". It looks to me that part of the code (responsible
for choosing the red color) is run even if there is no image... I cannot
believe it took that much CPU ;-).
I will look closer at this function to see if there is a possibility for
improvement...
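One possible shape of such an improvement is an early return when no image is
loaded. The sketch below is not PyGTK code and not the program's actual
handler: `ImageArea`, `redraw_area` and `draw_calls` are hypothetical
stand-ins that only record what would have been drawn.

```python
class ImageArea:
    """GTK-free stand-in for the drawing area, for illustration only."""

    def __init__(self):
        self.image = None          # set once an image has been loaded
        self.selected_box = None   # (left, bottom, right, top) or None
        self.draw_calls = []       # records the drawing work performed

    def redraw_area(self):
        """Hypothetical redraw handler; returns True if anything was drawn."""
        # Guard: with no image loaded there is nothing to draw, so skip
        # the color setup and rectangle drawing entirely.
        if self.image is None:
            return False
        self.draw_calls.append(("image", self.image))
        if self.selected_box is not None:
            self.draw_calls.append(("rectangle", "red", self.selected_box))
        return True
```

The guard only removes work done on an empty window; it does not reduce how
often the handler is triggered once an image is shown.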
Original comment by zde...@gmail.com
on 5 Jan 2011 at 7:51
Just for the record, v1.03 solved another problem for me, which I didn't
bother to report: on CentOS 5.4, the file open dialog was nonfunctional (the
browser didn't show folders and hung). I thought it was a quirk of my python
installation, but maybe the high cpu load had an impact on it.
Original comment by enrico.s...@weizmann.ac.il
on 5 Jan 2011 at 9:34
As far as I remember, the problem with the "freezing" open dialog can be
solved by upgrading pygtk and gtk.
Original comment by zde...@gmail.com
on 5 Jan 2011 at 12:13
I have an idea for drastically reducing the resources used by this program.
Instead of having a set of boxes, one for each character in the boxfile, and
each invoking a Gtk widget, why not replace the bottom pane with a text editor
showing the boxfile's contents? Then when the cursor is on a line with a
character, that character is outlined in the top pane (just like now).
Obviously, this is a little less user friendly, but then you should be able to
run the software without maxing out the CPU every time.
Since the boxfile lines are simple (e.g. "r 779 494 796 518 0"), it would still
be pretty accessible, and you'd still be able to see the correspondence between
characters and the image, and you'd also be able to merge boxes by editing the
text.
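A minimal sketch of the lookup this would need: parse the line under the
cursor and get back the rectangle to outline. It assumes the usual box file
field order (symbol, left, bottom, right, top, page); `Box`, `parse_box_line`
and `box_at_cursor` are hypothetical names, not part of the program.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Optional[Box]:
    """Parse one box file line; return None if it is not a valid box."""
    # Split from the right so that everything before the last five
    # numeric fields is kept as the symbol.
    parts = line.strip().rsplit(None, 5)
    if len(parts) != 6:
        return None
    symbol, *numbers = parts
    try:
        left, bottom, right, top, page = (int(n) for n in numbers)
    except ValueError:
        return None
    return Box(symbol, left, bottom, right, top, page)


def box_at_cursor(text: str, cursor_line: int) -> Optional[Box]:
    """Return the box on the given 0-based editor line, if any."""
    lines = text.splitlines()
    if 0 <= cursor_line < len(lines):
        return parse_box_line(lines[cursor_line])
    return None
```

For the example line above, `parse_box_line("r 779 494 796 518 0")` yields a
box with symbol "r" and coordinates 779, 494, 796, 518 on page 0. Lines that
are being edited and are temporarily invalid simply return None, so nothing
would be outlined for them.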
Original comment by john.j.b...@gmail.com
on 17 Jan 2011 at 12:15
@john.j.b: I had this idea too. But it will cause other problems if you need
to delete a box, combine two boxes, add a box, etc. And this approach will
work only if 1 box = 1 char (e.g. you cannot train a char combination such as
"fi"); see: http://groups.google.com/group/tesseract-ocr/msg/f835b55d9eaf7e9c.
Original comment by zde...@gmail.com
on 17 Jan 2011 at 8:24
Original issue reported on code.google.com by
sjuc...@gmail.com
on 29 Oct 2010 at 4:08