Open GoogleCodeExporter opened 9 years ago
I installed Tesseract 2.03 on Fedora 11 and successfully ran
tesseract phototest.tif test
and the output "test.txt" was correct.
When I run trainer_gui.py, training completes successfully but the output is
blank, similar to the behaviour on Ubuntu 9.04 that has already been reported
under issue 7.
Early guidance is requested. An extract of the Fedora 11 terminal output is also
attached for investigation and any action deemed fit.
Original comment by withbles...@gmail.com
on 2 Dec 2009 at 5:27
Attachments:
Dear withblessings,
Kindly show me the output of the following commands in the
TesseractIndic-Trainer-GUI-0.1.1/ directory.
1) ls -al
2) ls -al hin.alphabet/
Original comment by debayanin
on 2 Dec 2009 at 6:26
Also, please try to run it once as a normal user and once as root, and then
report what output you get in each case.
Original comment by debayanin
on 2 Dec 2009 at 6:29
As requested, I ran it as a normal user (see screenshot 1.png) and as root (see
screenshot3.png); both are self-explanatory.
Is any more information required?
I was unable to run reorder hin. py
Original comment by withbles...@gmail.com
on 3 Dec 2009 at 1:41
Attachments:
At this point of time, I think it might be a problem with absent fonts on your
system. I am looking into the problem.
Original comment by debayanin
on 4 Dec 2009 at 11:09
Kindly download the attached tar ball and proceed. Paste the output in the next
comment. I added a few print statements to diagnose the problem. From the look
of it,
the code is completely ignoring the alphabet files for some reason.
Original comment by debayanin
on 4 Dec 2009 at 11:44
Attachments:
Screenshots are attached, which are self-explanatory.
Screenshot.png was taken after clicking "train" the first time and waiting a few
minutes. The program froze, as can be seen from the cursor no longer blinking in
the terminal. Screenshot1.png was then taken after clicking "train" a second time.
screenshot2.png shows the output, which is blank, so there is nothing to paste.
screenshot3.png shows hin.datafiles: except for hin.charcterset, the other hin.*
files are blank.
Original comment by withbles...@gmail.com
on 4 Dec 2009 at 2:41
Attachments:
From Screenshot-1.png I can see that the alphabets directory is not being
detected properly. On the line next to "in file.py" in the output, the directory
appears as '/home/sangeetha/deepayan/test' whereas it should appear as
'/home/sangeetha/deepayan/test/hin.alphabet'. I am looking into the cause of
that problem.
Original comment by debayanin
on 5 Dec 2009 at 9:13
Attached is screenshot1.png. It shows that lohit hin as well as lohit.kn display
square boxes instead of the actual glyphs. I could not understand why square
boxes are shown.
Do I have to install lohit.hin and lohit.kn for them to appear correctly?
Original comment by withbles...@gmail.com
on 5 Dec 2009 at 3:21
Attachments:
Perhaps this is the cause of the blank output?
Original comment by withbles...@gmail.com
on 5 Dec 2009 at 3:22
The previous comments were based on Fedora 11.
I have just tried on Ubuntu 9.04; the attached screenshot is self-explanatory.
The files in hin.datafiles are empty, except hin.character, which contains only
one entry. An extract of hin.charset is reproduced below for your information:
1
NULL 0 NULL
Original comment by withbles...@gmail.com
on 5 Dec 2009 at 4:23
Attachments:
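The symptom reported above can also be checked programmatically. The following is a minimal Python sketch, assuming (as the quoted extract suggests) that the character-set file starts with a line holding the number of entries, followed by one entry per line whose first field is the symbol. The function names are illustrative and are not part of the trainer.

```python
def charset_summary(text):
    """Return (declared_count, symbols) for character-set file contents.

    Assumed layout: the first non-empty line is the entry count and every
    following non-empty line is one entry whose first field is the symbol.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0, []
    declared = int(lines[0].strip())
    symbols = [ln.split()[0] for ln in lines[1:]]
    return declared, symbols


def looks_untrained(text):
    """True when the file holds no symbol other than the NULL placeholder."""
    _, symbols = charset_summary(text)
    return all(sym == "NULL" for sym in symbols)


# Contents quoted in the report: a count line and a single NULL entry.
reported = "1\nNULL 0 NULL\n"
print(charset_summary(reported))
print(looks_untrained(reported))
```

A file for which looks_untrained returns True means no character from the alphabet files reached the training step, which is consistent with the blank output.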
Here is a solution for the time being: edit trainer_gui.py on line 136. Replace
the
line with the following line:
tesseract_trainer.generate.draw(font_string,15,self.language,tesseract_trainer.file.read_file('/home/sangeetha/deepayan/test/hin.alphabet'),self.DirectoryOut)
I am assuming that you will have extracted the archive in
'/home/sangeetha/deepayan/test'.
If this works, it means that there is a bug in the Gtk code itself, since the
same code gives different results on your machine and mine. Once I confirm
that, I can report a bug in the Gtk issue tracker.
Original comment by debayanin
on 6 Dec 2009 at 5:57
As per your guidance, I replaced line 136 with
'/home/sangeetha/deepayan/test/hin.alphabet',self.DirectoryOut
and deleted the existing line 136.
For the result, please see the attached screenshots, which are self-explanatory:
screenshot3 - using the original trainer.
screenshot4 - using the modified trainer from comment 6 (test.tar.gz).
Original comment by withbles...@gmail.com
on 6 Dec 2009 at 10:06
Attachments:
Simply replace your copy of trainer_gui.py with the attached file. Run the gui
and
kindly report back.
Original comment by debayanin
on 6 Dec 2009 at 3:49
Attachments:
Also, for the purpose of filing the gtk bug, kindly execute the following
command on
Ubuntu and paste me the output:
sudo dpkg -l | grep gtk
Original comment by debayanin
on 6 Dec 2009 at 4:39
Regarding comment 14: I replaced my copy of trainer_gui.py with the downloaded
file. The report is in the attached file.
I noticed that no tif file was present in the Hindi Image folder before running
the downloaded file.
I presume the trainer will generate the tif file and store it in the Hindi Image
folder?
Regarding comment 15: I executed the command sudo dpkg -l | grep gtk and its
output is in the attached file.
Awaiting further guidance.
Original comment by withbles...@gmail.com
on 6 Dec 2009 at 6:23
Attachments:
Kindly download version 0.1.2 of the trainer gui from the downloads page. I have
added some code that warns if the correct folder has not been selected. The
problem
was that in the 'alphabet folder selection' dialog you must either double click
on
the folder name and then press open or you should press enter on the folder
name and
then again press enter.
I looked at the gtk source code and found this text at the beginning of the
function
(gtk/gtkfilechooser.c line 724, ver 2.18.3):
"* Note that this is the folder that the file chooser is currently displaying
* (e.g. "/home/username/Documents"), which is <emphasis>not the same</emphasis>
* as the currently-selected folder if the chooser is in
* #GTK_FILE_CHOOSER_SELECT_FOLDER mode
* (e.g. "/home/username/Documents/selected-folder/". To get the
* currently-selected folder in that mode, use gtk_file_chooser_get_uri() as the
* usual way to get the selection."
Although the above text says that gtk_file_chooser_get_uri() should work for your
case, it does not give the expected output on my machine. Maybe there is a bug
in this function's code in gtk, and I will file a bug report.
For the time being, for your purpose, kindly download the new UI and follow
instructions.
Original comment by debayanin
on 6 Dec 2009 at 7:27
I have a small request. When you upload images, kindly convert them to a format
that takes less space. You can downsample the image to reduce its size to about
100 KB.
Original comment by debayanin
on 6 Dec 2009 at 7:31
I have made some code changes in the 0.1.2 version that solve the problem
completely.
Now you can either simply select the folder or open it. Both will work. See the
changelog for details.
Original comment by debayanin
on 6 Dec 2009 at 8:08
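The behaviour described above (accepting the alphabet folder whether it was only highlighted or actually opened) can be sketched independently of Gtk. This is only an illustration and not the code shipped in version 0.1.2: the function name resolve_alphabet_dir and its parameters are hypothetical, and it assumes the folder is named <language>.alphabet, as with hin.alphabet in this thread.

```python
import os
import tempfile


def resolve_alphabet_dir(current_folder, selected_path, language):
    """Return the <language>.alphabet directory, or None if it is not found.

    current_folder: the folder the chooser is displaying.
    selected_path:  the highlighted entry, or None if nothing is highlighted.
    """
    wanted = language + ".alphabet"
    candidates = []
    if selected_path:
        candidates.append(selected_path)
    if current_folder:
        candidates.append(current_folder)
        # Fall back to looking inside the displayed folder.
        candidates.append(os.path.join(current_folder, wanted))
    for path in candidates:
        path = os.path.normpath(path)
        if os.path.basename(path) == wanted and os.path.isdir(path):
            return path
    return None


# Demonstration with a throwaway directory tree.
base = tempfile.mkdtemp()
alphabet = os.path.normpath(os.path.join(base, "hin.alphabet"))
os.mkdir(alphabet)
print(resolve_alphabet_dir(base, alphabet, "hin") == alphabet)  # folder highlighted
print(resolve_alphabet_dir(alphabet, None, "hin") == alphabet)  # folder opened
print(resolve_alphabet_dir(base, None, "hin") == alphabet)      # only the parent known
```

Returning None rather than silently using the parent folder leaves room for the kind of warning mentioned for version 0.1.2.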
Congratulations!! Succeeded.
I downloaded trainer GUI version 0.1.2 and ran it as usual. It successfully
generated hin.image, hin.box and hin.datafiles. The hin.character file contains
16033 items.
The output file is blank. I have attached the tesseract.log file, which contains
apply boxes failures!
Is the next step to copy the hin data files to the tessdata folder of tesseract
and then run
tesseract bigimage.tif output -l hin ?
Further guidance is awaited.
The screenshot sizes are: saved as png - 664.9 KB / as jpeg - 237.5 KB / as tiff -
5.0 MB. In future I shall save in jpeg format. Is that OK for uploads?
Original comment by withbles...@gmail.com
on 7 Dec 2009 at 2:25
Attachments:
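The next step asked about above can be sketched as follows. This is a hedged illustration, not a procedure confirmed by the trainer's author: it assumes that Tesseract 2.x loads a language from files named hin.* in its tessdata folder, the tessdata location differs between installations, and the helper names are made up for this example. The tesseract invocation is the one quoted in the comment.

```python
import glob
import os
import shutil
import tempfile


def install_language_files(datafiles_dir, tessdata_dir, language):
    """Copy every <language>.* file from the trainer output into tessdata."""
    copied = []
    pattern = os.path.join(datafiles_dir, language + ".*")
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):
            shutil.copy(src, tessdata_dir)
            copied.append(os.path.basename(src))
    return copied


def recognition_command(image, output_base, language):
    """Build the command line: tesseract <image> <output_base> -l <language>."""
    return ["tesseract", image, output_base, "-l", language]


# Demonstration with throwaway directories standing in for hin.datafiles
# and the real tessdata folder (both are placeholders).
datafiles = tempfile.mkdtemp()
tessdata = tempfile.mkdtemp()
for name in ("hin.unicharset", "hin.inttemp"):
    open(os.path.join(datafiles, name), "w").close()

print(install_language_files(datafiles, tessdata, "hin"))
print(" ".join(recognition_command("bigimage.tif", "output", "hin")))
```

Once the files are in place, the list returned by recognition_command can be passed to subprocess.call to run the recognition.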
It is observed that after all data files have been moved, the cursor stops and
no longer blinks. I do not know whether any further movement of the cursor is
expected after all data files have been moved to hin.datafiles. Further guidance
is awaited, please.
Original comment by withbles...@gmail.com
on 7 Dec 2009 at 11:10
I downloaded the new version of the trainer, 0.1.3, and find that it behaves
like the earlier version 0.1.2. Here too the cursor stops at the point
"unicharset renamed and moved", and the animation of logo.png also stops; see
the attached screenshot1.pbm.
tesseract.log contains output similar to the one attached under comment #20
above.
I also tested with the Kannada alphabets, which works well with a minimum of
apply boxes failures. But when I tried to use normal Kannada text as "Rest", it
failed with an error message, perhaps because the text was not arranged in
vertical form as for Hindi. I hope the trainer will support full normal text of
a language in future.
Original comment by withbles...@gmail.com
on 8 Dec 2009 at 1:43
Attachments:
Well, once the data files have been moved, the job is done. It waits for the
next
round of training and hence the cursor stops. It is normal operation and not a
problem.
As for your 2nd question, DO NOT put entire words in the 'rest' file. Simply put
characters of the Kannada alphabet that do not fit into the other 2 files, for
example digits, punctuation and independent vowels like a, aa, ee etc.
Original comment by debayanin
on 8 Dec 2009 at 4:39
Since this discussion is turning out to be quite long, you could simply use the
indic-ocr mailing list to discuss this. indic-ocr@googlegroups.com . If you are
not
subscribed, join at http://groups.google.com/group/indic-ocr .
Original comment by debayanin
on 8 Dec 2009 at 4:41
Original issue reported on code.google.com by
withbles...@gmail.com
on 2 Dec 2009 at 4:38
Attachments: