tk120404 / tesseractindic

Automatically exported from code.google.com/p/tesseractindic

Output is blank for Hindi sample test #7

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
The output should be identical to the sample provided with tesseractindic-trainer,
but the output is blank and only one item appears in hin.unicharset.
What version of the product are you using? On what operating system?
tesseractindic-Trainer-GUI 0.1.1 on Ubuntu 9.04

Please provide any additional information below.
Screenshot.png is attached and is self-explanatory.
The output is blank and the Hindi images folder is empty. In the tessdata folder,
hin.unicharset contains only one entry, as follows:
1
NULL 0 NULL
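
A quick way to confirm this symptom is to read the entry count from the first line of the unicharset file. The sketch below assumes the layout shown above (a count on the first line, then one entry per line). The function names, and the choice to treat a file holding only the NULL placeholder as "effectively empty", are illustrative assumptions and not part of the trainer.

```python
def unicharset_entries(text):
    """Return the entry lines of a unicharset file given as a string.

    Assumes the layout seen in this report: the first line holds the
    number of entries, and each following non-empty line is one entry.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    declared = int(lines[0])
    entries = lines[1:]
    if declared != len(entries):
        raise ValueError(
            "header declares %d entries but %d were found"
            % (declared, len(entries))
        )
    return entries


def is_effectively_empty(text):
    """True when the file holds nothing beyond the NULL placeholder entry."""
    entries = unicharset_entries(text)
    return all(entry.split()[0] == "NULL" for entry in entries)


if __name__ == "__main__":
    # The contents reported in this issue.
    sample = "1\nNULL 0 NULL\n"
    print(is_effectively_empty(sample))
```

If this reports an effectively empty file after training, the trainer most likely never read any alphabet characters.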

Original issue reported on code.google.com by withbles...@gmail.com on 2 Dec 2009 at 4:38

Attachments:

GoogleCodeExporter commented 9 years ago
I installed tesseract 2.03 on Fedora 11 and successfully ran "tesseract phototest.tif test";
the output "test.txt" was correct.
When I ran trainer_gui.py, training completed successfully but the output was blank, as on
Ubuntu 9.04, which is already reported under issue 7.
Early guidance is requested. An extract of the Fedora 11 terminal output is also attached for
investigation.

Original comment by withbles...@gmail.com on 2 Dec 2009 at 5:27

Attachments:

GoogleCodeExporter commented 9 years ago
Dear withblessings,

Kindly show me the output of the following commands in the
TesseractIndic-Trainer-GUI-0.1.1/ directory.

1) ls -al
2) ls -al hin.alphabet/

Original comment by debayanin on 2 Dec 2009 at 6:26

GoogleCodeExporter commented 9 years ago
Also, please try running it both as a normal user and as root, and report the
output you get in each case.

Original comment by debayanin on 2 Dec 2009 at 6:29

GoogleCodeExporter commented 9 years ago
As requested, I ran it as a normal user (see screenshot 1.png) and as root (see
screenshot3.png); the screenshots are self-explanatory.
Is any more information required?
I am unable to run reorder hin. py

Original comment by withbles...@gmail.com on 3 Dec 2009 at 1:41

Attachments:

GoogleCodeExporter commented 9 years ago
At this point, I think the problem might be missing fonts on your
system. I am looking into it.

Original comment by debayanin on 4 Dec 2009 at 11:09

GoogleCodeExporter commented 9 years ago
Kindly download the attached tarball and proceed. Paste the output in the next
comment. I added a few print statements to diagnose the problem. From the look of it,
the code is completely ignoring the alphabet files for some reason.

Original comment by debayanin on 4 Dec 2009 at 11:44

Attachments:

GoogleCodeExporter commented 9 years ago
Screenshots are attached and are self-explanatory.

Screenshot.png was taken after clicking "train" the first time and waiting a few minutes.
The program froze, as can be seen from the cursor not blinking in the terminal.
Screenshot1.png was then taken after clicking "train" a second time.
Screenshot2.png shows the output, which is blank, so there is nothing to paste.
Screenshot3.png shows hin.datafiles: except for hin.unicharset, all the other hin.* files are blank.

Original comment by withbles...@gmail.com on 4 Dec 2009 at 2:41

Attachments:

GoogleCodeExporter commented 9 years ago
From Screenshot-1.png I can see that the alphabet directory is not being detected
properly. On the line after "in file.py" in the output, the directory appears as
'/home/sangeetha/deepayan/test', whereas it should be
'/home/sangeetha/deepayan/test/hin.alphabet'. I am looking into the cause of that
problem.

Original comment by debayanin on 5 Dec 2009 at 9:13

GoogleCodeExporter commented 9 years ago
Screenshot1.png is attached. It is observed that lohit hin as well as lohit.kn show
square boxes instead of the actual glyphs. I could not understand why square boxes are shown.
Do I have to install lohit.hin and lohit.kn for them to appear correctly?

Original comment by withbles...@gmail.com on 5 Dec 2009 at 3:21

Attachments:

GoogleCodeExporter commented 9 years ago
Perhaps this is the cause of the blank output?

Original comment by withbles...@gmail.com on 5 Dec 2009 at 3:22

GoogleCodeExporter commented 9 years ago
The previous comments were based on Fedora 11.
Just now I tried on Ubuntu 9.04; the attached screenshot is self-explanatory.
The files in hin.datafiles are empty, except hin.unicharset, which contains only one entry.
An extract of hin.unicharset is reproduced below for your information:
1
NULL 0 NULL

Original comment by withbles...@gmail.com on 5 Dec 2009 at 4:23

Attachments:

GoogleCodeExporter commented 9 years ago
Here is a solution for the time being: edit trainer_gui.py and replace line 136
with the following line:

tesseract_trainer.generate.draw(font_string,15,self.language,tesseract_trainer.file.read_file('/home/sangeetha/deepayan/test/hin.alphabet'),self.DirectoryOut)

I am assuming that you have extracted the archive in
'/home/sangeetha/deepayan/test'.
If this works, it means that there is a bug in the Gtk code itself, since the same
code gives different results on your machine and mine. Once I confirm that, I can
report a bug in the Gtk issue tracker.

Original comment by debayanin on 6 Dec 2009 at 5:57

GoogleCodeExporter commented 9 years ago
As per your guidance, I deleted the existing line 136 and replaced it with the line containing
'/home/sangeetha/deepayan/test/hin.alphabet',self.DirectoryOut
For the result, please see the screenshots, which are self-explanatory:
screenshot3 - using the original trainer.
screenshot4 - using the modified trainer from comment 6 (test.tar.gz).

Original comment by withbles...@gmail.com on 6 Dec 2009 at 10:06

Attachments:

GoogleCodeExporter commented 9 years ago
Simply replace your copy of trainer_gui.py with the attached file. Run the GUI and
kindly report back.

Original comment by debayanin on 6 Dec 2009 at 3:49

Attachments:

GoogleCodeExporter commented 9 years ago
Also, for the purpose of filing the Gtk bug, kindly execute the following command on
Ubuntu and paste the output:

sudo dpkg -l | grep gtk

Original comment by debayanin on 6 Dec 2009 at 4:39

GoogleCodeExporter commented 9 years ago
Regarding comment 14: I replaced my copy of trainer_gui.py with the downloaded file.
The report is in the attached file.
I noticed that there was no tif file in the Hindi Image folder before running
the downloaded file.
I presume that the trainer will generate the tif file and store it in the Hindi Image
folder?

Regarding comment 15: I executed the command sudo dpkg -l | grep gtk, and its output is
in the attached file.
Awaiting further guidance.

Original comment by withbles...@gmail.com on 6 Dec 2009 at 6:23

Attachments:

GoogleCodeExporter commented 9 years ago
Kindly download version 0.1.2 of the trainer GUI from the downloads page. I have
added some code that warns if the correct folder has not been selected. The problem
was that in the 'alphabet folder selection' dialog you must either double-click on
the folder name and then press Open, or press Enter on the folder name and
then press Enter again.
I looked at the gtk source code and found this text at the beginning of the function
(gtk/gtkfilechooser.c line 724, ver 2.18.3):

"* Note that this is the folder that the file chooser is currently displaying
 * (e.g. "/home/username/Documents"), which is <emphasis>not the same</emphasis>
 * as the currently-selected folder if the chooser is in
 * #GTK_FILE_CHOOSER_SELECT_FOLDER mode
 * (e.g. "/home/username/Documents/selected-folder/".  To get the
 * currently-selected folder in that mode, use gtk_file_chooser_get_uri() as the
 * usual way to get the selection."

Although the above text says that gtk_file_chooser_get_uri() should work for your
case, it does not give the expected output on my machine. Maybe there is a bug in
this function in gtk, and I will file a bug report.
For the time being, kindly download the new UI and follow the instructions.
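
The kind of warning described above can be sketched as a small path check. This is not the code shipped in 0.1.2. It is a minimal illustration that assumes the alphabet folder is named "<language>.alphabet" (as with hin.alphabet in this thread), and the function name alphabet_dir_warning is hypothetical.

```python
import os


def alphabet_dir_warning(path, language):
    """Return a warning string if `path` does not look like the alphabet
    folder for `language`, or None if it looks fine.

    Assumption: the folder is named "<language>.alphabet", as in
    'hin.alphabet' from this thread.
    """
    expected = language + ".alphabet"
    # normpath drops a trailing slash so basename sees the last component.
    name = os.path.basename(os.path.normpath(path))
    if name != expected:
        return "Selected folder is %r; expected a folder named %r" % (name, expected)
    if not os.path.isdir(path):
        return "%r is not an existing directory" % path
    if not os.listdir(path):
        return "%r contains no alphabet files" % path
    return None


if __name__ == "__main__":
    # The parent folder that the file chooser returned in this issue.
    print(alphabet_dir_warning("/home/sangeetha/deepayan/test", "hin"))
```

Checking the folder name first catches the case seen here, where the chooser returned the parent folder instead of the selected one.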

Original comment by debayanin on 6 Dec 2009 at 7:27

GoogleCodeExporter commented 9 years ago
I have a small request. When you upload images, kindly convert them to a format that
takes less space. You can downsample the image to reduce its size to around 100 KB.

Original comment by debayanin on 6 Dec 2009 at 7:31

GoogleCodeExporter commented 9 years ago
I have made some code changes in version 0.1.2 that solve the problem completely.
Now you can either simply select the folder or open it; both will work. See the
changelog for details.

Original comment by debayanin on 6 Dec 2009 at 8:08

GoogleCodeExporter commented 9 years ago
Congratulations!! Succeeded.
I downloaded trainer GUI version 0.1.2 and ran it as usual. It successfully generated
hin.image, hin.box and hin.datafiles. The hin.unicharset file contains 16033 items.
The output file is blank. The attached tesseract.log file contains apply boxes failures!
Is the next step to copy the hin data files to the tessdata folder of tesseract and then run
tesseract bigimage.tif output -l hin ?
Further guidance is awaited.
The screenshot is 664.9 KB saved as png, 237.5 KB as jpeg and 5.0 MB as tiff. In future I
shall save in jpeg format. Is that OK for uploads?

Original comment by withbles...@gmail.com on 7 Dec 2009 at 2:25

Attachments:

GoogleCodeExporter commented 9 years ago
It is observed that after all the data files were moved, the cursor stopped and no longer
blinks. I don't know what is expected to happen next after all the data files are
moved to hin.datafiles. Further guidance is awaited, please.

Original comment by withbles...@gmail.com on 7 Dec 2009 at 11:10

GoogleCodeExporter commented 9 years ago
I downloaded the new version of the trainer, 0.1.3. Its performance is similar to the earlier
version 0.1.2: here too the cursor stops at the point "unicharset renamed and
moved", and the animation of logo.png also stops. See the attached screenshot1.pbm.

The tesseract.log is similar to the one attached under comment #20 above.

I also tested with the Kannada alphabet, which works well with a minimum of apply boxes
failures. But when I tried to use normal Kannada text as "Rest", it failed with an error
message, perhaps because the text is not arranged in vertical form as for Hindi. I hope the
trainer will support full normal text of a language in future.

Original comment by withbles...@gmail.com on 8 Dec 2009 at 1:43

Attachments:

GoogleCodeExporter commented 9 years ago
Well, once the data files have been moved, the job is done. The program waits for the next
round of training, hence the cursor stops. This is normal operation and not a problem.
As for your second question, DO NOT put entire words in the 'rest' file. Simply put
characters of the Kannada alphabet that do not fit into the other 2 files, for
example digits, punctuation and independent vowels like a, aa, ee etc.
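
A 'rest' file can be checked for accidental whole words before training. The sketch below is a heuristic, not part of the trainer: it assumes one entry per line, and it flags any entry with more than one non-combining code point. The function names are hypothetical.

```python
import unicodedata


def base_char_count(entry):
    """Count code points that are not combining marks (category M*)."""
    return sum(1 for ch in entry if not unicodedata.category(ch).startswith("M"))


def suspicious_rest_entries(text):
    """Return the lines that look like words rather than single characters.

    Assumption: the 'rest' file lists one entry per line.
    """
    flagged = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and base_char_count(entry) > 1:
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    # Kannada letter AA, Kannada digit one, a full stop, and a whole word.
    sample = "\u0c86\n\u0ce7\n.\n\u0c95\u0ca8\u0ccd\u0ca8\u0ca1\n"
    print(suspicious_rest_entries(sample))
```

Only the last sample line is flagged, because it contains four base letters. The three single characters pass.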

Original comment by debayanin on 8 Dec 2009 at 4:39

GoogleCodeExporter commented 9 years ago
Since this discussion is turning out to be quite long, you could simply use the
indic-ocr mailing list to discuss this: indic-ocr@googlegroups.com . If you are not
subscribed, join at http://groups.google.com/group/indic-ocr .

Original comment by debayanin on 8 Dec 2009 at 4:41

GoogleCodeExporter commented 9 years ago

Original comment by debayanin on 8 Dec 2009 at 4:41