jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Warning "no protos/configs for <something> in CreateIntTemplates()" when running mftraining #557

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. tesseract avaya.avaya.exp0.tif avaya.avaya.exp0 nobatch box.train
2. unicharset_extractor avaya.avaya.exp0.box
3. mftraining -F font_properties -U unicharset -O avaya.unicharset avaya.avaya.exp0.tr

My expected output is :

I see instead :

Warning: no protos/configs for ' in CreateIntTemplates()
Warning: no protos/configs for : in CreateIntTemplates()
Error: no configs for class ' in mftraining
Error: no configs for class : in mftraining

What version of the product are you using? On what operating system?
tesseract 3.00
OS : Windows XP

Please provide any additional information below :
In attachments

Original issue reported on code.google.com by smr.meor...@gmail.com on 7 Oct 2011 at 10:47

GoogleCodeExporter commented 9 years ago
I have the same issue with Tesseract 3.02

Original comment by andy.bia...@gmail.com on 6 Apr 2012 at 12:25

GoogleCodeExporter commented 9 years ago
The problem is that you did not follow the instructions at 
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images :
* Make sure there are a minimum number of samples of each character. 10 is 
good, but 5 is OK for rare characters.
* There should be more samples of the more frequent characters - at least 20.
* Make the text more realistic.

In my experience, if I see "no protos/configs for xyz", it means there are not 
enough examples of xyz in the input image/box file.

Also, using images with a DPI below 200 is not recommended (avaya.avaya.exp0.tif 
is 96 DPI), so I suggest fixing the input image.
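
One way to check the sample counts described above is to tally the first field of every line in the box file. This is only a minimal sketch, not part of Tesseract: it assumes the usual Tesseract 3 box layout (<char> <left> <bottom> <right> <top> <page>), and the function names count_box_samples and undersampled are made up for illustration.

```python
from collections import Counter


def count_box_samples(box_text):
    """Count how many boxes each character has in box-file text."""
    counts = Counter()
    for line in box_text.splitlines():
        parts = line.split()
        # Assumed layout: <char> <left> <bottom> <right> <top> [<page>]
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        counts[parts[0]] += 1
    return counts


def undersampled(counts, minimum=5):
    """Return the characters that have fewer than `minimum` samples."""
    return sorted(ch for ch, n in counts.items() if n < minimum)
```

Reading avaya.avaya.exp0.box as UTF-8 text, passing it to count_box_samples and then calling undersampled on the result would list the characters that probably need more samples. The default of 5 follows the "5 is OK for rare characters" guideline quoted above.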

Original comment by zde...@gmail.com on 6 Apr 2012 at 9:37

GoogleCodeExporter commented 9 years ago
I'm trying to train Tesseract on the licence plate characters of our country. 
I have the official font for all the characters, and I run into the same 
problem as described in Issue 557. But is it truly necessary and reasonable to 
"Make sure there are a minimum number of samples of each character"? In my case 
I have just 3 tif files at hand, and that's all I need. I'm frustrated at 
training such a tiny language.
See the attachment for my 3 tif files and the box files I made for them.
Thanks.

Original comment by xyxzfj@gmail.com on 16 May 2012 at 2:01

GoogleCodeExporter commented 9 years ago
Oh my! I've finally figured it out!
I didn't know why the problem that smr (comment 1) and I both had,
"Warning: no protos/configs for ' in CreateIntTemplates()
Warning: no protos/configs for : in CreateIntTemplates()
Error: no configs for class ' in mftraining
Error: no configs for class : in mftraining"
occurred, and I doubted the reason given by zde (comment 2) in comment 3.

In my previous trial, I used 3 separate tif files for digits, letters and 
Chinese characters. I also didn't follow the [lang].[fontname].exp[num] rule, 
since I thought the bracketed parts were optional; I was using cnlp.exp0, 
cnlp.exp1, cnlp.exp2. That is when the problem from comment 1 occurred.
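
The naming rule mentioned above can be checked before starting a training run. The following is a rough sketch, not anything Tesseract ships: the regular expression is my own reading of the [lang].[fontname].exp[num] rule, and the name parse_training_name and the list of accepted extensions are assumptions.

```python
import re

# Assumed reading of the rule: <lang>.<fontname>.exp<num>.<extension>
TRAINING_NAME = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?P<ext>tif|tiff|box|tr)$"
)


def parse_training_name(filename):
    """Return the name parts as a dict, or None if the name breaks the rule."""
    match = TRAINING_NAME.match(filename)
    return match.groupdict() if match else None
```

With this sketch, cnlp.lpft.exp10.tif is accepted, while a name such as cnlp.exp0.tif returns None because the font name field is missing.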

Now I have merged the 3 tif files into one, cnlp.lpft.exp10.tif, and did the 
following:
Make Box Files:
tesseract cnlp.lpft.exp09.tif cnlp.lpft.exp09 batch.nochop makebox
(Here I used exp09.tif instead of exp10.tif in order to avoid getting a bad 
box file that treats parts of some of my characters as independent characters, 
since lots of my characters are made up of isolated radicals like 
艹、亠、一)

Run Tesseract for Training:
tesseract cnlp.lpft.exp10.tif cnlp.lpft.exp10 nobatch box.train

Compute the Character Set:
unicharset_extractor cnlp.lpft.exp10.box

font_properties:(content of the file "font_properties": lpft 0 0 1 0 0)
mftraining -F font_properties -U unicharset cnlp.lpft.exp10.tr

Clustering:
mftraining -F font_properties -U unicharset -O cnlp.unicharset cnlp.lpft.exp10.tr
cntraining cnlp.lpft.exp10.tr

Clustering:(empty)

The last file (unicharambigs):(none)

Putting it all together:(I've added prefix "cnl." to normproto, Microfeat, 
inttemp, pffmtable and unicharset)
combine_tessdata cnl.
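
For the font_properties step above, it can help to confirm that the font name in the .tr file name is actually declared in font_properties before running mftraining. This is only a sketch under assumptions: it takes each line to have the form <fontname> <italic> <bold> <fixed> <serif> <fraktur> with 0/1 flags, as in the "lpft 0 0 1 0 0" example, and the helper names are invented here.

```python
def parse_font_properties(text):
    """Map each font name to its five 0/1 flags."""
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        # Assumed format: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
        if len(parts) != 6 or any(p not in ("0", "1") for p in parts[1:]):
            raise ValueError(f"line {lineno}: expected a font name and five 0/1 flags")
        fonts[parts[0]] = tuple(int(p) for p in parts[1:])
    return fonts


def font_is_declared(tr_filename, fonts):
    """Check that the second dot-separated field of the name is a known font."""
    pieces = tr_filename.split(".")
    return len(pieces) >= 3 and pieces[1] in fonts
```

For the files in this comment, font_is_declared("cnlp.lpft.exp10.tr", fonts) should be true when font_properties contains the lpft line.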

TEST:
I used the cnl.traineddata to test my cnlp.lpft.exp10.tif:
tesseract cnlp.lpft.exp10.tif cnlp.txt -l cnl
RESULT:
Tesseract Open Source OCR Engine v3.01 with Leptonica
TIFFReadDirectory: Warning, TIFFstream: invalid TIFF directory; tags are not sorted in ascending order.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20624 (0x5090) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20625 (0x5091) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 40092 (0x9c9c) encountered.
TIFFReadDirectory: Warning, TIFFstream: invalid TIFF directory; tags are not sorted in ascending order.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20624 (0x5090) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20625 (0x5091) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 40092 (0x9c9c) encountered.
TIFFReadDirectory: Warning, TIFFstream: invalid TIFF directory; tags are not sorted in ascending order.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20624 (0x5090) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 20625 (0x5091) encountered.
TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 40092 (0x9c9c) encountered.
Page 0
CNLP.TXT:
12345
67890
ABCD
HIJK
OPQR
SLE
TMF
UNG
VWXYZ
京津冀晋蒙辽吉黑沪
苏浙皖闽赣鲁豫鄂湘
粤桂琼渝川贵云藏陕
甘青宁新港澳使领学

The result is good enough for me!
Thank you all!

Original comment by xyxzfj@gmail.com on 17 May 2012 at 1:44

GoogleCodeExporter commented 9 years ago
I wrote something wrong.
In comment 4, "
Clustering:(empty)

The last file (unicharambigs):(none)
" should be changed to: "

Dictionary Data (Optional):(none)

The last file (unicharambigs):(none)
".

Original comment by xyxzfj@gmail.com on 17 May 2012 at 2:22

GoogleCodeExporter commented 9 years ago
I'm also attaching my trained data in case someone needs it!

Original comment by xyxzfj@gmail.com on 20 May 2012 at 10:44

GoogleCodeExporter commented 9 years ago
Hi, I'm from Mexico and I have the same problem.
I followed all the steps one by one, but I keep having problems.
If someone can help me, please...

Original comment by ing.raid...@gmail.com on 24 May 2012 at 5:13

GoogleCodeExporter commented 9 years ago
And when I try to use cntraining, the app crashes and I don't know whether the 
process has finished.

Original comment by ing.raid...@gmail.com on 24 May 2012 at 5:18

GoogleCodeExporter commented 9 years ago
@ing.raidel.herreraycairo:
1. You are not following the instructions (see comment #2), so your problems 
are just your problems.
2. You are not providing details (tesseract version, commands used).
3. It looks like you did not read the instructions carefully:
  a) the proper command is "mftraining -F font_properties -U unicharset -O mat.unicharset mat.placas.exp0.tr" and I see something else in the screenshot
  b) your font_properties has a BOM, and that is a problem...
4. cntraining will not work if mftraining did not work...
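
Regarding the BOM in point 3b: some Windows editors write a UTF-8 byte order mark (the bytes EF BB BF) at the start of a file, and it then becomes part of the first font name. Below is a small sketch for detecting and removing it. The function names are invented for illustration, and it assumes font_properties is meant to be plain UTF-8 or ASCII text.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def rewrite_without_bom(path):
    """Rewrite the file at `path` in place without a leading BOM."""
    with open(path, "rb") as fh:
        data = fh.read()
    if has_utf8_bom(data):
        with open(path, "wb") as fh:
            fh.write(strip_utf8_bom(data))
```

Calling rewrite_without_bom("font_properties") before mftraining would leave a file without a BOM untouched. Saving the file as "UTF-8 without BOM" or ANSI in the editor should have the same effect.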

Original comment by zde...@gmail.com on 24 May 2012 at 8:23

GoogleCodeExporter commented 9 years ago
OK, thanks, I did it and it works well. Excuse my bad English.
I only renamed the unicharset file to mat.unicharset, where mat is my language.
Anyway, thank you very much.

Original comment by ing.raid...@gmail.com on 27 May 2012 at 2:07

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 21 Jul 2012 at 3:31