tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

best model is not generated after training #156

Closed saijaswanth433 closed 4 years ago

saijaswanth433 commented 4 years ago

I trained with 140k samples, using the Tesseract best model (15 MB) as the base model, but when I generate the tessdata_best model after training, the result is only 4.1 in size. Why is this happening?

stweil commented 4 years ago

Newly trained models don't contain a dictionary and other parts from the original model. Those parts can be added by using combine_tessdata.
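
To see which parts differ, one option is to list the components of both files with combine_tessdata -d. The sketch below is only an illustration: list_components is a hypothetical helper name, and the paths in the example comment are placeholders, not paths from this thread.

```shell
#!/bin/sh
# List the components stored in one or more traineddata files.
# "list_components" is a hypothetical helper; the only real Tesseract
# command used here is "combine_tessdata -d".
list_components() {
    for model in "$@"; do
        echo "== $model"
        # -d prints the directory of components contained in the file
        combine_tessdata -d "$model" 2>&1
    done
}

# Example with placeholder paths (adjust to your setup):
#   list_components tessdata_best/eng.traineddata data/foo.traineddata
```

Components that appear only in the base model (for example the dawg dictionary files) can then be unpacked with combine_tessdata -u and written into the new model with combine_tessdata -o, as described in the combine_tessdata man page. This assumes the new model still uses a compatible unicharset; if the character set changed during training, the old dictionaries may not match.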

saijaswanth433 commented 4 years ago

Can you please tell me how to use combine_tessdata? Also, will accuracy improve after I do it?

saijaswanth433 commented 4 years ago

:~/mp/tesstrain-master/data/foo$ combine_tessdata /home/$USER/temp/eng.
Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into /home/vishwam/temp/eng.traineddata
Version string:4.1.0-rc1
23:version:size=9, offset=192

stweil commented 4 years ago

Please use the Tesseract user forum for all questions.

SpaceView commented 1 year ago

Dear all, I'm using 5.0.0-alpha-20201224 and got the same problem when using combine_tessdata:

ERROR: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file

I searched and could not find a suitable solution anywhere. After some study, I arrived at the following solution.

The solution is as follows. Once the training commands have produced all the necessary files, e.g.

cntraining mytest.normal.exp0.tr

you should have the following 5 files

inttemp
normproto
pffmtable
shapetable
unicharset

rename them to

normal.inttemp
normal.normproto
normal.pffmtable
normal.shapetable
normal.unicharset

and then run "combine_tessdata normal" again. You will get the final traineddata file

normal.traineddata

The output is as follows:
Combining tessdata files
Output normal.traineddata created successfully.
Version string:5.0.0-alpha-20201224
1:unicharset:size=662, offset=192
3:inttemp:size=132152, offset=854
4:pffmtable:size=103, offset=133006
5:normproto:size=1262, offset=133109
13:shapetable:size=166, offset=134371
23:version:size=20, offset=134537
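
The rename step above can be sketched as a small shell helper. This is only an illustration under the assumption that the five files sit in one directory; prefix_training_files is a hypothetical name, not a Tesseract tool.

```shell
#!/bin/sh
# Give the bare classifier files a language prefix so that
# combine_tessdata can find them.
# "prefix_training_files" is a hypothetical helper, not part of Tesseract.
prefix_training_files() {
    lang="$1"        # language prefix, e.g. "normal"
    dir="${2:-.}"    # directory holding the training output (default: current)
    for name in inttemp normproto pffmtable shapetable unicharset; do
        if [ -f "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$lang.$name"
        else
            # Leave missing files alone and report them instead of failing
            echo "warning: $dir/$name not found, skipping" >&2
        fi
    done
}

# Example, run from the directory that holds the five files:
#   prefix_training_files normal .
#   combine_tessdata normal
```

Files that are missing are skipped with a warning rather than treated as fatal, so the helper can be rerun safely after a partial rename.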

Hope this helps.