Open Bunny22222 opened 5 years ago
training/combine_tessdata -e ../tessdata/chi_sim.traineddata ../tesstutorial/chi_simeval/chi_sim.
This command is incomplete: the output path ends in "chi_sim." with no component name after the dot.
To extract a single component with -e, the output path must end with that component's name. The following works:
combine_tessdata -e ../tessdata_best/chi_sim.traineddata ../tesstutorial/chi_sim.lstm
Extracting tessdata components from ../tessdata_best/chi_sim.traineddata
Wrote ../tesstutorial/chi_sim.lstm
Version string:4.00.00alpha:chi_sim:synth20170629:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
0:config:size=1966, offset=192
17:lstm:size=12152851, offset=2158
18:lstm-punc-dawg:size=282, offset=12155009
19:lstm-word-dawg:size=590634, offset=12155291
20:lstm-number-dawg:size=82, offset=12745925
21:lstm-unicharset:size=258834, offset=12746007
22:lstm-recoder:size=72494, offset=13004841
23:version:size=84, offset=13077335
If you want to unpack the whole traineddata with -u, no component name is required; the output path prefix is enough.
combine_tessdata -u ../tessdata_best/chi_sim.traineddata ../tesstutorial/chi_sim.
Extracting tessdata components from ../tessdata_best/chi_sim.traineddata
Wrote ../tesstutorial/chi_sim.config
Wrote ../tesstutorial/chi_sim.lstm
Wrote ../tesstutorial/chi_sim.lstm-punc-dawg
Wrote ../tesstutorial/chi_sim.lstm-word-dawg
Wrote ../tesstutorial/chi_sim.lstm-number-dawg
Wrote ../tesstutorial/chi_sim.lstm-unicharset
Wrote ../tesstutorial/chi_sim.lstm-recoder
Wrote ../tesstutorial/chi_sim.version
Version string:4.00.00alpha:chi_sim:synth20170629:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
0:config:size=1966, offset=192
17:lstm:size=12152851, offset=2158
18:lstm-punc-dawg:size=282, offset=12155009
19:lstm-word-dawg:size=590634, offset=12155291
20:lstm-number-dawg:size=82, offset=12745925
21:lstm-unicharset:size=258834, offset=12746007
22:lstm-recoder:size=72494, offset=13004841
23:version:size=84, offset=13077335
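The listing above can be sanity-checked by hand: each component's offset plus its size should equal the next component's offset (for example, 192 + 1966 = 2158). Below is a minimal sketch of that check, assuming the "index:name:size=N, offset=M" line format shown in this thread. check_contiguous is a hypothetical helper name, not a Tesseract tool.

```shell
# Hypothetical helper: reads "index:name:size=N, offset=M" lines on stdin
# and reports any component whose offset does not follow the previous one.
check_contiguous() {
  awk -F'[:=,]' '
    /size=.*offset=/ {
      # Fields after splitting on : = , are:
      # index, name, "size", N, " offset", M
      size = $4 + 0
      offset = $6 + 0
      if (seen && offset != expected) {
        printf "mismatch at %s: expected offset %d, got %d\n", $2, expected, offset
        bad = 1
      }
      expected = offset + size
      seen = 1
    }
    END { exit bad + 0 }
  '
}
```

One way to use it is to save the listing lines to a file (listing.txt here is only a placeholder name) and run check_contiguous < listing.txt. Lines that do not match the format, such as the version string, are ignored.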
The error message: tesseract::TessdataManager::TessdataTypeFromFileName(filename, &type):Error:Assert failed:in file tessdatamanager.cpp, line 298
@stweil Regarding your suggestion about more meaningful logging and error codes, this Assert could be changed to a more descriptive error message.
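Until the assert is replaced, a caller can guard against it. The following is a minimal sketch, not part of Tesseract: two hypothetical shell helpers that check the output path's suffix before calling combine_tessdata -e. The list of component names is copied only from the chi_sim listings in this thread, which is an assumption; other traineddata files may contain other components.

```shell
# Hypothetical helper: succeeds if the path ends in one of the component
# names seen in the chi_sim listing in this thread (assumed list).
has_component_suffix() {
  case "$1" in
    *.config|*.lstm|*.lstm-punc-dawg|*.lstm-word-dawg|*.lstm-number-dawg|*.lstm-unicharset|*.lstm-recoder|*.version)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical wrapper around "combine_tessdata -e".
# $1 = traineddata file, $2 = output path ending in a component name.
extract_component() {
  if ! has_component_suffix "$2"; then
    echo "error: '$2' does not end in a known component name (for example .lstm)" >&2
    return 2
  fi
  combine_tessdata -e "$1" "$2"
}
```

With this wrapper, the failing call from the top of the thread (output path ending in "chi_sim.") would stop with a readable message and return code 2 instead of reaching the assert in tessdatamanager.cpp.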
When I use
The basetrain.log file shows this error message: Can't encode transcription: '棠会泞 诫蝣腹 伛铼虢 变绯甚 黛绑茔 凇粑嗉 洳钓廨 勃荩掰 崾丹钠 拽古仙 敬崛蒉 宠广牦 殂楦种 耱鲆憧 媛嵌陵 莴横贴' in language ' '
When I use
The error message: tesseract::TessdataManager::TessdataTypeFromFileName(filename, &type):Error:Assert failed:in file tessdatamanager.cpp, line 298