Closed TheSYNcoder closed 4 years ago
data/TESS/TESS.traineddata is the starter traineddata, created with the unicharset from the training text. Its size should be small, and it can't be used for recognition.
data/TESS.traineddata is the traineddata produced after training. If you didn't provide a wordlist, you will get a warning about a missing dictionary.
Check the timestamps and file sizes. The larger and later file will be your traineddata file.
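The size check can be sketched in shell. This is only an illustration: larger_file is a hypothetical helper name, not part of tesstrain or tesseract, and the paths in the comments are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain): print whichever of two
# files is larger in bytes. The first file wins a tie.
larger_file() {
    # $(( ... )) normalizes any whitespace padding in the wc output.
    size_a=$(($(wc -c < "$1")))
    size_b=$(($(wc -c < "$2")))
    if [ "$size_a" -ge "$size_b" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Example usage, assuming the layout described in this thread:
#   ls -l data/TESS/TESS.traineddata data/TESS.traineddata   # sizes and timestamps
#   larger_file data/TESS/TESS.traineddata data/TESS.traineddata
```

The function only compares sizes; ls -l shows the timestamps as well.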
@TheSYNcoder
Although WORDLIST_FILE/NUMBERS_FILE/PUNC_FILE are optional in the makefile, the traineddata can also contain information on punctuation, word lists, etc. when training. If these files are missing, the trained traineddata will give this error when called.
3.1 Find the WORDLIST_FILE/NUMBERS_FILE/PUNC_FILE in the makefile, and change them to:
WORDLIST_FILE := data/$(MODEL_NAME).wordlist
NUMBERS_FILE := data/$(MODEL_NAME).numbers
PUNC_FILE := data/$(MODEL_NAME).punc
3.2 Suppose your base traineddata is eng.traineddata, or your language is English. Download the .wordlist/.numbers/.punc files from tesseract-ocr/langdata_lstm/eng, rename them to TESS.wordlist, TESS.numbers, and TESS.punc, then place them in /data/.
3.3 Run make training again.
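Steps 3.2 and 3.3 could look roughly like the following in shell. This is a sketch under assumptions: stage_lang_files is a hypothetical helper, it copies rather than renames, and it assumes the eng files have already been downloaded into a local directory (called langdata_lstm/eng here purely for illustration).

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain): copy <lang>.wordlist,
# <lang>.numbers and <lang>.punc from a local directory into the data
# directory, renamed after the model.
stage_lang_files() {
    src_dir=$1    # directory holding the downloaded files, e.g. langdata_lstm/eng
    lang=$2       # language prefix of the downloaded files, e.g. eng
    model=$3      # model name, e.g. TESS
    dest_dir=$4   # destination directory, e.g. data
    mkdir -p "$dest_dir" || return 1
    for ext in wordlist numbers punc; do
        cp "$src_dir/$lang.$ext" "$dest_dir/$model.$ext" || return 1
    done
}

# Example usage (the START_MODEL value is an assumption based on step 3.2):
#   stage_lang_files langdata_lstm/eng eng TESS data
#   make training MODEL_NAME=TESS START_MODEL=eng
```

Copying keeps the downloaded originals intact, so the step can be repeated for another model name.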
@Shreeshrii I think there may be a bug concerning WORDLIST_FILE/NUMBERS_FILE/PUNC_FILE in the makefile.
In tesstrain, the default path of these files is $(OUTPUT_DIR) = data/$(MODEL_NAME), and all files in this path are generated automatically during training.
If the variable START_MODEL is not assigned, the makefile will not generate any related files under this path.
If START_MODEL is assigned, foo.lstm-number-dawg, foo.lstm-punc-dawg, foo.lstm-word-dawg and so on are produced in data/$(MODEL_NAME). But these are not the files the traineddata needs; it needs the .wordlist/.numbers/.punc files. So there may be a bug in tesstrain/makefile.
Am I right?
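One way to see which of the three input files are absent from a given directory is a small check like this. It is a sketch only: missing_dict_inputs is a hypothetical helper, and the directory to inspect depends on how the makefile variables are set in your checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain): print the path of every
# <model>.wordlist/.numbers/.punc file that does not exist in a directory.
missing_dict_inputs() {
    dir=$1     # directory to inspect, e.g. data/TESS
    model=$2   # model name, e.g. TESS
    for ext in wordlist numbers punc; do
        [ -f "$dir/$model.$ext" ] || printf '%s\n' "$dir/$model.$ext"
    done
}

# Example usage, assuming the default location described above:
#   missing_dict_inputs data/TESS TESS
```

Empty output means all three files are present in that directory.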
@TheSYNcoder Please move TESS.traineddata to /usr/local/share/tessdata/ (as indicated by the error message).
It is safe to ignore the message "Failed to load any lstm-specific dictionaries for lang TESS!!"; dictionaries are an optional addition to tesseract models. Personally, I never use them when training my own models, as I do not see any benefits.
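Installing the trained model might be sketched as follows. install_model is a hypothetical helper; it copies instead of moving so the training output stays in place, and writing to /usr/local/share/tessdata may require elevated permissions on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): copy a .traineddata file
# into a tessdata directory, creating the directory if needed.
install_model() {
    model_file=$1     # e.g. data/TESS.traineddata
    tessdata_dir=$2   # e.g. /usr/local/share/tessdata
    mkdir -p "$tessdata_dir" || return 1
    cp "$model_file" "$tessdata_dir/" || return 1
}

# Example usage with the paths from this thread:
#   install_model data/TESS.traineddata /usr/local/share/tessdata
#   tesseract --list-langs            # TESS should now be listed
#   tesseract image.tif out -l TESS
```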
I have been training a sample model TESS using tesstrain, and the training went fine. However, after training, when I move /data/TESS/TESS.traineddata to /usr/local/share and run tesseract image.tif out -l TESS, I get the following error:
On the other hand, when I move /data/TESS.traineddata, it gives me the following error on running the same command:
Am I doing something wrong after the training? Can anyone please help? If it may help, here's my tesseract version: