tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

make training not building traineddata file #381

Closed: jimlaloi closed this issue 3 months ago

jimlaloi commented 3 months ago

Hi, I'm on Windows with Tesseract version 5.3.3.20231005 and make version 4.4.1. I followed all the setup instructions from the tesstrain readme.

I've got ground truth data and I'm trying to train a model from scratch with it. When I run make training, everything seems to work correctly for several minutes: it makes the unicharset file, the box & lstmf files, the list.train file, and the list.eval file. But then make returns an Error 2 when trying to make the [lang].traineddata file.

Here is the output from the end of the training:

python shuffle.py 0 "data/fro_txtr/all-lstmf"
+ head -n 1465 data/fro_txtr/all-lstmf
+ tail -n 163 data/fro_txtr/all-lstmf
+ '[' Windows_NT = Windows_NT ']'
+ dos2unix data/fro_txtr/all-lstmf
dos2unix: converting file data/fro_txtr/all-lstmf to Unix format...
+ dos2unix data/fro_txtr/list.train
dos2unix: converting file data/fro_txtr/list.train to Unix format...
+ dos2unix data/fro_txtr/list.eval
dos2unix: converting file data/fro_txtr/list.eval to Unix format...
if [ "Windows_NT" = "Windows_NT" ]; then \
        dos2unix "data/fro_txtr/fro_txtr.numbers"; \
        dos2unix "data/fro_txtr/fro_txtr.punc"; \
        dos2unix "data/fro_txtr/fro_txtr.wordlist"; \
        dos2unix "data/langdata/fro_txtr/fro_txtr.config"; \
fi
dos2unix: data/fro_txtr/fro_txtr.numbers: No such file or directory
dos2unix: Skipping data/fro_txtr/fro_txtr.numbers, not a regular file.
dos2unix: data/fro_txtr/fro_txtr.punc: No such file or directory
dos2unix: Skipping data/fro_txtr/fro_txtr.punc, not a regular file.
dos2unix: data/fro_txtr/fro_txtr.wordlist: No such file or directory
dos2unix: Skipping data/fro_txtr/fro_txtr.wordlist, not a regular file.
dos2unix: data/langdata/fro_txtr/fro_txtr.config: No such file or directory
dos2unix: Skipping data/langdata/fro_txtr/fro_txtr.config, not a regular file.
make: *** [Makefile:293: data/fro_txtr/fro_txtr.traineddata] Error 2

I end up with a model directory that contains the all-gt, all-lstmf, list.eval, list.train, and unicharset files, but no fro_txtr.traineddata file. I'm not sure whether this is an issue with tesstrain, Tesseract, Make, or something else. Since make doesn't explain the error beyond "Error 2", I don't know where to begin troubleshooting. Any help at all would be really appreciated.
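As a starting point for troubleshooting, the four paths that dos2unix reports as missing in the log above can be checked directly. The following is a minimal Python sketch, not part of tesstrain: the function name `missing_dos2unix_inputs` is hypothetical, and the path layout is copied from the dos2unix commands in the log. It only reports which files are absent. Whether their absence is what makes the recipe fail is an assumption suggested by the log, not something this thread confirms.

```python
from pathlib import Path


def missing_dos2unix_inputs(model_name, data_dir="data"):
    """Return the files named in the log's dos2unix step that do not exist.

    Hypothetical helper: the path layout mirrors the commands shown in the
    log (data/<model>/<model>.numbers and so on); it is not a tesstrain API.
    """
    data_dir = Path(data_dir)
    model_dir = data_dir / model_name
    # Same four files, in the same order, as the dos2unix calls in the log.
    candidates = [
        model_dir / f"{model_name}.numbers",
        model_dir / f"{model_name}.punc",
        model_dir / f"{model_name}.wordlist",
        data_dir / "langdata" / model_name / f"{model_name}.config",
    ]
    return [path for path in candidates if not path.is_file()]


if __name__ == "__main__":
    # Run from the tesstrain directory; "fro_txtr" is the model name in the log.
    for path in missing_dos2unix_inputs("fro_txtr"):
        print(f"missing: {path.as_posix()}")
```

Run from the directory where `make training` was invoked, this lists the same four paths as the log if none of them exist, which gives a concrete detail to include when asking for help.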

zdenop commented 3 months ago

It seems you are not using the latest training code. Please use the Tesseract user forum for support, and do not forget to provide all the steps and example data needed to replicate the problem.