Closed: ccampisano closed this issue 1 year ago
Having the same issue
Please provide the test case (all files) to reproduce the problem.
@zdenop here's the training material: cdi-ground-truth.zip
thx and regards, corrado
Please also post each step (the commands you ran) so that the problem can be reproduced.
the only command I ran was "make training MODEL_NAME=cdi"
Having exactly the same issue here since reinstalling tesseract, despite lstm.train being in tessdata_dir/configs
make training MODEL_NAME=test_trained START_MODEL=grc OUTPUT_DIR=/scratch/sven/ocr_exp/models/test/train GROUND_TRUTH_DIR=/scratch/sven/ocr_exp/datasets/test CORES=12 EPOCHS=1
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
read_params_file: Can't open lstm.train
python3 shuffle.py 0 "/scratch/sven/ocr_exp/models/test/train/all-lstmf"
/bin/bash: line 1: bc: command not found
/bin/bash: line 4: bc: command not found
+ head -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
head: invalid number of lines: ''
+ tail -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
tail: invalid number of lines: ''
make: *** [Makefile:191: /scratch/sven/ocr_exp/models/test/train/list.train] Error 1
read_params_file: Can't open lstm.train
indicates that there is a problem with the tesseract installation. How did you install tesseract?
bc: command not found
indicates that the bc utility is not in the PATH.
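A quick way to confirm the second point is to check which of the tools seen in the log above (tesseract, bc, python3) the shell can actually find. This is only a minimal sketch: the helper name missing_tools is made up for illustration and is not part of tesstrain. The empty line count passed to head and tail in the log appears to be a consequence of the failed bc call producing no output.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of tesstrain.
# It prints every command from its argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The tool list is taken from the commands visible in the log above.
missing=$(missing_tools tesseract bc python3)
if [ -n "$missing" ]; then
    echo "not found on PATH:"
    echo "$missing"
fi
```

If bc shows up in the output even though the package is installed, the PATH seen by make may differ from the one in the interactive shell.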
I installed tesseract from the git repo, doing configure, make, etc.
How should I install it?
BTW: "bc" was installed (already at the newest version, 1.07.1-2+b2)
@ccampisano the 'bc' error belongs to @sven-nm, who thinks he has the same problem as you... Please post the installation log of tesseract.
@zdenop I didn't record the installation log, but it went fine. I'll redo and report here asap.
See the similar issue https://github.com/tesseract-ocr/tesstrain/issues/325 - please try a clean installation (uninstall everything and install from scratch). First try the sample data and, if that works, try your data...
@zdenop please find attached installation logs, I followed instructions in the repo's readme.
Note that I had a problem during configure and had to run it with --disable-dependency-tracking
Please let me know what to do next, my aim is to be able to create custom traindata.
Can you please post the output of the following commands?
echo $TESSDATA_PREFIX
and
tesseract a b -l c
@zdenop here are the results:
corrado@tesseract:~$ echo $TESSDATA_PREFIX
corrado@tesseract:~$ tesseract a b -l c
Error opening data file /usr/local/share/tessdata/c.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'c'
Tesseract couldn't load any languages!
Could not initialize tesseract.
According to the data you posted, tesseract is installed in /usr/local/bin and searches for its data in subdirectories of /usr/local/share/tessdata/ (lstm.train is installed to /usr/local/share/tessdata/configs)... So tesseract is installed correctly.
Can you please double-check that there is no other tesseract installation (e.g. in /usr/bin)?
Can you now run make training MODEL_NAME=cdi?
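The lookup described above can be checked by hand. The sketch below assumes that the tessdata directory is whatever TESSDATA_PREFIX names, falling back to /usr/local/share/tessdata as reported for this particular installation; the function name find_tess_config is hypothetical and only used for illustration.

```shell
#!/bin/sh
# find_tess_config is a hypothetical helper, not a tesseract command.
# $1: tessdata directory, $2: config file name (e.g. lstm.train)
# Prints the full path if the file is readable, otherwise returns 1.
find_tess_config() {
    candidate="$1/configs/$2"
    if [ -r "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf 'not found: %s\n' "$candidate" >&2
        return 1
    fi
}

# Assumption: fall back to the directory named in the error output above.
tessdata_dir="${TESSDATA_PREFIX:-/usr/local/share/tessdata}"
if path=$(find_tess_config "$tessdata_dir" lstm.train); then
    echo "found: $path"
fi
```

If the file is reported as not found, the "Can't open lstm.train" message is probably a matter of where the config files were installed rather than of the training data.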
@zdenop there is no other tesseract installation:
corrado@tesseract:~$ ls /usr/bin/ | grep tess
corrado@tesseract:~$ which tesseract
/usr/local/bin/tesseract
root@tesseract:~# apt remove tesseract-ocr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package 'tesseract-ocr' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
BTW:
1) I didn't run make training and sudo make training-install yet, should I? (see here)
2) should I run make training MODEL_NAME=cdi from the tesseract folder where I worked so far, or in the tesstrain folder?
3) where to put the training data folder?
thanks corrado
Yes, please run sudo make training-install first.
Maybe first run the training on the example data (see e.g. this tutorial; just skip installing tesseract, as you already did that manually...).
You also need to install eng.traineddata and osd.traineddata (make tesseract-langs in tesstrain; see the README).
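Put together, the order of the steps mentioned in this thread could look roughly like the sketch below. The checkout locations ($HOME/tesseract and $HOME/tesstrain) and the helper names run_in and setup_and_train are assumptions for illustration only. By default the script just prints the commands; it executes them only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical checkout locations; adjust them to your own paths.
TESSERACT_SRC="${TESSERACT_SRC:-$HOME/tesseract}"
TESSTRAIN_DIR="${TESSTRAIN_DIR:-$HOME/tesstrain}"
MODEL_NAME="${MODEL_NAME:-cdi}"

# Print a step, and run it in the given directory only when DRY_RUN=0.
run_in() {
    dir="$1"; shift
    printf '(cd %s && %s)\n' "$dir" "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        (cd "$dir" && "$@")
    fi
}

setup_and_train() {
    # Build and install the training tools from the tesseract source tree.
    run_in "$TESSERACT_SRC" make training
    run_in "$TESSERACT_SRC" sudo make training-install
    # Fetch the language data, then train, both from the tesstrain folder.
    run_in "$TESSTRAIN_DIR" make tesseract-langs
    run_in "$TESSTRAIN_DIR" make training "MODEL_NAME=$MODEL_NAME"
}

setup_and_train
```

Printing the plan first makes it easy to see which folder each make target is run from before anything is changed on the system.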
@zdenop thanks for your support, I was able to run the training correctly (and didn't need osd.traineddata).
The trained file was correctly generated, but its performance is very poor compared to the regular "ita" file.
How could I improve this?
Congratulations!
'its performances are very poor, compared to the regular "ita" file'
That is in line with the documentation. Did you read it? Or did you expect that 10 minutes of training would give you a better result than Google got with its resources?
Hi, I was able to build tesseract from git and run the tesstrain script, but the latter failed this way:
any hints?
thx and rgrds, corrado