I have run into this problem and don't know what causes it. Perhaps there are too many .gt.txt files.
My workaround has been to write my training text to "data/foo/all-gt" myself before running make training.
Example:
```bash
cd "data/$MODEL"
# Concatenate every ground-truth line file, forcing a line break after each one.
for f in "$SCRIPTPATH"/OCR_GS_Data/ara/book_IbnFaqihHamadhani.Buldan/*.gt.txt; do (cat "${f}"; echo) >> all-gt; done
# Append the existing Arabic training text as well.
cat /home/ubuntu/langdata_save_lstm/ara/ara.minusnew.training_text >> all-gt
cd ../..
# Run training in the background; standard output goes to $MODEL.log.
nohup make training \
    MODEL_NAME="$MODEL" \
    LANG_TYPE=RTL \
    BUILD_TYPE=Minus \
    TESSDATA=/home/ubuntu/tessdata_best \
    GROUND_TRUTH_DIR="$SCRIPTPATH"/OCR_GS_Data/ara \
    START_MODEL=script/Arabic \
    RATIO_TRAIN=0.99 \
    DEBUG_INTERVAL=-1 \
    MAX_ITERATIONS=200000 > "$MODEL.log" &
```
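For reference, here is a sketch of the same concatenation step written as a function. It also strips Windows CR characters and emits each text line exactly once with a single LF; note that `(cat "${f}"; echo)` adds an extra blank line whenever a .gt.txt file already ends in a newline. It assumes GNU find, sort and xargs (as shipped with Ubuntu), and the function name build_all_gt is hypothetical, not part of tesstrain.

```bash
#!/usr/bin/env bash
# build_all_gt SRC_DIR OUT_FILE  (hypothetical helper, not part of tesstrain)
# Concatenates every SRC_DIR/*.gt.txt into OUT_FILE so that each text line
# ends in a single LF, with any trailing CR (Windows line ending) removed.
build_all_gt() {
  local src_dir="$1" out_file="$2"
  # find + xargs avoids one huge argument list when there are many
  # .gt.txt files; sort -z keeps the files in name order.
  # awk ends a record at each newline and at the end of each file, so a
  # file without a final newline still yields a complete line; sub()
  # removes a CR left at the end of the line.
  find "$src_dir" -maxdepth 1 -name '*.gt.txt' -print0 \
    | sort -z \
    | xargs -0 -r awk '{ sub(/\r$/, ""); print }' > "$out_file"
}
```

To reproduce the example, you would call it on the book directory (writing data/$MODEL/all-gt) and then append the training text with cat ... >> all-gt as before.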
I also create the all-lstmf file outside of the Makefile process.
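A possible way to build that list by hand, assuming that all-lstmf is simply a list of .lstmf paths, one per line, from which the Makefile derives the training and evaluation lists according to RATIO_TRAIN. The function name and the use of shuf to randomize the order are illustrative assumptions, and the .lstmf files must already have been generated. Whether make then reuses the file or regenerates it depends on the Makefile's rules and on file timestamps.

```bash
#!/usr/bin/env bash
# build_all_lstmf SRC_DIR OUT_FILE  (hypothetical helper, not part of tesstrain)
# Writes the path of every .lstmf file found under SRC_DIR to OUT_FILE,
# one path per line, in random order.
build_all_lstmf() {
  local src_dir="$1" out_file="$2"
  # shuf (GNU coreutils) randomizes the order so that a later train/eval
  # split does not simply follow file names.
  find "$src_dir" -name '*.lstmf' -print | shuf > "$out_file"
}
```

For the example above this could be called as build_all_lstmf "$SCRIPTPATH/OCR_GS_Data/ara" "data/$MODEL/all-lstmf".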
Thanks for the quick response, I'll try it immediately.
@FabioLugli Did it work?
I tried Shreeshrii's procedure without success. Looking at how I had built the all-gt file, I found that every line ended with CRLF (Windows format) instead of just LF (Linux format). After converting the line endings, the procedure ran correctly.
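A minimal sketch for checking a file such as data/foo/all-gt for CRLF endings and converting it in place, assuming GNU grep and GNU sed (sed -i and the \r escape are GNU extensions). The name fix_crlf is hypothetical; where it is installed, dos2unix performs the same conversion.

```bash
#!/usr/bin/env bash
# fix_crlf FILE  (hypothetical helper)
# Reports whether FILE has CRLF line endings and, if so, rewrites it in
# place with LF-only endings.
fix_crlf() {
  local file="$1"
  # $'\r$' is a literal CR followed by the end-of-line anchor.
  if grep -q $'\r$' "$file"; then
    echo "$file: CRLF line endings found, converting to LF"
    # Delete the CR that precedes each newline.
    sed -i 's/\r$//' "$file"
  else
    echo "$file: LF line endings only"
  fi
}
```

Running fix_crlf data/foo/all-gt before make training would apply the fix described above.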
I'm using Ubuntu 16.04 under WSL on Windows. I have installed Tesseract and Leptonica correctly, but when I run:
```
sudo make training
```
the terminal hangs at this line:
```
unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
```
Nothing happens after that, and I have to stop the process. Other targets, such as:
```
sudo make lists
```
work fine. What could be the problem?