tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

make training failing #146

Closed: royudev closed this issue 4 years ago

royudev commented 4 years ago

Hi, I was trying to run make training, but it keeps failing.

This is the output:

+ tesseract data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.tif data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001 --psm 6 lstm.train
Tesseract Open Source OCR Engine v5.0.0-alpha-635-g90405 with Leptonica
Page 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
Failed to read boxes from data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.tif
Error during processing.
Makefile:196: recipe for target 'data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.lstmf' failed
make: *** [data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.lstmf] Error 1

It keeps failing, saying that it failed to read boxes from the .tif file and that the recipe for the .lstmf target (Makefile:196) failed.

I'm using the images and text files from ocrd-testset.zip. The Tesseract version is 5.0.

Is there something I'm doing wrong?
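
A possible first check for the "Failed to read boxes" message is whether every line image actually has its companion files. The sketch below is only a diagnostic idea, not part of tesstrain: the function name find_incomplete_lines is made up for illustration, and it assumes each .tif in data/foo-ground-truth should sit next to a .gt.txt transcription and a .box file with the same base name, as the paths in the output above suggest.

```python
from pathlib import Path


def find_incomplete_lines(ground_truth_dir):
    """Return {image stem: [missing companion suffixes]} for each .tif image.

    Hypothetical helper, not part of tesstrain.
    """
    incomplete = {}
    for image in sorted(Path(ground_truth_dir).glob("*.tif")):
        stem = image.name[: -len(".tif")]
        missing = []
        for suffix in (".gt.txt", ".box"):
            companion = image.with_name(stem + suffix)
            # A missing or zero-byte companion file is treated as unusable.
            if not companion.is_file() or companion.stat().st_size == 0:
                missing.append(suffix)
        if missing:
            incomplete[stem] = missing
    return incomplete


if __name__ == "__main__":
    # Directory name taken from the log output in this thread.
    for stem, missing in find_incomplete_lines("data/foo-ground-truth").items():
        print(f"{stem}: missing or empty {', '.join(missing)}")
```

If this prints nothing, every image has a non-empty .gt.txt and .box file, and the cause is more likely elsewhere (for example in the box file contents).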

royudev commented 4 years ago

I encountered a new error:

PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.tif" -t "data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.gt.txt" > "data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.box"
+ tesseract data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001.tif data/foo-ground-truth/wackenroder_herzensergiessungen_1797_0051_001 --psm 6 lstm.train
Tesseract Open Source OCR Engine v5.0.0-alpha-635-g90405 with Leptonica
Page 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
find data/foo-ground-truth -name '*.lstmf' | python3 shuffle.py 0 > "data/foo/all-lstmf"
Error: missing ground truth for training
Makefile:147: recipe for target 'data/foo/list.train' failed
make: *** [data/foo/list.train] Error 1

It says that the ground truth for training is missing.
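
One plausible reading of this error, not confirmed in the thread, is that the find step collected too few (or no) .lstmf files, so the training list came out empty. The sketch below counts the non-empty .lstmf files next to the line images. The function name count_usable_lstmf is hypothetical, and the assumption that each .tif should produce a .lstmf with the same base name is taken from the Makefile target names in the output above.

```python
from pathlib import Path


def count_usable_lstmf(ground_truth_dir):
    """Count non-empty .lstmf files and list the images that lack one.

    Hypothetical helper, not part of tesstrain.
    """
    usable = 0
    without_lstmf = []
    for image in sorted(Path(ground_truth_dir).glob("*.tif")):
        # Assumed naming: foo.tif is expected to yield foo.lstmf.
        lstmf = image.with_suffix(".lstmf")
        if lstmf.is_file() and lstmf.stat().st_size > 0:
            usable += 1
        else:
            without_lstmf.append(image.name)
    return usable, without_lstmf


if __name__ == "__main__":
    # Directory name taken from the log output in this thread.
    usable, without_lstmf = count_usable_lstmf("data/foo-ground-truth")
    print(f"usable .lstmf files: {usable}")
    for name in without_lstmf:
        print(f"no usable .lstmf for {name}")
```

A count of zero would be consistent with the error above. A non-zero count would point to a different cause.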