Open T0biasCZe opened 5 months ago
What version of Tesseract do you use?
I get slightly different output and no crash when I try this on Debian GNU/Linux:
$ lstmtraining \
--debug_interval 0 \
--traineddata data/ocrd-testset/ocrd-testset.traineddata \
--old_traineddata ../tessdata_best/ces.traineddata \
--continue_from data/ces/ocrd-testset.lstm \
--learning_rate 0.0001 \
--model_output data/ocrd-testset/checkpoints/ocrd-testset \
--train_listfile data/ocrd-testset/list.train \
--eval_listfile data/ocrd-testset/list.eval \
--max_iterations 10000 \
--target_error_rate 0.01 \
2>&1 | tee -a data/ocrd-testset/training.log
Loaded file data/ces/ocrd-testset.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 122 to 131!
Num (Extended) outputs,weights in Series:
1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
TxyLfys64:64, 20736
Lfx96:96, 61824
RxLrx96:96, 74112
Lfx384:384, 738816
Fc131:131, 50435
Total weights = 946083
Previous null char=121 mapped to 130
Continuing from data/ces/ocrd-testset.lstm
2 Percent improvement time=100, best error was 100 @ 0
At iteration 100/100/100, mean rms=2.136%, delta=7.610%, BCER train=27.051%, BWER train=59.946%, skip ratio=0.000%, New best BCER = 27.051 wrote best model:data/ocrd-testset/checkpoints/ocrd-testset_27.051_100_100.checkpoint wrote checkpoint.
2 Percent improvement time=100, best error was 27.051 @ 100
At iteration 200/200/200, mean rms=1.956%, delta=6.367%, BCER train=24.516%, BWER train=54.783%, skip ratio=0.000%, New best BCER = 24.516 wrote best model:data/ocrd-testset/checkpoints/ocrd-testset_24.516_200_200.checkpoint wrote checkpoint.
I tried both the recent code and 5.4.0, and I am not able to reproduce it.
tesseract -v
tesseract 5.4.0
leptonica-1.84.2 (May 13 2024, 19:39:23) [MSC v.1929 LIB Release x64]
libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.1.90) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.2.13.zlib-ng : libwebp 1.3.2 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 200203
I have ICU version 74.2.
I had the same problem.
It's a Windows issue: you need to specify the TESSDATA path using forward slashes.
So for the OP, that would be
C:\Users\tobik\source\repos\tesstrain>make training MODEL_NAME=ocrd-testset START_MODEL=ces TESSDATA=C:/tessdata
rather than
TESSDATA=C:\tessdata
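If the path comes from a script or an environment variable rather than being typed by hand, the same forward-slash advice can be applied programmatically. This is only a rough sketch: the helper name `to_forward_slashes` is made up for illustration and is not part of tesstrain, and the make arguments are simply copied from the command above.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows-style path with forward slashes (hypothetical helper)."""
    # PureWindowsPath parses Windows paths on any OS without touching the disk.
    return PureWindowsPath(path).as_posix()


# Path as it would normally be written on Windows.
tessdata = to_forward_slashes(r"C:\tessdata")

# Print the make invocation with the normalized path.
print(f"make training MODEL_NAME=ocrd-testset START_MODEL=ces TESSDATA={tessdata}")
```

A path that already uses forward slashes passes through unchanged, so it should be safe to apply this unconditionally.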
When trying to fine-tune a model, I get "Failed to read data" errors and then an "assert failed" error.