tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0
626 stars, 180 forks

Avoid unnecessary child processes (faster training) #72

Closed stweil closed 5 years ago

stweil commented 5 years ago

@wrznr, did you ever try Tesseract training with the GT4HistOCR ground truth (or know someone who did)?

wrznr commented 5 years ago

@stweil I am pretty sure @Doreenruirui did!

stweil commented 5 years ago

@jbaiter, dta19/1882-keller_sinngedicht/04970.nrm.png from GT4HistOCR is broken. convert cannot read it, and it looks like this in the browser.

Doreenruirui commented 5 years ago

> @jbaiter, dta19/1882-keller_sinngedicht/04970.nrm.png from GT4HistOCR is broken. convert cannot read it, and it looks like this in the browser.

I have not encountered this problem before because I did not train on the whole GT4HistOCR data set.

wrznr commented 5 years ago

There is a way to pass seeds to sort -R:

seq 1 100 | sort -R --random-source=<(openssl enc -aes-256-ctr -pass pass:"42" -nosalt </dev/zero 2>/dev/null)

where 42 is the seed. This may make the additional script superfluous.
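To illustrate why a seed makes the shuffle repeatable, here is a minimal sketch built around the same openssl invocation. It assumes GNU sort and openssl are installed. The helper names get_seeded_random and seeded_shuffle are illustrative only, and writing 1 KiB of seed bytes to a temporary file instead of using process substitution is an assumption made so the sketch also runs under plain sh.

```shell
# Emit a deterministic byte stream derived from the seed in $1
# (same openssl invocation as in the command above).
get_seeded_random() {
  openssl enc -aes-256-ctr -pass pass:"$1" -nosalt </dev/zero 2>/dev/null
}

# Shuffle stdin reproducibly; seeded_shuffle is a hypothetical helper name.
seeded_shuffle() {
  seed_file=$(mktemp)
  # Assumption: 1 KiB of seed material is more than sort -R consumes.
  get_seeded_random "$1" | head -c 1024 >"$seed_file"
  sort -R --random-source="$seed_file"
  rm -f "$seed_file"
}

# Two runs with the same seed should produce the same order.
first=$(seq 1 100 | seeded_shuffle 42)
second=$(seq 1 100 | seeded_shuffle 42)
[ "$first" = "$second" ] && echo "same order for the same seed"
```

A different seed selects a different permutation, so the seed could be recorded next to the resulting train/eval split if the split needs to be recreated later.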

wrznr commented 5 years ago

@stweil I thought I saw a commit but now it's gone.

stweil commented 5 years ago

I accidentally pushed that commit here, but it is unrelated to this pull request, so I fixed my mistake and removed it again.

It is still there, see #71.

wrznr commented 5 years ago

It is still very useful! We are waiting for another PR ;)

lokesh-stack commented 5 years ago

Can someone help me with this error:

lstmtraining \
  --traineddata data/foo/foo.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c" \
  --model_output data/checkpoints/foo \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000
Warning: given outputs 0 not equal to unicharset of 64.
Missing ] at end of [Series]!
Failed to create network from spec: [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c
Makefile:163: recipe for target 'data/checkpoints/foo_checkpoint' failed

wrznr commented 5 years ago

@lokesh-stack Pls. do not file issues as comments in already merged PRs. File an issue at https://github.com/tesseract-ocr/tesstrain/issues instead. Pls. also remember to provide the necessary details for reproducing your error. Otherwise we cannot be of assistance.

stweil commented 5 years ago

@lokesh-stack, please update your local installation.
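The log quoted earlier in this thread shows the spec string ending in O1c with no closing bracket, which is exactly what "Missing ] at end of [Series]!" complains about. As a rough sketch, a small pre-flight check could catch that before lstmtraining is started. check_net_spec is a hypothetical helper, not part of tesstrain or Tesseract, and it only tests the enclosing brackets, not the validity of the layers inside. The O1c64 in the usage lines is an assumption based on the unicharset size of 64 reported in the warning.

```shell
# check_net_spec is a hypothetical helper, not part of tesstrain or Tesseract.
# It only verifies that the spec starts with "[" and ends with "]".
check_net_spec() {
  case "$1" in
    "["*"]") return 0 ;;
    *) echo "net_spec is not enclosed in [ ]: $1" >&2
       return 1 ;;
  esac
}

# The truncated spec from the log is rejected.
check_net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c" \
  || echo "spec rejected"

# A bracketed spec passes (O1c64 is assumed, see above).
check_net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c64]" \
  && echo "spec accepted"
```

This does not explain why the generated spec was truncated in the first place; it only makes the failure visible earlier.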