tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

newly trained tesseract model not working #54

Closed salimchicku closed 5 years ago

salimchicku commented 5 years ago

We trained Tesseract on custom data (2,000 images) for 10k iterations. The resulting file (digitsmodel.traineddata) is very small (5.1 KB). When we test the newly trained model, we get the following error:

raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, "Error: LSTM requested, but not present!! Loading tesseract. Failed loading language 'digitsmodel' Tesseract couldn't load any languages! Could not initialize tesseract.")

vijayrajasekaran commented 5 years ago

Use the generated traineddata file from the data directory: data/digitsmodel.traineddata
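A quick way to tell candidate files apart is to compare their sizes, since a file of only a few KB is unlikely to contain a trained LSTM network. The sketch below is only an illustration: the helper name `traineddata_sizes` is hypothetical, and the second path, data/digitsmodel/digitsmodel.traineddata, is an assumption based on the layout seen elsewhere in this thread.

```python
from pathlib import Path


def traineddata_sizes(candidates):
    """Return (path, size_in_bytes) for each candidate that exists, largest first.

    Hypothetical helper for illustration; it is not part of tesstrain.
    """
    found = [(Path(p), Path(p).stat().st_size) for p in candidates if Path(p).is_file()]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed locations; adjust to your own MODEL_NAME and data directory.
    paths = [
        "data/digitsmodel.traineddata",
        "data/digitsmodel/digitsmodel.traineddata",
    ]
    for path, size in traineddata_sizes(paths):
        print(f"{path}: {size / 1024:.1f} KB")
```

If only the small file exists, the final `lstmtraining --stop_training` step probably did not complete.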

wrznr commented 5 years ago

@vijayrajasekaran Thank you very much for your hint! @kiransab Still an issue?

canyilmaz90 commented 5 years ago

@wrznr @kiransab I have the same problem. Did you manage to find a solution? At the end of training, the output looked something like this:

2 Percent improvement time=975, best error was 3.143 @ 5376
At iteration 6351/10000/10000, Mean rms=0.492%, delta=0.284%, char train=1.013%, word train=3.54%, skip ratio=0%,  New best char error = 1.013 wrote best model:data/checkpoints/pts1.013_6351.checkpoint wrote checkpoint.

Finished! Error rate = 1.013
lstmtraining \
--stop_training \
--continue_from data/checkpoints/pts_checkpoint \
--traineddata data/pts/pts.traineddata \
--model_output data/pts.traineddata
Loaded file data/checkpoints/pts_checkpoint, unpacking...

I have tried both data/pts.traineddata and data/pts/pts.traineddata, but got the same output:

TesseractError: (1, "Failed to load any lstm-specific dictionaries for lang pts!! Failed loading language 'pts' Tesseract couldn't load any languages! Could not initialize tesseract.")

I'm using it with pytesseract and other .traineddata files work well for the same code.

canyilmaz90 commented 5 years ago

Mates, I found what the issue was for me. I was running Tesseract in "--oem 2" (legacy + LSTM) mode, but realized that the code in this repo cannot produce usable ".box" files (it produces them, but they are empty), so the model doesn't work in legacy mode. It works only in "--oem 1" (LSTM only) mode.
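For anyone calling the model through pytesseract, a minimal sketch of forcing LSTM-only mode could look like the following. The helper names `build_config` and `ocr_with_lstm_model` and the example paths are assumptions for illustration; `--oem` and `--tessdata-dir` are standard Tesseract command-line options, passed through pytesseract's `config` argument.

```python
import shlex


def build_config(tessdata_dir, oem=1):
    """Build the extra CLI arguments passed to Tesseract via pytesseract.

    --oem 1 selects the LSTM engine only, which is the mode reported
    to work in this thread. Hypothetical helper, not part of pytesseract.
    """
    return f"--tessdata-dir {shlex.quote(str(tessdata_dir))} --oem {oem}"


def ocr_with_lstm_model(image_path, lang, tessdata_dir):
    """Run OCR on one image with a custom traineddata file in LSTM-only mode."""
    # Imported lazily so build_config() is usable without these packages.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(
        Image.open(image_path),
        lang=lang,
        config=build_config(tessdata_dir),
    )


if __name__ == "__main__":
    # Assumed example values: "sample.png" is a placeholder image and
    # "data" is the directory that contains pts.traineddata.
    print(ocr_with_lstm_model("sample.png", lang="pts", tessdata_dir="data"))
```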

wrznr commented 5 years ago

Many thanks for your helpful solution!