tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

How to use the fine tune trained english ocr model? #192

Closed rambalachandran closed 3 years ago

rambalachandran commented 3 years ago

The README provides steps on how to train the model. For example, I have certain images and corresponding text that I used to fine-tune the English proto model. The training files were located in data/eng_mod1-ground-truth and the original English traineddata was placed in data/eng/eng.traineddata. Then I ran the following command:

make training MODEL_NAME=eng_mod1 PROTO_MODEL=data/eng/eng.traineddata CORES=4 FINETUNE_TYPE=Impact

The model is now trained and I see a new folder data/eng_mod1 that also contains eng_mod1.traineddata.

Is this the correct way to fine-tune the English model? If so, how to use this fine tuned model to predict on new images?

stweil commented 3 years ago

how to use this fine tuned model to predict on new images

Just copy or move it to your tessdata directory (that's where the already installed models like eng.traineddata are located). Then try tesseract --list-langs to see whether the new model is listed.
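
The copy step might look roughly like the sketch below. The helper name install_model, the tessdata location /usr/share/tessdata, and the image name new_image.png are assumptions for illustration only; the model path is the one mentioned in the question, so adjust it to wherever your .traineddata file was actually written.

```shell
# Hypothetical helper: copy a fine-tuned model into a tessdata directory.
# Usage: install_model MODEL_FILE TESSDATA_DIR
install_model() {
    model_file=$1
    tessdata_dir=$2
    if [ ! -f "$model_file" ]; then
        echo "model file not found: $model_file" >&2
        return 1
    fi
    if [ ! -d "$tessdata_dir" ]; then
        echo "tessdata directory not found: $tessdata_dir" >&2
        return 1
    fi
    # Place the model next to the installed ones (eng.traineddata etc.).
    cp "$model_file" "$tessdata_dir/"
}

# Example calls (paths are assumptions, so they are left commented out):
# install_model data/eng_mod1/eng_mod1.traineddata /usr/share/tessdata
# tesseract --list-langs                       # eng_mod1 should be listed
# tesseract new_image.png stdout -l eng_mod1   # OCR with the new model
```

The value passed to -l is the file name without the .traineddata extension. If you would rather not copy into the system directory, tesseract also accepts --tessdata-dir to point at another directory that contains the model.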