tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Finetuning performs worse in some cases #263

Closed soufieneghribi closed 3 years ago

soufieneghribi commented 3 years ago

Thanks, that solved a problem.

Another question: I finetuned the model on 96 images. I started from a model that I finetuned on a specific font (START_MODEL=font_model).

I got very good results on images that are similar to the training set (error rate 0.06), but the model's performance on other images decreased.

I trained the model on images like these:
image

The model performed well on these images.

An example of a prediction that got worse:

image

| Finetuned model | Font model (START_MODEL) |
| -- | -- |
| عدالد | عد اله |

wrznr commented 3 years ago

Hard to judge, really. It could be a problem of overfitting, i.e. you (re-)train the model on very specific material, and in doing so the model forgets some of the things it knew before. Do not forget that training means adjusting probability distributions: if you train too “hard” on a narrow domain, the probability distribution may become narrow as well.
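
One way to check whether this is overfitting is to score both models on two held-out sets: lines similar to the finetuning data, and lines from the material the start model already handled. The sketch below is only an illustration and not part of tesstrain. It assumes the "error rate" above is a character error rate (edit distance divided by ground-truth length), and the names `levenshtein`, `cer`, and `forgetting_report` are hypothetical helpers. The strings in the demo are placeholders, not real OCR output.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(pairs):
    """Character error rate over a list of (ground_truth, prediction) pairs."""
    errors = sum(levenshtein(gt, pred) for gt, pred in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return errors / chars if chars else 0.0


def forgetting_report(start_model, finetuned):
    """Per-domain CER before and after finetuning.

    Both arguments map a domain name to a list of
    (ground_truth, prediction) pairs for the same held-out lines.
    Returns {domain: (cer_before, cer_after, change)}.
    """
    report = {}
    for domain, pairs in start_model.items():
        before = cer(pairs)
        after = cer(finetuned[domain])
        report[domain] = (before, after, after - before)
    return report


if __name__ == "__main__":
    # Placeholder data: replace with real ground truth and OCR output.
    start = {"in_domain": [("abcd", "abxd")], "other": [("abcd", "abcd")]}
    tuned = {"in_domain": [("abcd", "abcd")], "other": [("abcd", "ab d")]}
    for domain, (before, after, change) in forgetting_report(start, tuned).items():
        print(f"{domain}: {before:.2f} -> {after:.2f} ({change:+.2f})")
```

If the error rate drops on the in-domain set but rises on the other set, that pattern is consistent with the forgetting described above. Mixing some of the original material back into the finetuning data, or training for fewer iterations, would be the usual things to try.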
