tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Trying to train Tesseract for a different font, unable to get CER under 50% #368

Closed cvermani closed 4 months ago

cvermani commented 4 months ago

I have been trying to train Tesseract to read the font on an LED screen, which has slightly differently shaped characters. My process so far:

1) Installed Tesseract and made sure it was running.
2) Cloned tesstrain and added eng.traineddata from the tessdata_best repo to the data folder.
3) Used the code from this video (https://www.youtube.com/watch?v=KE4xEzFGSU8) to generate the ground truth folder for all 195k lines in eng.training_text.
4) Copied the ground truth folder into tesstrain/data, having run make tesseract-langdata beforehand so that the langdata folder was in place.
5) Ran this command:

make training MODEL_NAME=abs START_MODEL=eng TESSDATA='tessdata path here' MAX_ITERATIONS=20000

I have done this more than once and have never achieved an error rate under 50%. I am not sure whether I am doing something wrong or whether this error rate is normal. If anyone has suggestions, or if this post is missing information, please let me know.
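One step in the process above that can be checked mechanically is the ground truth folder. As far as the tesstrain README describes it, each line image (.tif, .png, .bin.png or .nrm.png) needs a transcription with the same base name and the extension .gt.txt, and by default the folder is expected at data/MODEL_NAME-ground-truth. The following is a minimal sketch of such a check, not part of tesstrain itself. The function names and the path data/abs-ground-truth (derived from MODEL_NAME=abs) are assumptions made for illustration.

```python
from pathlib import Path

# Image extensions accepted for line images, longest first so that
# "x.bin.png" is reduced to "x" rather than "x.bin".
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")
TEXT_SUFFIX = ".gt.txt"


def base_name(filename, suffixes):
    """Return filename without the first matching suffix, or None."""
    for suffix in suffixes:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def check_ground_truth(folder):
    """Report which line images and transcriptions are paired or orphaned.

    Hypothetical helper: it only compares file names, it does not
    inspect image contents or transcription text.
    """
    images, texts = set(), set()
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        if path.name.endswith(TEXT_SUFFIX):
            texts.add(path.name[: -len(TEXT_SUFFIX)])
            continue
        base = base_name(path.name, IMAGE_SUFFIXES)
        if base is not None:
            images.add(base)
    return {
        "paired": sorted(images & texts),
        "missing_text": sorted(images - texts),
        "missing_image": sorted(texts - images),
    }


if __name__ == "__main__":
    # Assumed location for MODEL_NAME=abs; adjust to the actual folder.
    target = Path("data/abs-ground-truth")
    if target.is_dir():
        report = check_ground_truth(target)
        for key, names in report.items():
            print(f"{key}: {len(names)}")
    else:
        print(f"{target} not found")
```

If the paired count is far below the number of generated lines, or the folder is not found under the assumed name, training may be running on much less data than intended, which would be worth ruling out before looking at other causes of a high error rate.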