tesseract-ocr / tessdata_best

Best (most accurate) trained LSTM models.
Apache License 2.0
1.23k stars 375 forks

Can you recommend the best traineddata for numbers and Latin letters? #18

Closed (JenyaKirmizaTripTop closed this issue 1 year ago)

JenyaKirmizaTripTop commented 6 years ago

I'm using tesseract to read MRZ codes, and sometimes it gives me incorrect symbols, e.g. "1" instead of "I", or "S" instead of "5".

Shreeshrii commented 6 years ago

Which version of tesseract and traineddata are you using?

JenyaKirmizaTripTop commented 6 years ago

I'm using tess2 and I want to choose the traineddata that fits best, because I need to read the MRZ code on passports, which contains symbols such as '<,>', numbers, and Latin letters. I see some mistakes when reading '<,>' and also when reading Latin letters. Sometimes it replaces numbers with letters, for example reading 5 as S.

Shreeshrii commented 6 years ago

You may get better response to such questions on the tesseract-ocr forum.

Or open an issue for tesseract.

You can also search for mrz.traineddata. It won't be in the official repo, but it is available as a user contribution.

Shreeshrii commented 6 years ago

See https://github.com/tesseract-ocr/tesseract/wiki/AddOns
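
Whichever traineddata is used, the confusions described above (5 read as S, I read as 1) can often be caught after OCR, because an MRZ has fixed field positions and check digits. The following is only a minimal sketch, not something tesseract provides: it assumes a TD3 (passport) MRZ whose second line is 44 characters, uses the ICAO 9303 check digit scheme (weights 7, 3, 1), and all function names and the confusion table are made up for illustration.

```python
# Sketch only: post-OCR cleanup for line 2 of a TD3 (passport) MRZ.
# Assumption: the line is exactly 44 characters. Names are illustrative.

# Letter -> digit for common OCR confusions (assumed table, extend as needed).
TO_DIGIT = {"O": "0", "I": "1", "Z": "2", "S": "5", "B": "8"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}

# 0-based positions in TD3 line 2 that may only hold digits:
# document number check digit, birth date + check digit,
# expiry date + check digit, composite check digit.
DIGIT_POSITIONS = [9, *range(13, 20), *range(21, 28), 43]
# Nationality code: letters only.
LETTER_POSITIONS = range(10, 13)

WEIGHTS = (7, 3, 1)


def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: 0-9 count as themselves, A-Z as 10-35, '<' as 0."""
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"unexpected MRZ character: {ch!r}")
        total += value * WEIGHTS[i % 3]
    return total % 10


def normalize_td3_line2(line: str) -> str:
    """Replace look-alike characters in positions where only one class is legal."""
    if len(line) != 44:
        raise ValueError("expected a 44-character TD3 line")
    chars = list(line)
    for i in DIGIT_POSITIONS:
        chars[i] = TO_DIGIT.get(chars[i], chars[i])
    for i in LETTER_POSITIONS:
        chars[i] = TO_LETTER.get(chars[i], chars[i])
    return "".join(chars)


def field_checks_pass(line: str) -> bool:
    """Verify document number, birth date and expiry date against their check digits."""
    fields = [(line[0:9], line[9]), (line[13:19], line[19]), (line[21:27], line[27])]
    return all(str(mrz_check_digit(data)) == check for data, check in fields)


if __name__ == "__main__":
    ocr_output = "L898902C36UT07408I22F12041S9ZE184226B<<<<<10"  # simulated misreads
    fixed = normalize_td3_line2(ocr_output)
    print(fixed, field_checks_pass(fixed))
```

The position-based replacement only helps in fields that are strictly numeric or strictly alphabetic. The document number and personal number are alphanumeric, so for those the check digit can flag a misread but cannot say which character is wrong.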