konstantint / PassportEye

Extraction of machine-readable zone information from passports, visas and id-cards via OCR
MIT License

Choosing tessdata to get more accuracy #46

Closed SaddamBInSyed closed 4 years ago

SaddamBInSyed commented 4 years ago

Hi @konstantint

Thanks for your work. I am using this library to extract MRZ values from national ID cards, but I am not satisfied with the accuracy I am currently getting from the MRZPipeline() class, so I would like to ask:

  1. How can I improve the accuracy?
  2. How can I use the tessdata_best trained files?

I have installed Tesseract 4.0 (tesseract-ocr-setup-4.00.00dev.exe) and I have the following files in the tessdata folder:

(screenshot of the tessdata folder contents)

Please advise.

konstantint commented 4 years ago

To use a different model, specify extra_cmdline_params="-l osd" (assuming osd.traineddata is the new model you created).
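Something along these lines should work with the read_mrz helper (id_card.jpg is just a placeholder filename here, and "osd" stands for whatever your model file is actually named):

```python
from passporteye import read_mrz

# Pass extra command-line options through to Tesseract. "-l osd" tells it
# to use osd.traineddata from its tessdata directory; replace "osd" with
# the name of the model you trained.
mrz = read_mrz('id_card.jpg', extra_cmdline_params='-l osd')

if mrz is not None:
    print(mrz.to_dict())
```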

As for improving the accuracy: besides trying to train a dedicated Tesseract model (although, I must admit, I do not know of cases where a custom model brought statistically significant gains), perhaps you could make sure the input images are as clear as possible.
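A rough sketch of the kind of cleanup I mean (this is not part of PassportEye; the filenames and the scikit-image steps are just an illustration):

```python
from skimage import exposure, img_as_ubyte, io, transform
from passporteye import read_mrz

# Load the photo as grayscale, upscale it and stretch the contrast so
# that the MRZ characters come out as crisp as possible.
img = io.imread('id_card.jpg', as_gray=True)
img = transform.rescale(img, 2.0)
img = exposure.rescale_intensity(img)

# read_mrz expects a file path or file object, so write the cleaned-up
# image out before running the pipeline on it.
io.imsave('id_card_clean.png', img_as_ubyte(img))
mrz = read_mrz('id_card_clean.png')
```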

One common issue that the current implementation handles rather poorly is the situation where the document lies on some kind of patterned background (e.g. a table).

You can try running the mrz script with the --save_roi parameter on the badly recognized examples and examine the regions extracted by the pipeline. If the region is correct (i.e. it includes the actual MRZ in the correct orientation), tuning Tesseract is the way to go. If the region is usually incorrect, then the problem lies in the image preprocessing.
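The same check can be done from the Python API (a sketch relying on the aux['roi'] field that is populated when save_roi=True; bad_example.jpg is a placeholder):

```python
import matplotlib.pyplot as plt
from passporteye import read_mrz

# Keep the extracted region of interest alongside the recognition result.
mrz = read_mrz('bad_example.jpg', save_roi=True)

if mrz is None:
    print('No MRZ-like region was found at all')
else:
    # mrz.aux['roi'] is the image region that was handed to Tesseract.
    # If it does not show the MRZ lines cleanly, the problem is in the
    # preprocessing; if it does, Tesseract itself needs tuning.
    plt.imsave('roi.png', mrz.aux['roi'], cmap='gray')
```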

If you discover a useful way to process images that you think should be added to the current PassportEye pipeline, let me know!