Closed henrytom1703 closed 5 years ago
I am facing this same issue as well. It appears to be affecting the 1.3.0 version. 1.2.4 works fine.
Yes, it stems from yesterday's 1.3.0 update.
In 1.3.0 I hard-coded Tesseract to use the "legacy" recognizer (rather than the LSTM one), because it seems to work better in pretty much all the cases I've tested. However, it seems that some installations of Tesseract v4 do not come with language files appropriate for the legacy recognizer.
So, in theory, your problem would resolve itself if you downloaded eng.traineddata from here and put it into Tesseract's tessdata directory instead of whatever you currently have there (most probably you have an eng.traineddata which is around 5MB in size, while the "legacy+new" eng.traineddata is around 30MB).
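A quick way to tell which kind of file is installed is to look at its size. The following is only a rough sketch based on the ~5MB vs ~30MB figures above: the 15MB threshold is an arbitrary midpoint chosen for illustration, and the function name is hypothetical (it is not part of PassportEye or Tesseract).

```python
import os

# Assumption: an LSTM-only file is ~5MB and a "legacy+new" file is ~30MB,
# as quoted in this thread. 15MB is an arbitrary midpoint, not an official limit.
LEGACY_SIZE_THRESHOLD = 15 * 1024 * 1024


def probably_has_legacy_model(traineddata_path, threshold=LEGACY_SIZE_THRESHOLD):
    """Guess, from file size alone, whether a traineddata file includes the legacy model."""
    return os.path.getsize(traineddata_path) >= threshold
```

Call it with the full path to the eng.traineddata inside your tessdata directory (the location varies between installations). A False result suggests the file is the small LSTM-only variant and should be replaced.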
Another quick hack is to pass extra_cmdline_params='--oem 3' (meaning "use the LSTM engine when possible, or Legacy otherwise") to read_mrz.
Given the circumstances, perhaps forcing the use of legacy was not a very user-friendly decision for PassportEye. It is not clear which would be a better resolution for this:
1. Keep forcing legacy and point affected users to the --oem 3 extra cmdline param. Better quality out of the box for those who have legacy data files or read the docs. Ugly user experience for those who don't.
2. Stop forcing legacy and let users opt in with --oem 0 as an extra cmdline arg. Better user experience for everyone. Worse quality out of the box (because all new tesseracts use a "newer" model by default).
I'll probably go the 2nd way for now (that'll be 1.4.0 then) but opinions are welcome.
I removed the forced use of the legacy recognizer in 1.4.0; however, if you want noticeably better results, I highly recommend you install the legacy "traineddata" files and use the --legacy flag with the mrz script or, correspondingly, extra_cmdline_params='--oem 0' with the read_mrz function.
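For reference, here is a minimal sketch of passing the engine mode from Python. The wrapper name read_mrz_with_engine and its reader argument are hypothetical conveniences added for illustration; only read_mrz and its extra_cmdline_params argument come from PassportEye itself.

```python
def read_mrz_with_engine(image_path, oem=0, reader=None):
    """Call PassportEye's read_mrz with an explicit Tesseract --oem value.

    oem=0 asks for the legacy recognizer (needs legacy traineddata files);
    oem=3 lets Tesseract pick based on what is available.
    """
    if reader is None:
        # Imported lazily so the helper can be defined without PassportEye installed.
        from passporteye import read_mrz as reader
    return reader(image_path, extra_cmdline_params='--oem {}'.format(oem))
```

For example, read_mrz_with_engine('passport.jpg', oem=0) (with 'passport.jpg' standing in for your own image) requests the legacy recognizer, while oem=3 falls back to whatever engine the installed data files support.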