Open TrueWodzu opened 3 years ago
@bertsky, maybe you can help here?
I can try. This comes up again and again. Unfortunately, whitelisting (and also pattern matching) was not given much thought in the LSTM implementation. (In fact, it did not work at all in 4.0 – only for legacy models.) The CTC decoder beam is too narrow, so usually not enough alternative hypotheses survive. You should be able to get something useful by setting `lstm_choice_mode=2` and `lstm_choice_iterations=5` (or larger), but IIRC this will only work on traineddata with dictionaries (like the stock models, but not on tesstrain models).
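For reference, here is a rough sketch of how those two parameters could be passed on the command line together with the whitelist. It only builds the argument list and does not run Tesseract. The helper name `build_tesseract_command`, the file name `image.png`, and the digit whitelist are placeholders of mine, not something from this issue; the `-c VAR=VALUE` syntax and the `stdout` output base are standard tesseract CLI usage.

```python
import shlex


def build_tesseract_command(image_path, whitelist, choice_mode=2, choice_iterations=5):
    """Build a tesseract CLI argument list (hypothetical helper, for illustration only).

    Each "-c NAME=VALUE" pair sets one Tesseract config variable.
    """
    return [
        "tesseract", image_path, "stdout",  # "stdout" prints the text instead of writing a file
        "-c", f"tessedit_char_whitelist={whitelist}",
        "-c", f"lstm_choice_mode={choice_mode}",
        "-c", f"lstm_choice_iterations={choice_iterations}",
    ]


# Placeholder image name and whitelist; adjust to your own setup.
cmd = build_tesseract_command("image.png", "0123456789")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where tesseract is installed. Whether the whitelist is then honoured still depends on the model, as described above.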
This comes up again and again
True.
If it does not work well, maybe we should disable this feature for LSTM?
If it does not work well, maybe we should disable this feature for LSTM?
I'd recommend against that, though. There might still be workable setups, like mixing LSTMs and non-LSTMs...
Current Behavior:
Normally, when I run tesseract on my image without specifying `tessedit_char_whitelist`, I get the result "389." (without the double quotes). I wanted to remove the dot, since I am only interested in digits and there is no dot in my image. So I specified a whitelist as follows:
After this change, tesseract returns an empty string.
Expected Behavior:
I would expect to get a string that contains only whitelisted characters; in my case that would be "389".
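Until the whitelist behaves this way with LSTM models, one possible workaround is to run OCR without a whitelist and filter the returned text afterwards. The sketch below is plain post-processing, not what Tesseract does internally, and `keep_whitelisted` is a hypothetical helper name. It cannot recover characters the engine misread; it only drops the ones that are not allowed.

```python
def keep_whitelisted(text, whitelist):
    """Return text with every character not in whitelist removed."""
    allowed = set(whitelist)
    return "".join(ch for ch in text if ch in allowed)


# "389." is the unfiltered result reported in this issue.
result = keep_whitelisted("389.", "0123456789")
print(result)  # 389
```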
BTW: I think the libtesseract.so name is wrong for version 4.1.1: currently it is `libtesseract.so.4.0.1`, and it should be `libtesseract.so.4.1.1`.