Open YunTsen opened 4 years ago
Actually, I have tried with Russian characters and it worked pretty well. So I am assuming that the problem is specific to some subset of UTF-8.
I'm running into the same problem. Has it been solved?
Setting the system locale to UTF-8 seems to work.
tessedit_char_whitelist=\u8f09
AFAIK, this usage is not supported.
Did you try:
tessedit_char_whitelist=載
?
Anyway, the allowlist / denylist feature is known to not work well with the LSTM engine.
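To make the difference between the two spellings concrete, here is a minimal Python 3 sketch. It assumes the config string is built in Python (as with pytesseract); `build_config` is a hypothetical helper, not part of any library. In a normal Python 3 string literal, `\u8f09` is resolved to "載" before Tesseract ever sees it, whereas a raw string (or a value typed into a shell or config file) passes the six literal characters `\`, `u`, `8`, `f`, `0`, `9`.

```python
def build_config(whitelist: str, oem: int = 0, psm: int = 6) -> str:
    """Hypothetical helper: build a pytesseract-style config string."""
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


# Literal character and Python escape produce the same string in Python 3.
literal = build_config("載")
escaped = build_config("\u8f09")

# A raw string keeps the backslash sequence as six separate characters,
# which is what Tesseract receives when the escape is not resolved first.
raw = build_config(r"\u8f09")

print(literal == escaped)
print(literal == raw)
```

With pytesseract, the resulting string would typically be passed as `pytesseract.image_to_string(image, lang='chi_tra', config=literal)`. Whether the whitelist is then honored still depends on the engine and locale issues discussed in this thread.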
Environment
Current Behavior:
While using chi_tra to work on this image, the result was "載", which was great. However, after specifying the whitelist with either of the following configs: config='--oem 0 --psm 6 -c tessedit_char_whitelist=\u8f09' (\u8f09 is the Unicode escape for "載") or config='--oem 0 --psm 6 -c tessedit_char_whitelist=載', the result came back empty.
It seems that the whitelist only accepts English characters or digits (the whitelist does work for numbers; I have tested that). How come?
p.s. I tried this because I wanted Tesseract to detect only the words on whitelist.
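As a stopgap while the whitelist does not accept these characters, one option is to run OCR without a whitelist and filter the output afterwards. The sketch below is only an assumption about what would be acceptable here, and `filter_to_allowlist` is a hypothetical helper. It is not equivalent to `tessedit_char_whitelist`: a real whitelist constrains recognition, while this only discards characters after the fact.

```python
def filter_to_allowlist(text: str, allowed: str) -> str:
    """Hypothetical helper: keep only characters that appear in `allowed`."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


# Example with a made-up OCR result containing extra characters.
ocr_text = "載入 abc 123"
print(filter_to_allowlist(ocr_text, "載"))
```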
Expected Behavior:
Since chi_tra can detect "載" accurately without the whitelist, it should also work when whitelist="載" is given.
Suggested Fix:
The variable tessedit_char_whitelist should accept non-English characters.