rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

How can I initialize Tesseract to work only with digits? #253

Closed ibrahimAlii closed 6 years ago

ibrahimAlii commented 6 years ago

Summary: In my app I want to initialize Tesseract to work only with digits, specifically Arabic digits. Is there any way to avoid recognizing characters and recognize only digits instead?

I've used setVariable, but the result is the same:

baseAPI.init(dataPath, "ara");
baseAPI.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, variable);
baseAPI.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي ء هو !?@#$%&* >> << ()<>_-+=/:;'\\\"");
baseAPI.setVariable("classify_bln_numeric_mode", "1");

I'm still getting non-digit characters in the result.

rmtheis commented 6 years ago

Hmm, what's the value of variable in your example?

ibrahimAlii commented 6 years ago

@rmtheis It's all the digits in Arabic: "٠١٢٣٤٥٦٧٨٩١٠"

rmtheis commented 6 years ago

Hmm, I would try setting just the whitelist value by itself, without setting the blacklist value or the other value.

If that gives you the same result, maybe try asking on the Tesseract forum.
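A minimal sketch of the whitelist-only configuration suggested above. The class `ArabicDigitWhitelist` and its methods are hypothetical names introduced for illustration; the `setVariable` parameter is a stand-in for `TessBaseAPI.setVariable(String, String)` so the snippet compiles without the Android library. The key `"tessedit_char_whitelist"` is the string behind `TessBaseAPI.VAR_CHAR_WHITELIST`. The whitelist is built from the Arabic-Indic digit code points U+0660 to U+0669, each listed once.

```java
import java.util.function.BiFunction;

/** Sketch: configure a digits-only whitelist and nothing else. */
final class ArabicDigitWhitelist {

    /** Builds the ten Arabic-Indic digits (U+0660..U+0669), each exactly once. */
    static String arabicIndicDigits() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0x0660; cp <= 0x0669; cp++) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    /**
     * Sets only the whitelist; no blacklist and no numeric-mode variable.
     * setVariable is a stand-in (assumption) for TessBaseAPI.setVariable.
     */
    static boolean applyWhitelistOnly(BiFunction<String, String, Boolean> setVariable) {
        return setVariable.apply("tessedit_char_whitelist", arabicIndicDigits());
    }
}
```

On Android this would presumably be called as `ArabicDigitWhitelist.applyWhitelistOnly(baseAPI::setVariable)` right after `baseAPI.init(dataPath, "ara")`. Whether the engine honors the whitelist still depends on the engine mode and the training data.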

ibrahimAlii commented 6 years ago

@rmtheis Thanks. I'm still getting the same result, and I didn't get any useful response on the Tesseract forum.

Any ideas?

rmtheis commented 6 years ago

1. Try different OcrEngineMode values.
2. Retrain or edit the training data. I don't know enough about this to be able to help.
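One way to approach the first suggestion is to run the same image through each engine mode and check which output, if any, contains only Arabic-Indic digits. The sketch below is an illustration under assumptions: `EngineModeProbe` and its methods are hypothetical names, the `recognize` callback stands in for your own OCR call, and the integer values are assumed to mirror tess-two's `TessBaseAPI.OEM_*` constants (prefer the library's constants in real code).

```java
import java.util.function.IntFunction;

/** Sketch: find an engine mode whose output contains only Arabic-Indic digits. */
final class EngineModeProbe {

    // Assumed to mirror TessBaseAPI.OEM_*; verify against your tess-two version.
    static final int OEM_TESSERACT_ONLY = 0;
    static final int OEM_CUBE_ONLY = 1;
    static final int OEM_TESSERACT_CUBE_COMBINED = 2;
    static final int OEM_DEFAULT = 3;

    /** True if text has at least one digit in U+0660..U+0669 and otherwise only whitespace. */
    static boolean isDigitsOnly(String text) {
        boolean sawDigit = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue;
            }
            if (cp < 0x0660 || cp > 0x0669) {
                return false;
            }
            sawDigit = true;
        }
        return sawDigit;
    }

    /** Returns the first mode whose recognized text is digits only, or -1 if none is. */
    static int firstDigitsOnlyMode(IntFunction<String> recognize) {
        int[] modes = {
            OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT
        };
        for (int mode : modes) {
            String text = recognize.apply(mode);
            if (text != null && isDigitsOnly(text)) {
                return mode;
            }
        }
        return -1;
    }
}
```

On Android, the callback would presumably call `baseAPI.init(dataPath, "ara", mode)`, set the whitelist, call `setImage(...)`, and return `getUTF8Text()`. Some engine modes may fail to initialize if the traineddata lacks the files they need, so check the return value of `init` and return null in that case.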