rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0
3.76k stars, 1.38k forks

Arabic trained data produces 20% accuracy #250

Closed. ibrahimAlii closed this issue 6 years ago

ibrahimAlii commented 6 years ago

Summary:

With the English trained data, recognition works very well. With the Arabic trained data, I am required to copy all of the cube data files as well, and the output is still of poor quality.

Steps to reproduce the issue:

  1. Input an image containing any Arabic digits/words.
  2. Get the recognized text with getUTF8Text().

Expected result: The recognized text should match the Arabic digits/words in the image.

Actual result: I got a weird result.

Tess-two version: 8.0.0

Android version: 28

Phone/device model: Pixel

Link to training data used: https://github.com/tesseract-ocr/tessdata/blob/3.04.00/ara.traineddata

Link to image used as input:

http://3.bp.blogspot.com/-CZRdjlj2ybU/TkAbU6C4RWI/AAAAAAAAAAw/n4Hej0ct3rw/s1600/ind.jpg

rmtheis commented 6 years ago

Thanks for the bug report. It's not entirely clear to me what the problem is because you just said you get a "weird result." Maybe try different page segmentation modes and try using different portions of the input image.

Most likely your issue is not a bug and this is working as intended.
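
The suggestion above (try different page segmentation modes and different portions of the input image) can be sketched as a small sweep. This is only an illustration, not code from the thread: the class OcrSweep, the Recognizer interface, and the tiles and sweep helpers are hypothetical names, and the OCR call is hidden behind the interface so the sketch runs without an Android device.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OcrSweep {
    /** Hypothetical stand-in for the OCR engine, so the sketch runs off-device. */
    public interface Recognizer {
        String recognize(int pageSegMode, int left, int top, int width, int height);
    }

    /** Splits a width x height image into rows x cols regions, each {left, top, width, height}. */
    public static List<int[]> tiles(int width, int height, int rows, int cols) {
        List<int[]> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            int top = r * height / rows;
            int bottom = (r + 1) * height / rows;
            for (int c = 0; c < cols; c++) {
                int left = c * width / cols;
                int right = (c + 1) * width / cols;
                out.add(new int[] {left, top, right - left, bottom - top});
            }
        }
        return out;
    }

    /** Runs every mode on every region and collects the recognized text per combination. */
    public static Map<String, String> sweep(Recognizer ocr, int[] modes, List<int[]> regions) {
        Map<String, String> results = new LinkedHashMap<>();
        for (int mode : modes) {
            for (int[] r : regions) {
                String key = "psm=" + mode + " rect=" + r[0] + "," + r[1] + "," + r[2] + "," + r[3];
                results.put(key, ocr.recognize(mode, r[0], r[1], r[2], r[3]));
            }
        }
        return results;
    }
}
```

On a device, a Recognizer could wrap a TessBaseAPI instance: call setPageSegMode(mode) with constants such as TessBaseAPI.PageSegMode.PSM_SINGLE_LINE, call setRectangle(left, top, width, height) after setImage(bitmap), and return getUTF8Text(). Comparing the collected results side by side shows whether a particular mode or crop reads the digits correctly.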

ibrahimAlii commented 6 years ago

@rmtheis Please check the image below.

[Image: screenshot_1537628905]

The result should match the picture, but I got some Arabic characters instead of digits. Some digits are also misread: I got nine instead of one, "ها" instead of three, and "هلا" instead of six.
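
The thread does not say how the 20% figure in the title was measured. One way to put a number on misreads like these is character-level accuracy based on edit distance. The following is a minimal sketch under that assumption; the class OcrAccuracy and its methods are hypothetical names, and characters are compared as Unicode code points so Arabic letters and digits each count once.

```java
public class OcrAccuracy {
    /** Levenshtein distance between two strings, counted over Unicode code points. */
    public static int editDistance(String expected, String actual) {
        int[] a = expected.codePoints().toArray();
        int[] b = actual.codePoints().toArray();
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i; // turning a[0..i) into an empty prefix takes i deletions
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    /** Returns 1 - distance / expected length, clamped to the range [0, 1]. */
    public static double accuracy(String expected, String actual) {
        int n = expected.codePointCount(0, expected.length());
        if (n == 0) {
            return actual.isEmpty() ? 1.0 : 0.0;
        }
        return Math.max(0.0, 1.0 - (double) editDistance(expected, actual) / n);
    }
}
```

For example, if the image shows "136" and the engine returns "936", one of three characters is wrong, so OcrAccuracy.accuracy("136", "936") is about 0.67.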