rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

Illegible words recognition in Persian lang #259

ImanX closed 5 years ago

ImanX commented 5 years ago

Summary: I integrated tess-two and imported the Persian .traineddata file into my project. tess-two runs, but it returns illegible words like:

   ـاغ {.
    ٥ ج.: { ٠
    ٤ \ ٤2,
    } 13
    ؤ. …
    « چ \ ة 8۱
    :} 3 ١.٠
    ٠ ء,٬, "و ۱١ |
    ), ٠
    } ( \ ق {۰
    | } چ
    د … ة ؛ ٠
    ؛ \ ؤ ٠٠
    دغ٬ ؤ \ 3
    حس {؛ | غ
    3 ق : « }
    دا ) { 3 د.
    » < {:
    ٠ دێ .
    ؛ ,? 33٠ ,
    { -3 ٠_
    {سم

Tess-two version: 5.4.1

Android version: 6.0

Phone/device model: Samsung S6

Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64): ARM64

Robyer commented 5 years ago

@ImanX tess-two 5.4.1 is more than 3 years old; you should try the latest version, 9.0.0.

rmtheis commented 5 years ago

You might try asking on the Tesseract mailing list and including a sample input image so you can get suggestions about what image processing to do in order to get a better result. While your current result is clearly not what you're looking for, it does look like Tesseract is working as intended. Robyer's suggestion of trying a newer version is a good one too.
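
To illustrate the kind of image processing mentioned above, here is a minimal sketch of one common preprocessing step: binarizing a grayscale image with Otsu's threshold before handing it to the OCR engine. Nothing in this thread says which preprocessing this particular image needs, so treat the technique as an assumption. The class and method names (`OtsuBinarizer`, `otsuThreshold`, `binarize`) are hypothetical and not part of tess-two, and the input is assumed to be an array of 8-bit grayscale values that you have already extracted from your image.

```java
// Hypothetical helper, not part of tess-two: Otsu binarization on 8-bit grayscale pixels.
class OtsuBinarizer {

    /** Returns the threshold (0..255) that maximizes the between-class variance. */
    static int otsuThreshold(int[] gray) {
        // Histogram of intensity values.
        int[] hist = new int[256];
        for (int g : gray) {
            hist[g & 0xFF]++;
        }

        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }

        long sumBg = 0;      // intensity sum of pixels <= t
        long weightBg = 0;   // number of pixels <= t
        double bestVar = -1.0;
        int best = 0;

        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            sumBg += (long) t * hist[t];
            if (weightBg == 0) {
                continue;    // no pixels in the dark class yet
            }
            long weightFg = total - weightBg;
            if (weightFg == 0) {
                break;       // all pixels are in the dark class
            }
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Maps pixels <= threshold to 0 (black) and all others to 255 (white). */
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = (gray[i] & 0xFF) <= threshold ? 0 : 255;
        }
        return out;
    }
}
```

A caller would compute `otsuThreshold(gray)` once per image and pass the result to `binarize`. Whether this helps depends on the input: uneven lighting, low resolution, or skewed text may call for different steps, which is why posting a sample image to the mailing list is the more reliable route.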