mpcabd / python-arabic-reshaper

Reconstruct Arabic sentences to be used in applications that don't support Arabic
MIT License

Lam-Alef incorrect glyphs #2

Closed jlev closed 10 years ago

jlev commented 10 years ago

The LAM_ALEF_GLYPHS table includes the Chinese characters \u3BA6 and \u3BA7 instead of Arabic glyphs. This appears to be caused by a bug in the original Java version at https://github.com/agawish/Better-Arabic-Reshaper/blob/master/src/org/amr/arabic/ArabicReshaper.java#L60

In Python, the table should be:

LAM_ALEF_GLYPHS = [
        [u'\u0622', u'\uFEF6', u'\uFEF5'],
        [u'\u0623', u'\uFEF8', u'\uFEF7'],
        [u'\u0627', u'\uFEFC', u'\uFEFB'],
        [u'\u0625', u'\uFEFA', u'\uFEF9']
]
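
One way to sanity-check a table like this is to look up the Unicode name of each glyph. The sketch below assumes Python 3 and uses only the standard `unicodedata` module; the helper name `describe` is hypothetical and not part of the library.

```python
import unicodedata

# Corrected table from this issue.
# Each row: [alef variant, second glyph, third glyph]
LAM_ALEF_GLYPHS = [
    ['\u0622', '\uFEF6', '\uFEF5'],
    ['\u0623', '\uFEF8', '\uFEF7'],
    ['\u0627', '\uFEFC', '\uFEFB'],
    ['\u0625', '\uFEFA', '\uFEF9'],
]


def describe(table):
    """Return (code point, Unicode name) pairs for the ligature glyphs in each row."""
    return [
        (f"U+{ord(glyph):04X}", unicodedata.name(glyph))
        for row in table
        for glyph in row[1:]  # skip the first column (the plain alef variant)
    ]


# Every ligature entry should be named "ARABIC LIGATURE LAM WITH ALEF ...".
for code_point, name in describe(LAM_ALEF_GLYPHS):
    print(code_point, name)

# The two code points reported above resolve to CJK ideographs instead.
for bad in ('\u3BA6', '\u3BA7'):
    print(f"U+{ord(bad):04X}", unicodedata.name(bad))
```

Any entry whose name does not start with "ARABIC" would point to a wrong code point in the table.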

In the Java version, I believe the table should be:

public static char[][] LAM_ALEF_GLPHIES = {
        {1570, 65270, 65269},
        {1571, 65272, 65271},
        {1575, 65276, 65275},
        {1573, 65274, 65273}
};
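
Since the Java table uses decimal values and the Python table uses hex escapes, a quick cross-check can confirm that both proposals describe the same code points. This is a minimal sketch, assuming Python 3; the names `JAVA_TABLE`, `PYTHON_TABLE` and `to_code_points` are hypothetical and used only for illustration.

```python
# Decimal values proposed for the Java table.
JAVA_TABLE = [
    [1570, 65270, 65269],
    [1571, 65272, 65271],
    [1575, 65276, 65275],
    [1573, 65274, 65273],
]

# Hex escapes proposed for the Python table.
PYTHON_TABLE = [
    ['\u0622', '\uFEF6', '\uFEF5'],
    ['\u0623', '\uFEF8', '\uFEF7'],
    ['\u0627', '\uFEFC', '\uFEFB'],
    ['\u0625', '\uFEFA', '\uFEF9'],
]


def to_code_points(table):
    """Convert a table of single characters into a table of integer code points."""
    return [[ord(ch) for ch in row] for row in table]


# Both proposals should describe exactly the same code points.
print(to_code_points(PYTHON_TABLE) == JAVA_TABLE)

# Decimal values of the two incorrect code points, for comparison with the Java source.
print(0x3BA6, 0x3BA7)
```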

mpcabd commented 10 years ago

Thanks for the contribution and the issue report. I'll fix it right away.