mpcabd / python-arabic-reshaper

Reconstruct Arabic sentences to be used in applications that don't support Arabic
MIT License
398 stars 80 forks

Extending for Urdu #13

Closed mhb11 closed 6 years ago

mhb11 commented 7 years ago

Is an extension for Urdu available?

If not, it would be great if you extended this neat library for Urdu. Thanks!

mpcabd commented 6 years ago

Hi @mhb11, I tried it with this text:

تمام انسان آزادی اور حقوق و عزت کے اعتبار سے برابر پیدا ہویٔے ہیں۔ انہیں ضمیر اور عقل و دیعت ہویٔی ہے۔ اس لیٔے انہیں ایک دوسرے کے ساتھ بھایٔی چارے کا سلوک کرنا چاہیء۔

The only problem I found is that when reshaping ہویٔے it removes the hamza above (U+0654), because that code point falls inside the range treated as harakat. That classification is not accurate, since it is not a haraka, but the range covers all non-spacing characters that are deleted for reshaping.
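To make the behaviour concrete, here is a minimal sketch using only the Python standard library. It does not call the reshaper itself: `ILLUSTRATIVE_HARAKAT_RE` and `strip_marks` are hypothetical names, the range is an assumption standing in for the real harakat range, and the word is assumed to be encoded as the five code points listed in the code.

```python
import re
import unicodedata

HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE, the mark dropped from the word

# Assumption: an illustrative range standing in for the reshaper's harakat
# range; the library's real pattern may cover different code points.
ILLUSTRATIVE_HARAKAT_RE = re.compile("[\u064B-\u065F]")


def strip_marks(text, keep=""):
    """Delete characters in the illustrative range, except those in `keep`."""
    return "".join(
        ch for ch in text
        if ch in keep or not ILLUSTRATIVE_HARAKAT_RE.match(ch)
    )


# The word from the sample text, spelled out code point by code point
# (assumed encoding: U+06C1 U+0648 U+06CC U+0654 U+06D2).
word = "\u06C1\u0648\u06CC\u0654\u06D2"

stripped = strip_marks(word)                      # hamza above is lost
preserved = strip_marks(word, keep=HAMZA_ABOVE)   # hamza above survives

# U+0654 is a non-spacing mark (category Mn), which is why a broad
# "delete all non-spacing marks" range catches it.
print(unicodedata.name(HAMZA_ABOVE), unicodedata.category(HAMZA_ABOVE))
print(len(word), len(stripped), len(preserved))
```

The `keep` parameter shows one possible direction for a fix: exempt U+0654 from deletion while still removing the other marks in the range.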

Check this pen for the example.

I'll try to work on fixing that issue soon, but meanwhile, can you check other texts? If you find any other issues let me know.

Thanks!

mpcabd commented 6 years ago

I checked with a native speaker, and the reshaped text I showed matches what he expects, so I would say the tool already supports Urdu out of the box.