linuxscout / pyarabic

pyarabic
GNU General Public License v3.0

Update the strip_tashkeel and strip_diactricts functions to remove the alef after tanween al fateh #70

Open: Mansari opened this issue 1 year ago

Mansari commented 1 year ago

The strip_tashkeel and strip_diactricts functions are very helpful when preprocessing text that will be used for searches. With these functions, one can search for a word like رحيم without tashkeel. However, this will not match a word that ends with tanween al fateh, because after the tashkeel is removed the word still differs in structure: it becomes رحيما, with a trailing alef.

I suggest adding another optional flag (to keep backward compatibility with previous versions) that also removes the alef when it comes after tanween al fateh. See https://en.rattibha.com/thread/1266046390439903234 for details.
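A minimal sketch of how such an optional flag could behave, written in plain Python with `re` rather than inside pyarabic itself. The function name `strip_tashkeel_sketch` and the `strip_tanween_alef` parameter are hypothetical, not part of pyarabic's API. The sketch assumes that tashkeel means the marks U+064B (fathatan) through U+0652 (sukun), and that tanween al fateh is written either before the alef (رحيمًا) or on it (رحيماً). The flag defaults to `False`, so existing callers would get the old output.

```python
import re

# Hypothetical sketch, NOT the pyarabic API: the function name and the
# strip_tanween_alef flag are illustrative assumptions.

FATHATAN = "\u064B"  # tanween al fateh
ALEF = "\u0627"

# Assumed tashkeel set: U+064B (fathatan) through U+0652 (sukun).
TASHKEEL_RE = re.compile("[\u064B-\u0652]")
# Fathatan written before the alef (e.g. رحيمًا), possibly with another mark
# such as shadda between them (e.g. حقًّا).
FATHATAN_THEN_ALEF_RE = re.compile(FATHATAN + "[\u064C-\u0652]*" + ALEF)
# Fathatan written on the alef itself (e.g. رحيماً).
ALEF_THEN_FATHATAN_RE = re.compile(ALEF + "[\u064C-\u0652]*" + FATHATAN)


def strip_tashkeel_sketch(text: str, strip_tanween_alef: bool = False) -> str:
    """Remove tashkeel; optionally also drop the alef that carries or
    follows tanween al fateh. The default keeps the old behaviour."""
    if strip_tanween_alef:
        # This must run before the marks are stripped: the fathatan is the
        # only signal that the alef is a tanween seat, not a root letter.
        text = FATHATAN_THEN_ALEF_RE.sub("", text)
        text = ALEF_THEN_FATHATAN_RE.sub("", text)
    return TASHKEEL_RE.sub("", text)


if __name__ == "__main__":
    rahiman = "\u0631\u064E\u062D\u0650\u064A\u0645\u064B\u0627"  # رَحِيمًا
    print(strip_tashkeel_sketch(rahiman))                           # رحيما (old behaviour)
    print(strip_tashkeel_sketch(rahiman, strip_tanween_alef=True))  # رحيم
```

Words ending in taa marbuta or hamza (رحمةً, ماءً) carry tanween without an alef, so they are left unchanged. This remains a heuristic, though: a word whose final alef belongs to the base form, such as عصًا, would also lose that alef when the flag is on.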

Thank you for the amazing library!

linuxscout commented 1 year ago

Thank you, brother Mohamed. It's a great suggestion; we can add it. Thanks!

Mansari commented 1 year ago

Amazing! Is there anything I can help with, insha Allah?