snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Turkish proper noun suffixes #188

Open ojwb opened 9 months ago

ojwb commented 9 months ago

(Related to #187)

https://en.wikipedia.org/wiki/Turkish_language says "In modern Turkish orthography, an apostrophe is used to separate proper names from any suffixes", with the example "Türkiye'dir ("it is Turkey")". Currently we stem "türkiye'dir" to "türkiye'" but "türkiye" to "türki", so the suffixed and unsuffixed forms of the same proper noun do not conflate.

I think that after removing a suffix we should also remove an apostrophe if one immediately precedes the suffix. A quick test shows we would then stem "türkiye'dir" to "türki", the same stem we produce for "türkiye".
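To make the proposed rule concrete, here is a minimal Python sketch of a single suffix-removal step. It is not the Snowball Turkish stemmer: `strip_suffix` and the one-entry suffix list are hypothetical and only illustrate "remove the suffix, then remove an apostrophe immediately before it". Treating U+2019 as an apostrophe is also an assumption.

```python
# Illustrative only: a toy single-step suffix stripper, not the real
# Snowball Turkish algorithm. Names and the suffix list are hypothetical.

# Assumption: both the ASCII apostrophe and U+2019 count as apostrophes.
APOSTROPHES = ("'", "\u2019")


def strip_suffix(word, suffixes):
    """Remove the longest matching suffix, then a directly preceding apostrophe."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[:-len(suffix)]
            # The proposed extra step: drop an apostrophe that sat
            # immediately before the suffix we just removed.
            if stem.endswith(APOSTROPHES):
                stem = stem[:-1]
            return stem
    return word


print(strip_suffix("türkiye'dir", ["dir"]))
```

In this toy version the result is "türkiye" rather than "türki", because it performs only one removal step; the real stemmer would go on to process the remaining word as it does for plain "türkiye".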

Looking at turkish/voc.txt, 9280 of 96325 entries (about 9.6%) contain an apostrophe.
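A count like this can be reproduced with a few lines of Python. This is a sketch under assumptions: `count_apostrophe_entries` is a hypothetical helper, the file is assumed to be UTF-8 with one entry per line, and only the ASCII apostrophe is counted, which may or may not match how the figure above was obtained.

```python
def count_apostrophe_entries(lines, apostrophe="'"):
    """Return (entries containing an apostrophe, total non-empty entries)."""
    entries = [line.strip() for line in lines if line.strip()]
    with_apostrophe = sum(1 for entry in entries if apostrophe in entry)
    return with_apostrophe, len(entries)


# Example usage (the path is an assumption about the local checkout):
#
#     with open("turkish/voc.txt", encoding="utf-8") as f:
#         hits, total = count_apostrophe_entries(f)
#     print(f"{hits} of {total} entries contain an apostrophe")
```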