snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Add Pāli Stemmer #197

Open khemarato opened 4 months ago

khemarato commented 4 months ago

This is currently a draft PR, opened to start a discussion.

The test cases are defined in https://github.com/snowballstem/snowball-data/pull/26

The current implementation is very naive: it just removes common suffixes. Even so, it reaches 97.7% accuracy on the test set. See this gist for the failing cases.

Any feedback at all is appreciated.