snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Czech and Slovak algorithms. #149

Closed gaboull closed 1 year ago

ojwb commented 3 years ago

The process for submitting a new stemmer is documented in CONTRIBUTING.rst. In particular, we need a test vocabulary added to snowball-data so there's test coverage, and a page added to the website with some background on the algorithm to aid future maintenance (if a bug is reported and all we have is the Snowball implementation, it can be hard to tell whether the behaviour is an intentional design trade-off or an oversight).

The Czech stemmer is already on the website, so I know that it comes from a paper and who implemented it. I can easily fill that in, and I've created a test vocabulary from Wikipedia data (in #151).

I don't know any of the background to the Slovak algorithm here, though.