mikegoatly / lifti

A lightweight full text indexer for .NET
MIT License
184 stars, 9 forks

Suggestion: custom stemmers #82

Closed: Dissimilis closed this issue 10 months ago

Dissimilis commented 1 year ago

Judging by the line `this.stemmer = new PorterStemmer();` in the code, it looks like implementing and passing in my own stemmer is impossible.

It should be trivial to change the API so that a custom stemmer can be assigned in `TokenizationOptions`, though the design of `IStemmer` might need more thought.

P.S. `this.stemmer = new PorterStemmer();` is a nice illustration of "new is glue" :)
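
To make the suggestion concrete, here is a minimal sketch of removing the glue by injecting the stemmer through an options object. Every name and shape below is an assumption made for illustration: the `IStemmer` signature, `TokenizationOptionsSketch`, `TokenizerSketch` and the toy `SuffixStemmer` are hypothetical and are not LIFTI's actual API.

```csharp
using System;

// Hypothetical interface shape. LIFTI's own IStemmer is internal,
// so its real signature may look quite different.
public interface IStemmer
{
    string Stem(string word);
}

// Toy stemmer for illustration only: strips one trailing "s" from
// words longer than three characters. Not a real stemming algorithm.
public sealed class SuffixStemmer : IStemmer
{
    public string Stem(string word)
    {
        return word.Length > 3 && word.EndsWith("s", StringComparison.Ordinal)
            ? word.Substring(0, word.Length - 1)
            : word;
    }
}

// Hypothetical stand-in for an options type; not LIFTI's TokenizationOptions.
public sealed class TokenizationOptionsSketch
{
    // A null stemmer means "do not stem".
    public IStemmer Stemmer { get; set; }
}

// Hypothetical tokenizer that receives its stemmer from the options
// instead of constructing a PorterStemmer itself.
public sealed class TokenizerSketch
{
    private readonly IStemmer stemmer;

    public TokenizerSketch(TokenizationOptionsSketch options)
    {
        this.stemmer = options.Stemmer;
    }

    public string Normalize(string word)
    {
        return this.stemmer == null ? word : this.stemmer.Stem(word);
    }
}
```

With this shape, the caller decides which stemmer is used, for example `new TokenizerSketch(new TokenizationOptionsSketch { Stemmer = new SuffixStemmer() })`, and the tokenizer no longer depends on one concrete implementation.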

mikegoatly commented 1 year ago

Thanks for the suggestion! Yeah, at the moment only Porter stemming is supported. The `IStemmer` interface is internal because it wasn't designed with extensibility in mind.

You raise an interesting point though; there are other stemming algorithms, not least so that words from languages other than English can be stemmed effectively.

It's definitely something to think about...

mikegoatly commented 10 months ago

Custom stemming will be available in v6.