projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0

Find a way to incorporate the Java Porter Stemmer #244

Open martindholmes opened 2 years ago

martindholmes commented 2 years ago

At various times I've come across a Java lib incorporating all the currently supported languages from the Porter Stemmer (Snowball) project, as well as a collection of matching JavaScript implementations. There are various sources for these, so the first step would be to identify the most reliable and canonical versions (for Java, presumably here: https://snowballstem.org/download.html). If we can find a way to wrap the Java lib so that Saxon can use it, and we can find matching JS implementations for all the languages, we may be able to replace our own stemmers and support many more languages.
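As a rough illustration of the wrapping idea, here is a minimal sketch of a language-keyed bridge class. It is not taken from staticSearch. `StemmerBridge`, `register` and `stem` are hypothetical names. The sketch assumes the Snowball Java classes are named `org.tartarus.snowball.ext.<language>Stemmer` and expose `setCurrent`, `stem` and `getCurrent`, which should be checked against whichever distribution is chosen. Exposing the class to Saxon (for example through an extension function) is left out entirely.

```java
import java.lang.reflect.Method;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical bridge: maps a language name to a stemming function.
// For real use this would be a public class in its own file.
final class StemmerBridge {
    // Assumption: package and naming pattern of the Snowball Java stemmers.
    private static final String SNOWBALL_PACKAGE = "org.tartarus.snowball.ext.";

    private static final Map<String, UnaryOperator<String>> REGISTRY =
            new ConcurrentHashMap<>();

    // Lets callers plug in a custom stemmer (e.g. an existing home-grown one).
    public static void register(String language, UnaryOperator<String> stemmer) {
        REGISTRY.put(language.toLowerCase(Locale.ROOT), stemmer);
    }

    // Single entry point that an XSLT extension function could delegate to.
    public static String stem(String language, String token) {
        String key = language.toLowerCase(Locale.ROOT);
        return REGISTRY.computeIfAbsent(key, StemmerBridge::loadSnowball).apply(token);
    }

    // Looks up a Snowball stemmer class by reflection, so this file compiles
    // without the Snowball jar; the jar is only needed at run time.
    private static UnaryOperator<String> loadSnowball(String language) {
        try {
            Class<?> cls = Class.forName(SNOWBALL_PACKAGE + language + "Stemmer");
            Method setCurrent = cls.getMethod("setCurrent", String.class);
            Method stem = cls.getMethod("stem");
            Method getCurrent = cls.getMethod("getCurrent");
            return token -> {
                try {
                    // A fresh instance per call avoids sharing mutable state.
                    Object instance = cls.getDeclaredConstructor().newInstance();
                    setCurrent.invoke(instance, token);
                    stem.invoke(instance);
                    return (String) getCurrent.invoke(instance);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Stemming failed for: " + token, e);
                }
            };
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No stemmer available for language: " + language, e);
        }
    }
}
```

With the Snowball jar on the classpath, a call such as `StemmerBridge.stem("english", token)` would be the single function the XSLT side needs to reach. Keeping one language-keyed entry point should also make it easier to check that each Java stemmer has a matching JS implementation.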

joeytakeda commented 7 months ago

I did a bit of this work a while back. I never finished it, but it might be a good base: https://github.com/joeytakeda/saxon-ext-test