nextapps-de / flexsearch

Next-Generation full text search library for Browser and Node.js
Apache License 2.0

Why doesn't stemming work in Node? #280

Closed: peterbe closed this issue 2 years ago

peterbe commented 3 years ago

This could be user error. If so, I'm sorry, but it's not easy to understand how to use it. In my use case the language is English (en), so I want a search for "repository" to find "repositories" and vice versa. I tried this:

const { Index } = require("flexsearch");

const index = new Index({language: 'en', tokenize: "forward", stemmer: true})
const docs = ["Procurement is a word", "But also procurements is a word", "Repository is a word"]
docs.forEach((doc, i) => {
  index.add(i, doc)
})

const results = index.search('repositories', {limit: 10})
for (const key of results) {
  console.log(key, docs[key])
}

It's not finding anything. Likewise, if I search for "procurement", I want it to match both "Procurement" and "procurements".

Demo here: https://glitch.com/edit/#!/insidious-roan-note

peterbe commented 3 years ago

I had some luck with

const index = new Index({
  language: 'en',
  tokenize: "forward",
  stemmer: {
    // object {key: replacement}
    "ational": "ate",
    "s": "",
    "y": "ies",
    "tional": "tion",
    "enci": "ence",
    "ing": ""
  },
})

but I'm not sure it's "smart enough". Ideally, setting language: 'en' would load a built-in stemmer made specifically for English.
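
To make the "smart enough" concern concrete, here is a minimal sketch of suffix-rule stemming in plain JavaScript, independent of FlexSearch. The function stemWord and the rule table are hypothetical names for illustration only; FlexSearch may apply the entries of its stemmer option differently, so treat this as a model of the idea rather than the library's behavior. The point is that a match only happens when the indexed word and the query word reduce to the same stem.

```javascript
// Hypothetical suffix rules in the {suffix: replacement} shape used above.
// This is an illustrative table, not FlexSearch's built-in English rules.
const rules = {
  ies: "y",
  s: "",
  ing: "",
};

// Apply at most one rule: the longest suffix that matches wins.
function stemWord(word, table = rules) {
  const lower = word.toLowerCase();
  const suffixes = Object.keys(table).sort((a, b) => b.length - a.length);
  for (const suffix of suffixes) {
    // Keep at least three leading characters so "is" is not cut down to "i".
    if (lower.length > suffix.length + 2 && lower.endsWith(suffix)) {
      return lower.slice(0, -suffix.length) + table[suffix];
    }
  }
  return lower;
}

// Both forms of each word reduce to the same stem, so they can match.
console.log(stemWord("repositories")); // repository
console.log(stemWord("Repository"));   // repository
console.log(stemWord("procurements")); // procurement
```

A table like this is easy to write but covers only the suffixes it lists, which is why a dedicated English stemmer is usually preferable.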

ts-thomas commented 2 years ago

I will update the docs shortly to show how to use an external stemmer library.
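
One library-agnostic way to use an external stemmer is to stem text yourself before it reaches the index, for both documents and queries. The following is a minimal sketch under stated assumptions: stem is a stand-in for whatever external stemmer package is chosen, and TinyIndex is a hypothetical toy index used only so the example runs on its own. Neither is part of FlexSearch.

```javascript
// Stand-in for an external stemmer; a real project would import one instead.
function stem(word) {
  return word
    .toLowerCase()
    .replace(/ies$/, "y")            // repositories -> repository
    .replace(/([a-z]{3})s$/, "$1");  // procurements -> procurement
}

// Split on non-word characters and stem every token.
function normalize(text) {
  return text.split(/\W+/).filter(Boolean).map(stem);
}

// Hypothetical toy inverted index: stem -> set of document ids.
class TinyIndex {
  constructor() {
    this.postings = new Map();
  }
  add(id, text) {
    for (const token of normalize(text)) {
      if (!this.postings.has(token)) this.postings.set(token, new Set());
      this.postings.get(token).add(id);
    }
  }
  // Return the ids of documents that contain every stemmed query token.
  search(query) {
    const sets = normalize(query).map((t) => this.postings.get(t) || new Set());
    if (sets.length === 0) return [];
    const [first, ...rest] = sets;
    return [...first].filter((id) => rest.every((set) => set.has(id)));
  }
}

const docs = [
  "Procurement is a word",
  "But also procurements is a word",
  "Repository is a word",
];
const index = new TinyIndex();
docs.forEach((doc, i) => index.add(i, doc));

console.log(index.search("repositories")); // [ 2 ]
console.log(index.search("procurement"));  // [ 0, 1 ]
```

With the real library, the same pre-processing could presumably be applied by passing normalize(text).join(" ") to index.add and index.search, though whether that interacts cleanly with the tokenize and language options is an assumption this thread does not confirm.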