olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Prefix search doesn't work in some cases. #413

Closed: JulianKK closed this issue 4 years ago

JulianKK commented 4 years ago

There is a document in my index with the title 'Reconnaissance'. When I try to find it using a prefix search, it is found for every prefix from 'r' to 'reconnaiss', but if I search for 'reconnaissa' nothing is found. Did I miss something?

The same thing happens with 'vandalism': from 'vandal' on, there are no results anymore.

hidmic commented 4 years ago

Same issue here, see ticket. It happens both with and without wildcards.

olivernn commented 4 years ago

This sounds likely to be caused by stemming.

> lunr.stemmer(new lunr.Token("reconnaissance")).toString()
'reconnaiss'

That is, only the text 'reconnaiss' is actually indexed by Lunr. This works for a non-prefix search because the search term is also stemmed before the lookup. Stemming is not possible for a prefix search, so once the prefix is longer than the stemmed word stored in the index, there will be no matches.
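To make the mismatch concrete, here is a minimal sketch that models the index as holding only the stem. It does not call Lunr at all; `indexedTerm` and `prefixMatches` are hypothetical names used for illustration, and the model assumes a trailing-wildcard query matches whenever the stored term starts with the typed prefix.

```javascript
// Simplified model, not Lunr itself: the only term stored for
// "Reconnaissance" is its stem, as reported by lunr.stemmer above.
const indexedTerm = "reconnaiss";

// Assumption: a trailing-wildcard query matches when the stored term
// starts with the prefix the user typed.
function prefixMatches(prefix) {
  return indexedTerm.startsWith(prefix);
}

const attempts = ["r", "reconn", "reconnaiss", "reconnaissa", "reconnaissance"];
const outcome = attempts.map((prefix) => [prefix, prefixMatches(prefix)]);

console.log(outcome);
```

Every prefix up to and including the stem matches. Anything longer cannot, because the characters after 'reconnaiss' were never stored.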

You can disable stemming for the index, or alternatively perform both a prefix search and a non-prefix search together.

idx.query(function (query) {
  // prefix search, no boost
  query.term("reconnaiss", { wildcard: lunr.Query.wildcard.TRAILING, boost: 1 })

  // 'exact' match, boosted
  query.term("reconnaiss", { boost: 10 })
})

The above combines a prefix search with an exact search. Exact matches will typically rank higher because of the boosts.