partial word matches ranked higher than complete matches

olivernn / lunr.js

A bit like Solr, but much smaller and not as bright

http://lunrjs.com

MIT License

partial word matches ranked higher than complete matches #175

Closed: tjstebbing closed this issue 7 years ago

tjstebbing commented 9 years ago

Hello. I'm trying out lunr with all of the default options, and I'm noticing that partial word matches are ranked equal to or above whole word matches. For instance, searching for 'pineapple'

will rank:

'The pinemartin sat pining in the pine tree wearing a pinafore'

above:

'I love pineapple'

I'm guessing this is because the first document has a higher number of partial matches. Is there a way to just turn off partial word matching?

Thanks

olivernn commented 9 years ago

Currently there is no way to turn off partial matching of words unfortunately. There is code specifically to try and guard against this case, but it seems it isn't as effective as it could be.

The current prefix lookup is done here. I guess it might be possible to control this with an option, but I'm not sure what that would look like. Would it apply to the entire index, or to a particular search?

lswright commented 8 years ago

Hi, just following on from this question: is there a fix? I would like to use the result scores as a filter, but if word stems rank higher than or equal to the actual words, this will be difficult. Could this be modified by changing the "sensitivity" of the stemmer? If so, how?

Thanks

AndyOGo commented 8 years ago

I have the exact same problem with lunr.js, so I started to compare lunr against elasticlunr, and indeed the results are different. So far I haven't faced this issue with elasticlunr.

lswright commented 8 years ago

Thanks for the tip Andy, I'll give it a go.

olivernn commented 7 years ago

The latest version of Lunr handles this much better now. There are no more implicit wildcards, so this kind of behaviour is more obvious (and easier to opt out of):

var idx = lunr(function () {
  this.field('text')
  this.add({id: '1', text: 'The pinemartin sat pining in the pine tree wearing a pinafore'})
  this.add({id: '2', text: 'I love pineapple'})
})

idx.search('pineapple') // matches only document 2
idx.search('pine*') // matches both, with document 1 more relevant
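
Building on the example above, one way to always list whole-word matches ahead of wildcard matches is to run both searches and merge the results. The following is only a minimal sketch and not part of Lunr: `exactFirst` is a hypothetical helper, and it assumes nothing more than that each result carries a `ref` property, as Lunr search results do. The sample arrays and their scores are made-up stand-ins for the results of `idx.search('pineapple')` and `idx.search('pine*')`.

```javascript
// Hypothetical helper, not part of Lunr: merge two result lists so that
// every exact match is listed before any wildcard-only match.
// Each result is assumed to look like Lunr's { ref, score } objects.
function exactFirst (exactResults, wildcardResults) {
  // Prototype-free object, so any ref string is safe to use as a key.
  var seen = Object.create(null)
  exactResults.forEach(function (result) {
    seen[result.ref] = true
  })

  // Keep only wildcard hits that the exact search did not already find.
  var wildcardOnly = wildcardResults.filter(function (result) {
    return !seen[result.ref]
  })

  // Concatenating preserves the incoming order within each group.
  return exactResults.concat(wildcardOnly)
}

// Stand-in data shaped like search results; the scores are made up.
var exact = [{ ref: '2', score: 1.2 }]
var wildcard = [{ ref: '1', score: 0.9 }, { ref: '2', score: 0.4 }]

var merged = exactFirst(exact, wildcard)
console.log(merged.map(function (r) { return r.ref })) // [ '2', '1' ]
```

With real data, the two arguments would be the arrays returned by the two `idx.search` calls. Scores from separate searches are not comparable with each other, which is why this sketch orders by group rather than re-sorting by score.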