weixsong / elasticlunr.js

Based on lunr.js, but more flexible and customized.
http://elasticlunr.com
MIT License

Unexpected search results #117

Closed. Bongmaster407 closed this issue 4 years ago.

Bongmaster407 commented 4 years ago

Hi there, we're experiencing some issues when filtering for certain keyword terms. Our documents have a title field that contains words like "Manager" and "Management". When I search for "manage" I get, for example, 60 results. When I search for "manager" I get the same 60 results; I guess that's because of the stemmer. When I search for "managem", though, I get 0 (zero) results. The same goes for "manageme" and "managemen". Only when I search for the complete word "management" do I get 60 results again. What is the issue here?

We set the expand option to true in the search configuration.

Another example is "Performance", which is contained in 8 results. Searching for "perform" returns 8 results. "performa", "performan" and "performanc" return 0 results. "performance" returns 8 results again.

Another weird example: "Proposition"

hrigner commented 4 years ago

I have the same behaviour, and cannot figure out why :(

srenauld commented 4 years ago

Hey @Bongmaster407, @hrigner!

What you are running into is the stemmer. You'll likely need to either disable it (so that it does not stem words at all) or replace it.

The stemmer is added to pipelines as a global singleton (!) here: https://github.com/weixsong/elasticlunr.js/blob/v0.9.5/lib/stemmer.js#L219

In order to disable it entirely, the following should do the trick:

elasticlunr.Pipeline.registerFunction(function(w) { return w; }, 'stemmer');

The 0.9.6 version, which is still not deployed on npm (@weixsong), has a better stemmer and a way to disable it at will.
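To illustrate why partial words such as "managem" can come back empty while "manage" and "management" both match, here is a minimal, self-contained sketch. It does not use elasticlunr at all: `toyStem`, `buildIndex` and `search` are hypothetical helpers, the suffix list is a crude stand-in for a real stemmer, and the prefix matching is only assumed to resemble what the `expand` option does.

```javascript
// Toy suffix stripper: an illustrative stand-in, NOT the stemmer elasticlunr ships.
function toyStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ement", "er", "e"]) {
    // Only strip when at least 4 characters would remain.
    if (w.endsWith(suffix) && w.length - suffix.length >= 4) {
      return w.slice(0, -suffix.length);
    }
  }
  return w;
}

// Map each stemmed title token to the set of document ids that contain it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const token of doc.title.split(/\s+/).filter(Boolean)) {
      const stem = toyStem(token);
      if (!index.has(stem)) index.set(stem, new Set());
      index.get(stem).add(doc.id);
    }
  }
  return index;
}

// Stem the query, then collect every indexed token that starts with the stemmed
// query (an assumed approximation of prefix expansion).
function search(index, query) {
  const stem = toyStem(query);
  const hits = new Set();
  for (const [token, ids] of index) {
    if (token.startsWith(stem)) ids.forEach((id) => hits.add(id));
  }
  return [...hits].sort((a, b) => a - b);
}

const docs = [
  { id: 1, title: "Product Manager" },
  { id: 2, title: "Risk Management" },
];
const index = buildIndex(docs);

const queries = ["manage", "manager", "managem", "manageme", "managemen", "management"];
const counts = queries.map((q) => [q, search(index, q).length]);
console.log(counts);
```

In this sketch the index only ever contains the stem "manag". Complete words are reduced to that stem before lookup, but a fragment like "managem" is not a recognisable word form, so it stays as it is and no indexed token starts with it.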

Bongmaster407 commented 4 years ago

Thanks @srenauld ! We'll test this out, maybe in our next sprint. I'll leave another comment when we have feedback from our client.

hrigner commented 4 years ago

Thanks for the reply @srenauld! Searching for embedd gives 6 hits, searching for embeddi and embeddin gives 0 hits, and embedding gives 2 hits.

I tried disabling the stemmer, but then I get 0 hits for embedding as well. I also tried using weixsong/elasticlunr.js#v0.9.6, but that gives the same result as the 0.9.5.

I'm in the process of adding search functionality to qlik.dev :)

hrigner commented 4 years ago

I also tried replacing the stemmer with the snowball stemmer, with the same result. Embeddi and embeddin are not found, but embedding is stemmed into embed, which is found. I guess I have to accept the way it works.

lukevp commented 4 years ago

@hrigner @Bongmaster407 This is how stemming works; it sounds like you are looking for a term prefix search instead. If you're showing results as the user types, you could show a UI message like "0 results for embeddin... showing results for embed". That is, store the last result set that had more than 0 results, and if that previous term is a prefix of the current term, show the previous results along with the previous term.
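A minimal sketch of that fallback idea could look like the following. The names `createFallbackSearch` and `fakeSearch` are hypothetical. `fakeSearch` is a hard-coded stand-in for a real index query that only mimics the empty results reported earlier in this thread; in practice you would pass in whatever function queries your index and returns an array.

```javascript
// Hypothetical helper: wraps any search function and remembers the last term that had hits.
function createFallbackSearch(searchFn) {
  let last = null; // { term, results } of the most recent non-empty search

  return function searchWithFallback(term) {
    const results = searchFn(term);
    if (results.length > 0) {
      last = { term, results };
      return { term, results, fallback: false };
    }
    // No hits: reuse the previous results only while the user is still extending that term.
    if (last !== null && term.startsWith(last.term)) {
      return { term: last.term, results: last.results, fallback: true, requested: term };
    }
    return { term, results: [], fallback: false };
  };
}

// Stand-in for a real index query (an assumption for illustration only).
function fakeSearch(term) {
  return term === "embed" || term === "embedding" ? ["doc-1", "doc-2"] : [];
}

const searchWithFallback = createFallbackSearch(fakeSearch);
const typed = ["embed", "embeddi", "embeddin", "embedding"].map((t) => searchWithFallback(t));

for (const r of typed) {
  console.log(
    r.fallback
      ? `0 results for ${r.requested}, showing results for ${r.term}`
      : `${r.results.length} results for ${r.term}`
  );
}
```

The wrapper keeps the UI populated while the user is mid-word, and the `fallback` flag lets the page say clearly that the results shown belong to the earlier term rather than the one currently typed.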

Another possible solution would be to switch to another lib that can do prefix matching (perhaps flexsearch or minisearch - minisearch can do prefix searching out of the box, for example).

Bongmaster407 commented 4 years ago

Thanks @lukevp! I thought as much. We have already agreed to keep it as it is. And we really appreciate your tips and recommendations.