olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Cannot disable stemmer and search for exact word #304

Closed fzingg closed 6 years ago

fzingg commented 6 years ago

Thanks for this lunr.js lib which is great.

I would like to find exact matching words. For example:

When searching "implements", I want only "implements".
When searching "implemented", I want only "implemented".
When searching "implementation", I want only "implementation".

Is that possible?

I have tried disabling the stemmer like this:

var index = lunr(function(){
        this.ref('headingid')
        this.field('headingname')
        this.field('text')
        this.metadataWhitelist = ['position']

        data.forEach(function(item) {
            this.add(item) 
        }, this);

        this.pipeline.remove(lunr.stemmer);
        this.pipeline.remove(lunr.stopWordFilter)

    })

But it has no effect.

What would be the correct config?

Thanks a lot

olivernn commented 6 years ago

Do you have an example? Putting together a reproduction on something like jsfiddle makes it a lot easier to understand the problem you are having.

If you want to completely disable the stemmer, you must remove it from both the indexing pipeline and the search pipeline. This should work:

var idx = lunr(function () {
  // ...snip...
  this.pipeline.remove(lunr.stemmer)
  this.searchPipeline.remove(lunr.stemmer)
  // ...snip...
})

There is more documentation on the builder object and the different properties/methods that are available.

olivernn commented 6 years ago

@fzingg did you manage to get this working?

hoelzro commented 6 years ago

@fzingg Another thing I noticed in your example is that you remove the functions from the pipeline after you index - in order for this to work, you'll need to remove them before you add any documents.
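
Putting the two suggestions above together, a minimal sketch of the configuration might look like the following. The field names come from the original example; the wrapper function `buildExactIndex` and its `documents` parameter are hypothetical names used only for illustration, and `lunr` is passed in as an argument so the sketch makes no assumption about how the library is loaded.

```javascript
// Sketch only: builds an index that matches exact word forms.
// `buildExactIndex` and `documents` are illustrative names, not part of lunr.
function buildExactIndex(lunr, documents) {
  return lunr(function () {
    this.ref('headingid')
    this.field('headingname')
    this.field('text')
    this.metadataWhitelist = ['position']

    // Remove the stemmer from BOTH pipelines before adding any documents.
    // Documents added earlier would already be indexed with stemmed tokens.
    this.pipeline.remove(lunr.stemmer)
    this.searchPipeline.remove(lunr.stemmer)

    // The stop word filter is removed from the indexing pipeline,
    // as in the original example.
    this.pipeline.remove(lunr.stopWordFilter)

    // Only now add the documents.
    documents.forEach(function (doc) {
      this.add(doc)
    }, this)
  })
}
```

With an index built this way, a call such as `buildExactIndex(lunr, data).search('implemented')` should look up the term as typed rather than its stem, since neither pipeline applies the stemmer any more.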

ToLuSt commented 6 years ago

Related to this topic, I have a similar question:

Is it possible to have two pipelines that behave differently on different fields?

So for example I have two fields in the index: 'technical terms' and 'body'.

Now, I don't want 'technical terms' to be stemmed, because I want exact matches there to raise my precision. On the other hand, I want 'body' to be stemmed.

I hope it's clear what I mean. Please have a look at my question. Thanks a lot!

hoelzro commented 6 years ago

@ToLuSt I had a similar idea, and I had some code I hacked together (which I unfortunately cannot find) that managed to shoehorn this into lunr.js. However, I've been reading up on ElasticSearch for work lately, and as I understand things, trying to do that with lunr would potentially skew the scoring of results unless the underlying index format for lunr were entirely changed. I would have to reproduce the code and try some searches to see.

olivernn commented 6 years ago

@ToLuSt would you mind opening a new issue for this? I think the original question/issue here is stale and I'd like to close it.

I've got a few thoughts on how to implement field-specific pipelines and am happy to have that discussion. Unfortunately GitHub doesn't allow me to create the issue on your behalf.

Also, I'm sure there is talk of this in some other issue somewhere, but I can't find it.

olivernn commented 6 years ago

@ToLuSt thanks. I'm going to close this one out. @fzingg feel free to comment and I'll re-open to look further into your original question.