olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Weird matching behavior in basic example of lunr #408

Closed cbdeveloper closed 5 years ago

cbdeveloper commented 5 years ago

I'm having some difficulties implementing a basic search index with lunr. It's kind of working, but I haven't been able to match some words, as you can see from the GIF below:

I've got only 2 "posts" added to the index.

const posts = [
  {
    id: "id1",
    title: "This is the best post"
  },
  {
    id: "id2",
    title: "This is some other post"
  }
];

[GIF: lunrissue]

Building the lunr index

function buildLunrIndex() {
  lunrIndexRef.current = lunr(function() {
    this.field("title");
    posts.forEach(post => {
      console.log("Adding: " + post.title);
      this.add(post);
    });
  });
}

Doing Searches

function doSearch(value) {
  const results = lunrIndexRef.current.search(value);
  const matchedIDs = results.map(item => item.ref);
  setPostResultList(posts.filter(item => matchedIDs.indexOf(item.id) > -1));
}

I made this example available on CodeSandbox https://codesandbox.io/s/y10nwyx91

Question 1: What is going on with my example? Am I doing something wrong, or is this a bug? I tried to follow the documentation as best I could.

Question 2: Can I make lunr match parts of words (tokens)?

olivernn commented 5 years ago

It looks like you are running into the stop word filter. It is designed to remove very common words from the index; they are usually not very useful when performing a search because they exist in almost every document. The two texts in your example are almost entirely stop words. Only "best" and "post" are not, which is why you are seeing this behaviour.
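
To make the effect concrete, here is a rough sketch of what a stop word filter does to the two titles from the question. It is not lunr's implementation: `STOP_WORDS` is a small hand-picked list and `indexableTokens` is a hypothetical helper, both assumed purely for illustration.

```javascript
// Illustrative sketch only: this is NOT lunr's implementation.
// STOP_WORDS is a tiny hand-picked list, assumed for this example.
const STOP_WORDS = new Set(["this", "is", "the", "some", "other"]);

// Hypothetical helper: lowercase, split on whitespace, drop stop words.
function indexableTokens(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(token => token.length > 0 && !STOP_WORDS.has(token));
}

const titles = ["This is the best post", "This is some other post"];
const tokensPerTitle = titles.map(indexableTokens);

console.log(tokensPerTitle);
```

With this list, only "best" and "post" survive for the first title and only "post" for the second, so searching for a word like "some" or "other" has nothing to match against.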

You can remove the stop word filter, but be aware it will increase the size of the index:

    var idx = lunr(function () {
      this.pipeline.remove(lunr.stopWordFilter)
    })

As for matching parts of words, there are a few options when searching, including wildcards and fuzzy matches.
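
One possible way to use these from a search box is to rewrite the typed text into a query string before handing it to `search`. The sketch below assumes lunr's query string syntax, where a trailing `*` is a wildcard and `~N` allows up to N edits; `toPartialQuery` is a hypothetical helper, not part of lunr, and it does not escape characters that have special meaning in a query (such as `:` or `-`).

```javascript
// Hypothetical helper (not part of lunr): turns raw input into a query string.
// Assumes lunr's query syntax: "term*" is a trailing wildcard,
// "term~N" is a fuzzy match allowing up to N edits.
function toPartialQuery(input, { fuzzy = 0 } = {}) {
  return input
    .trim()
    .split(/\s+/)
    .filter(term => term.length > 0)
    .map(term => (fuzzy > 0 ? `${term}~${fuzzy}` : `${term}*`))
    .join(" ");
}

console.log(toPartialQuery("bes"));                // bes*
console.log(toPartialQuery("bset", { fuzzy: 1 })); // bset~1

// Intended use with the index from the question (not run here):
// const results = lunrIndexRef.current.search(toPartialQuery(value));
```

Trailing wildcards suit search-as-you-type, while fuzzy matching is more about tolerating typos; which one fits depends on the use case.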