olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

searching in a field for an exact phrase containing spaces #366

Closed by punkish 5 years ago

punkish commented 6 years ago

I have tags on all my web pages, and have built the lunr index with a tags field. Some tags are multi-word phrases. If I search for tags:hand drawn maps I get completely wrong results. However, if I search for tags:hand* I get the correct result. I have tried tags:+hand +drawn +maps and tags:"hand drawn maps", but without any success. Suggestions?

Update: I should add that I build my index and run my query like so:

    idx = lunr(function () {
        // field definitions; the boost is passed as the second argument
        this.field('title', { boost: 10 })
        this.field('tags')
        this.field('body', { boost: 20 })
        this.field('created')
        this.ref('file')

        // add every page to the index
        pages.forEach(function (doc) {
            this.add(doc)
        }, this)
    });

    searchResult = idx.search(q).map(function (result) {
        return {
            ref: result.ref,
            disp: result.ref.replace(/-/g, ' ')
        }
    });

olivernn commented 6 years ago

When performing a search using lunr.Index#search, the query string you use is parsed into a lunr.Query object. By default the parser assumes that whitespace indicates a term boundary; that is, a search for "foo bar" is interpreted as a search for the two terms foo and bar. What you want is a search for the single term foo bar.

You can bypass the query parsing entirely by using the lunr.Index#query method, which allows you to specify the term exactly how you want it. Alternatively, you can continue using lunr.Index#search but escape the spaces in the query string. Finally, you could use wildcards to match the spaces, though a wildcard will match any character, so this is probably not a great solution.

I have put together a fiddle showing the three approaches. I think using lunr.Index#query is probably the best choice.
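For readers without access to the fiddle, here is a minimal sketch of what the three approaches might look like. It is not the fiddle's code. It assumes lunr 2.x and an index whose tags field already holds the multi-word tag as a single token; the helper names (escapePhrase, toWildcard, searchWithQuery, searchEscaped, searchWildcard) are made up for illustration.

```javascript
// Sketch only: assumes lunr 2.x and an index `idx` in which the tag
// was indexed as ONE token, e.g. "hand drawn maps".

// Hypothetical helper: put a backslash before each whitespace character
// so the query parser does not treat it as a term boundary.
function escapePhrase(phrase) {
  return phrase.replace(/\s/g, function (ws) { return '\\' + ws; });
}

// Hypothetical helper: replace each run of whitespace with a wildcard.
function toWildcard(phrase) {
  return phrase.replace(/\s+/g, '*');
}

// 1. Bypass query parsing with lunr.Index#query.
//    The term is used as given, so it has to match the indexed token.
function searchWithQuery(idx, field, phrase) {
  return idx.query(function (q) {
    q.term(phrase, { fields: [field] });
  });
}

// 2. Keep lunr.Index#search, but escape the spaces.
function searchEscaped(idx, field, phrase) {
  return idx.search(field + ':' + escapePhrase(phrase));
}

// 3. Use wildcards in place of the spaces (matches other characters too).
function searchWildcard(idx, field, phrase) {
  return idx.search(field + ':' + toWildcard(phrase));
}
```

With a real index this would be called as, for example, searchWithQuery(idx, 'tags', 'hand drawn maps'). Because the first approach uses the term exactly as given, it only matches if the string is identical to the token stored in the index.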

icidasset commented 6 years ago

Thanks for explaining that in detail @olivernn !

I was wondering about one thing: How come if I don't wrap the tag in an array, when adding, it no longer works? (line 7, see updated fiddle: https://jsfiddle.net/hey72dmg/14/)

I couldn't find anything in the docs.

olivernn commented 6 years ago

@icidasset yeah, sorry, I didn't explain that well, and the relevant documentation is a bit hidden.

Lunr uses lunr.Token objects internally; these are mostly just wrappers around a string. lunr.tokenizer is used to create lists of tokens from the fields of the documents being indexed.

If a field is an array, Lunr assumes that the items in the array are already 'tokens'. For everything else, it assumes it has to split the field into tokens itself. It does this splitting on whitespace (among other characters).

So, when you pass an array such as ['foo bar baz'], Lunr assumes you have already done the splitting and the token is "foo bar baz". When you just pass "foo bar baz", it does the splitting itself and you get "foo", "bar", "baz".
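The array-versus-string behaviour can be illustrated with a simplified model. This is not lunr's actual tokenizer: sketchTokenize is a hypothetical stand-in that returns plain strings instead of lunr.Token objects, and it assumes the separator is whitespace and hyphens.

```javascript
// Simplified, hypothetical model of the tokenizer behaviour described
// above. The real lunr.tokenizer returns lunr.Token objects.
function sketchTokenize(value) {
  if (Array.isArray(value)) {
    // Array items are treated as ready-made tokens: no splitting.
    return value.map(function (item) {
      return String(item).toLowerCase();
    });
  }
  // Anything else is split into tokens (assumed separator:
  // runs of whitespace or hyphens).
  return String(value)
    .toLowerCase()
    .split(/[\s\-]+/)
    .filter(function (token) { return token.length > 0; });
}

var fromArray = sketchTokenize(['foo bar baz']);   // one token
var fromString = sketchTokenize('foo bar baz');    // three tokens
```

Here fromArray holds the single token "foo bar baz", while fromString holds "foo", "bar" and "baz", which is why a multi-word tag only survives as one term when it is added inside an array.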

melozzo commented 4 years ago

/Users/athenacapsis/Documents/Making and Knowing/making-knowing-edition/scripts/test_search.js Why doesn't my test work?

melozzo commented 4 years ago

    const fs = require('fs');

    // load lunr with fr support
    var lunr = require('lunr');
    require("lunr-languages/lunr.stemmer.support")(lunr)
    require('lunr-languages/lunr.multi')(lunr)
    require("lunr-languages/lunr.fr")(lunr)

    const jsdom = require("jsdom");
    const { JSDOM } = jsdom;

    var documents = [{
      "name": "Lunr",
      "text": "Like Solr, but much smaller, and not as bright."
    }, {
      "name": "React",
      "text": "A JavaScript library for building user interfaces."
    }, {
      "name": "Lodash",
      "text": "A modern JavaScript utility library delivering modularity, performance & extras."
    }]

    var idx = lunr(function () {
      this.ref('name')
      this.field('text')

      documents.forEach(function (doc) {
        this.add(doc)
      }, this)
    })

    /* USING QUERY METHOD****/

    let results = idx.query(function () {
      const searchTerm = 'A JavaScript library for building user interfaces.'; // 'library'
      console.log(`search term is ${searchTerm}`)
      this.term(['A JavaScript library for building user interfaces.'])
    });
    console.log(`results length is ${results.length}`)

    const resultText = [];
    let docWithTheTerm;
    results.forEach(result => {
      docWithTheTerm = documents.find(d => { return d.name === result.ref })
      if (docWithTheTerm !== undefined) {
        console.log(docWithTheTerm.text)
        resultText.push(docWithTheTerm);
      }
    })