olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Searching multiple terms on multiple fields with different boost using Index.query #289

Closed myalgo closed 7 years ago

myalgo commented 7 years ago

I would like to know how to write an efficient search query that matches multiple terms across multiple fields, each with a different boost.

I have a set of fields

and would like to search for "foo bar" or "foo bar*".

Since build-time boost is not supported in 2.1.2, I am using query-time boost. However, I would also like to search with and without a trailing wildcard, and with and without usePipeline, to get the best results.

Please let me know an efficient way to write this query using Index.query or Index.search.

taysloane commented 7 years ago

You could use something like this (tested in 2.1.0):

const query = 'foo bar';
const idx = ...;
const matches = idx.query(q => {
    // Exact matches, boosted per field.
    q.term(query, { fields: [ 'title' ], boost: 10 });
    q.term(query, { fields: [ 'heading' ], boost: 5 });
    q.term(query, { fields: [ 'text' ], boost: 3 });
    q.term(query, { fields: [ 'keywords' ], boost: 6 });

    // The same clauses again with a trailing wildcard (prefix match).
    q.term(query, { fields: [ 'title' ], boost: 10, wildcard: lunr.Query.wildcard.TRAILING });
    q.term(query, { fields: [ 'heading' ], boost: 5, wildcard: lunr.Query.wildcard.TRAILING });
    q.term(query, { fields: [ 'text' ], boost: 3, wildcard: lunr.Query.wildcard.TRAILING });
    q.term(query, { fields: [ 'keywords' ], boost: 6, wildcard: lunr.Query.wildcard.TRAILING });
});

You can add usePipeline: false, editDistance: ... etc. to the options object of each q.term call as normal. We're using something similar in production.
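
As a rough sketch of how such extra options might be attached, the hypothetical helper below (clauseOptions is not part of lunr) merges them into the field and boost settings of a clause. usePipeline and editDistance are the clause options mentioned above; the editDistance value of 1 is only an illustration, not a recommendation.

```javascript
// Hypothetical helper, not a lunr API: builds the options object for one q.term call.
function clauseOptions(field, boost, extra) {
  // Start from the field and boost, then layer any extra clause options on top.
  return Object.assign({ fields: [field], boost: boost }, extra || {});
}

// Possible usage inside idx.query (assumes a built lunr index named idx):
// idx.query(function (q) {
//   q.term('foo', clauseOptions('title', 10, { usePipeline: false }));
//   q.term('foo', clauseOptions('title', 10, { editDistance: 1 }));
// });
```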

olivernn commented 7 years ago

What @jakeapeters has looks good. One thing to be careful of, though: the query string passed to q.term will not be split into multiple tokens by lunr. That is, if query is the string "foo bar", it will search for the single term "foo bar", not for "foo" and "bar". With the default index builder, this means you won't get any results.

You can borrow Lunr's tokenizer, or use the regex it uses directly:

idx.query(function (q) {
  lunr.tokenizer(queryString).forEach(function (token) {
    q.term(token.toString(), { fields: [ 'title' ], boost: 10 })
    // etc....
  })
})

or, just borrow the regex:

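idx.query(function (q) {
  // Split on lunr's separator regex; drop empty strings left by leading or trailing separators.
  queryString.split(lunr.tokenizer.separator).filter(Boolean).forEach(function (term) {
    q.term(term, { fields: [ 'title' ], boost: 10 })
    // etc....
  })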
})
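
Putting the pieces of this thread together, a minimal sketch might look like the following. addBoostedClauses is a hypothetical helper, not a lunr API. The field names and boosts are the ones from the earlier comment, and lowercasing the input is an assumption intended to match how terms are normalised at index time. The separator and the wildcard constant are passed in, so the helper itself does not depend on lunr being loaded.

```javascript
// Hypothetical helper, not part of lunr: adds one exact clause and one
// trailing-wildcard clause per term and per field, each with that field's boost.
function addBoostedClauses(q, queryString, fieldBoosts, opts) {
  // Split the raw query into individual terms, since q.term does not tokenize.
  const terms = queryString
    .toLowerCase()            // assumption: indexed terms are lowercase
    .split(opts.separator)
    .filter(Boolean);         // drop empty strings from leading/trailing separators

  terms.forEach(function (term) {
    Object.keys(fieldBoosts).forEach(function (field) {
      const boost = fieldBoosts[field];
      // Exact match on this field.
      q.term(term, { fields: [field], boost: boost });
      // Prefix match on this field.
      q.term(term, { fields: [field], boost: boost, wildcard: opts.trailingWildcard });
    });
  });

  return terms;
}

// Possible usage (assumes lunr is loaded and idx is a built index):
// const matches = idx.query(function (q) {
//   addBoostedClauses(q, 'foo bar', { title: 10, heading: 5, text: 3, keywords: 6 }, {
//     separator: lunr.tokenizer.separator,
//     trailingWildcard: lunr.Query.wildcard.TRAILING
//   });
// });
```

Every clause here uses the same boost for the exact and the wildcard form, as in the earlier comment; giving the wildcard clauses a lower boost is a tuning choice this sketch leaves open.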

myalgo commented 7 years ago

Thanks, @jakeapeters and @olivernn.

myalgo commented 7 years ago

@olivernn, do you plan to support build-time boost?

olivernn commented 7 years ago

@myalgo please open a new issue requesting build time boost.

I'm going to close this issue now as I think the original problem is solved.