olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Feature request: boost by non-field value #317

Closed (jonfrench closed this issue 6 years ago)

jonfrench commented 6 years ago

First, thank you for such a great library. I've searched the documentation and I think that this case is not yet supported.

Given two documents with identical search field values:

{title:"foobar", rank:1}, {title:"foobar", rank:2}

I would like a search on the title field, against an index built from only that field, to always boost the first document based on its lower rank value (1 comes before 2).

The scenario in which I would find this useful:

I have an index in which the documents are the nodes of a tree data structure. Each node/document has a distinct id property but may have exactly the same value for each indexed field. When two nodes/documents have identical indexed values, I would like to boost the one with the lower node depth, i.e. the one closer to the root node.

olivernn commented 6 years ago

Are you suggesting having support for some document property that can act as a tie breaker if the relevancy scores are the same?

I think at the moment that would be difficult to implement within Lunr. As far as Lunr is concerned, indexing a document is pretty much a one-way operation, i.e. it is very difficult for Lunr to recover the original structure of a document after indexing. Adding this would require keeping an index from document ref to rank, whereas currently everything is stored the opposite way around, e.g. from term to document ref.

Could you not do the extra sort outside of Lunr, when you get the results? I.e. look up the rank based on the returned document refs? Or am I misunderstanding your use case?
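A minimal sketch of that post-search sort, assuming the ranks are kept in a plain lookup object keyed by document ref. The names rankByRef and sortWithTieBreak are hypothetical, and the results array is a stand-in for what idx.search returns (objects with a ref and a score):

```javascript
// Hypothetical lookup from document ref to rank, kept outside the index.
var rankByRef = { a: 2, b: 1, c: 1 }

// Sort by score (descending), then by rank (ascending) when scores tie.
// Works on a copy so the original results array is left untouched.
function sortWithTieBreak(results, rankByRef) {
  return results.slice().sort(function (x, y) {
    if (y.score !== x.score) return y.score - x.score
    return rankByRef[x.ref] - rankByRef[y.ref]
  })
}

// Stand-in for the array returned by idx.search(...).
var results = [
  { ref: 'a', score: 1.5 },
  { ref: 'b', score: 1.5 },
  { ref: 'c', score: 0.7 }
]

var sorted = sortWithTieBreak(results, rankByRef)
```

Here 'a' and 'b' have the same score, so 'b' is placed first because of its lower rank. This only reorders exact ties; documents with different scores keep their relevance order.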

There is also this possibly related issue (#237), though I'm afraid I've not made any more progress on it in the last year.

jonfrench commented 6 years ago

Thanks for posting, Oliver. Yes, your suggestion of a tie-breaker property is apt. I had also thought of sorting outside of Lunr. Perhaps that will be the best implementation.

I'm not sure if the linked issue, #237, would help me. In my case, I do not want to restrict the results by rank, just use the rank to break a tie as you suggest.

Thank you again for a great library. If I see an opportunity in the Lunr source to implement a tie-breaking property, I'll pass it along.

olivernn commented 6 years ago

I've been thinking about this some more and I think the right solution is actually document boosts at build time. They could be used to solve this problem, as well as to give boosts to higher quality 'documents'. For example, if your documents are products that have a rating, the rating could be used as a document boost so that higher rated products are more likely to top the search results.

I've recently been working on this. Adding support for document boosts has been straightforward; however, getting them to work with the current similarity scoring is proving a bit harder than I expected. This is something that I am working on when I get a chance, though.

jonfrench commented 6 years ago

Yes. Thank you Oliver. Boosting the document at build time would meet my requirement. I appreciate you looking at this.

olivernn commented 6 years ago

I've just released 2.3.0 which includes support for boosting documents.

The add method used to add documents to the index now accepts options, one of which is boost, which boosts all terms in the given document.

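// `documents` (an array of objects) and `boostValue` (a number) are
// placeholders that must be defined before this runs.
var idx = lunr(function () {
  this.ref('name')
  this.field('text')

  documents.forEach(function (doc) {
    // The second argument holds per-document options such as boost.
    this.add(doc, { boost: boostValue })
  }, this)
})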
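Applied to the tree scenario from the original request, the boost could be derived from each node's depth. The following is only a sketch under stated assumptions: boostForDepth and buildIndex are hypothetical helpers, the id, title and depth property names are assumed, and the 1 / (1 + depth) formula is an arbitrary choice that gives shallower nodes a larger boost. Note that a document boost scales the score rather than acting as a strict tie breaker.

```javascript
// Hypothetical mapping from node depth to boost: the root (depth 0) gets
// the largest boost and deeper nodes get progressively smaller ones.
function boostForDepth(depth) {
  return 1 / (1 + depth)
}

// `lunr` is passed in as an argument so the sketch does not assume how the
// library is loaded (script tag, require, import, ...).
function buildIndex(lunr, nodes) {
  return lunr(function () {
    this.ref('id')
    this.field('title')

    nodes.forEach(function (node) {
      // Requires Lunr 2.3.0 or later, where add accepts a boost option.
      this.add(node, { boost: boostForDepth(node.depth) })
    }, this)
  })
}
```

With this in place, buildIndex(lunr, nodes) returns an index in which two nodes with identical titles should score differently, with the shallower node ahead.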