olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Does boosting multiple fields increase rank? #303

tehandyb closed 6 years ago

tehandyb commented 6 years ago

Hi, I was wondering whether boosting multiple fields will make a result rank higher than another result that only matched on one field.

queryCreator.term('foo', { fields: ['name'], usePipeline: false, boost: 2 })
queryCreator.term('foo', { fields: ['otherField'], usePipeline: false, boost: 1 })

//Data looks like
var data = [
  { name: 'foo', otherField: 'bar' }, 
  { name: 'foo baz', otherField: 'foo' } // Would this rank higher since it matched on two fields?
]
olivernn commented 6 years ago

Probably.

A score for a document depends on a number of things:

So, in your example the second document might rank higher because it matches in two fields, but its match of 'foo' in the name field is diluted: the term makes up only 50% of that field, whereas it is 100% of the field in the first document. Specifically, 'foo' is deemed only half as relevant in the name field of the second document as in the first, since that field has twice as many terms and only one of them is 'foo'.

Usually I would suggest that boosts differ by orders of magnitude, e.g. 1, 10, 100, to have an obvious effect, but of course this depends entirely on your application and data. Search and relevancy of results can be a bit of trial and error, but hopefully Lunr provides enough knobs to twiddle to get the results you want with the data you have.
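
To make the dilution and boost interaction concrete, here is a minimal toy model. It is not Lunr's actual scoring; it only assumes that a field's contribution is the share of its terms that match, multiplied by the field's boost. The helper names `tokenize` and `toyScore` are made up for this illustration.

```javascript
// Toy model only: this is NOT Lunr's real scoring formula.
// Assumption: a field contributes boost * (matching terms / total terms).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

function toyScore(doc, term, boosts) {
  let score = 0;
  for (const [field, boost] of Object.entries(boosts)) {
    const tokens = tokenize(doc[field] || '');
    if (tokens.length === 0) continue;
    const matches = tokens.filter((t) => t === term).length;
    score += boost * (matches / tokens.length);
  }
  return score;
}

const data = [
  { name: 'foo', otherField: 'bar' },
  { name: 'foo baz', otherField: 'foo' },
];

// Try the original boost of 2 on 'name', then larger steps.
const results = {};
for (const nameBoost of [2, 10, 100]) {
  results[nameBoost] = data.map((doc) =>
    toyScore(doc, 'foo', { name: nameBoost, otherField: 1 })
  );
  console.log(`name boost ${nameBoost}:`, results[nameBoost]);
}
```

Under these assumptions a boost of 2 leaves the two documents tied (2 and 2), because the extra field match in the second document exactly offsets the dilution in its name field. With a boost of 10 or 100 the first document pulls clearly ahead (10 vs 6, 100 vs 51), which is the kind of obvious effect that order-of-magnitude boosts are meant to give.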

tehandyb commented 6 years ago

Thank you, that's really useful info! I was trying to play around with the boost and was experiencing what I thought was unpredictable behavior, but now it makes much more sense. I wish there was some sort of visualization that might show how the search relevancy/boost worked, but I really appreciate the explanation.

olivernn commented 6 years ago

I wish there was some sort of visualization that might show how the search relevancy/boost worked

I have thought about having some kind of tool that makes interacting with and inspecting an index easier. I'm sure it would help people understand the results that they get (or don't get) from searches. I've often wanted one myself, to understand a particular issue someone is having with their index.

Actually getting time to sit down and implement something is another story though. One day, maybe...

tehandyb commented 6 years ago

Yeah it would be so useful. I would love to help make something like that and have a bunch of d3/dataviz experience, so let me know!

On another note (sorry if I should post this in another issue), do you know if it's possible to set a custom boost for when a term matches a field that has fewer terms in it? I've been having trouble getting a document to rank higher in a case similar to this:

queryCreator.term('foo', { fields: ['name'], usePipeline: false, boost: 100 })
queryCreator.term('foo', { fields: ['otherField'], usePipeline: false, boost: 1 })

const term = 'foo'
const documents = [
  { name: 'foo', otherField: 'bar' }, // I want this to rank higher because it has fewer terms in 'name', and 'name' has a boost that far outweighs the boost on 'otherField'
  { name: 'foo baz', otherField: 'foo and some more foo foo' }
]
olivernn commented 6 years ago

@tehandyb sorry I seem to have missed your update to this issue...

Feel free to open an issue to discuss ideas for visualisations of lunr data, either that or there is a gitter chat room, whichever you prefer.

As for the second part of your question, what did you come up with? I've put together a fiddle which shows the results I'd expect: the document with only "foo" in the "name" field ranks much higher than the other document. Did I misunderstand what you were after?
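
The fiddle is not reproduced in this thread. As a rough stand-in, the sketch below scores the two documents with a BM25-style length normalisation. The constants `K1` and `B`, the helper names, and the omission of inverse document frequency are all assumptions made for illustration, so treat this as an approximation of the idea rather than Lunr's implementation.

```javascript
// BM25-style sketch with assumed constants; IDF is deliberately left out.
// This approximates the idea of field-length normalisation, not Lunr itself.
const K1 = 1.2;
const B = 0.75;

const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Score one field: more matches help, longer-than-average fields are penalised.
function bm25FieldScore(tokens, term, avgLength) {
  const tf = tokens.filter((t) => t === term).length;
  if (tf === 0) return 0;
  const lengthNorm = 1 - B + B * (tokens.length / avgLength);
  return (tf * (K1 + 1)) / (tf + K1 * lengthNorm);
}

function rank(documents, term, boosts) {
  const fields = Object.keys(boosts);
  // Average field length across the collection, per field.
  const avgLength = {};
  for (const field of fields) {
    const total = documents.reduce((sum, d) => sum + tokenize(d[field]).length, 0);
    avgLength[field] = total / documents.length;
  }
  return documents
    .map((doc, index) => ({
      index,
      score: fields.reduce(
        (sum, f) => sum + boosts[f] * bm25FieldScore(tokenize(doc[f]), term, avgLength[f]),
        0
      ),
    }))
    .sort((a, b) => b.score - a.score);
}

const documents = [
  { name: 'foo', otherField: 'bar' },
  { name: 'foo baz', otherField: 'foo and some more foo foo' },
];

const ranked = rank(documents, 'foo', { name: 100, otherField: 1 });
console.log(ranked); // highest-scoring document first
```

With a boost of 100 on name and 1 on otherField, the first document (index 0) comes out on top in this sketch: its shorter name field outweighs the three extra matches in the second document's otherField.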

olivernn commented 6 years ago

This is quite old now, but the docs have been updated with a section on scoring.

Would still love some kind of index visualisation 😉