olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Considering union of results for multiple query tokens? #86

Closed MihaiValentin closed 7 years ago

MihaiValentin commented 10 years ago

At the moment, when searching for multiple terms, only the documents containing all of the terms are included in the results. This is fine, but there are use cases where multiple keywords return few results (sometimes none at all), and it would be nice to add the documents that matched the individual terms in addition to the intersected documents.

This could be configured through the .search method by passing an additional options argument alongside the search query:

index.search('hello world', {union: true})

Internally, this would do what it does now, but at the end it would also take the union of the results for the individual terms and append it after the intersected results. The union-only results could also be marked with an additional property, so that developers know they come from the union and can present them differently if needed.
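A minimal sketch of the proposed behaviour, written outside lunr rather than as a change to it. It assumes a `search` function that returns lunr-style `{ ref, score }` results and requires all query terms (as lunr does at this point); `searchWithUnion` and the `fromUnion` flag are hypothetical names, not part of lunr's API.

```javascript
// Hypothetical helper: `search` is any function returning [{ ref, score }, ...]
// for a query string and requiring every term (lunr's current AND behaviour).
function searchWithUnion(search, query) {
  // 1. Documents that contain every term (what search does today).
  const intersected = search(query);
  const seen = new Set(intersected.map((r) => r.ref));

  // 2. Documents matching individual terms, keeping the best score per ref.
  //    Splitting on whitespace is a simplification; lunr's own pipeline
  //    still handles lowercasing and stemming inside each search call.
  const extra = new Map();
  for (const term of query.split(/\s+/).filter(Boolean)) {
    for (const result of search(term)) {
      if (seen.has(result.ref)) continue;
      const prev = extra.get(result.ref);
      if (!prev || result.score > prev.score) {
        extra.set(result.ref, { ref: result.ref, score: result.score, fromUnion: true });
      }
    }
  }

  // 3. Union-only results go after the intersected ones, best first,
  //    flagged so the UI can present them differently.
  const unionOnly = [...extra.values()].sort((a, b) => b.score - a.score);
  return intersected.concat(unionOnly);
}
```

With a lunr 0.x index this would be called as `searchWithUnion((q) => index.search(q), 'hello world')`.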

Is this something of interest for lunr? What do you think? If the answer is positive, I can come up with a PR proposal for this.

olivernn commented 10 years ago

You've stumbled upon one of the big tradeoffs of building information retrieval systems: precision vs. recall. Getting this balance right for many varied applications is a challenge. As you mention, lunr currently only scores and returns documents that contain all of the query terms, which means that queries with multiple terms can return few, if any, results. This is biased towards precision rather than recall; there are almost certainly some documents that contain only a subset of the query terms but are still relevant to the search.

I think that what I'd like to see lunr doing is to score all documents that match at least one of the query terms. Documents that do not match all of the terms would have a (much) lower score and so should not prevent documents with many or all query terms being the top returned documents.
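One way to get that behaviour, sketched here under the assumption that per-term document scores are already available, is a coordination factor like the `coord` factor in Lucene's classic similarity: score every document that matches at least one term, then multiply by the fraction of query terms it matched. This illustrates the idea only; it is not how lunr's scorer is implemented, and `coordinatedScores` is a hypothetical name.

```javascript
// termScores: Map<term, Map<ref, score>> holding per-term scores (e.g. TF-IDF).
// Every document matching at least one term is returned, scored as
//   (sum of its term scores) * (terms it matched / terms in the query),
// so partial matches are kept but pushed below documents matching more terms.
function coordinatedScores(termScores) {
  const queryTermCount = termScores.size;
  const totals = new Map(); // ref -> { sum, matched }

  for (const scores of termScores.values()) {
    for (const [ref, score] of scores) {
      const entry = totals.get(ref) || { sum: 0, matched: 0 };
      entry.sum += score;
      entry.matched += 1;
      totals.set(ref, entry);
    }
  }

  return [...totals]
    .map(([ref, { sum, matched }]) => ({ ref, score: sum * (matched / queryTermCount) }))
    .sort((a, b) => b.score - a.score);
}
```

For example, if document `b` scores 0.9 for `hello` alone and document `a` scores 0.5 for each of `hello` and `world`, then `a` ends up at 1.0 and `b` at 0.45: `b` is still returned, but below the document that matches every term.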

Personally, I think it would be better for this to happen automatically rather than having the user specify the kind of search (union or intersection), and I think that should be possible.

I am currently reworking some of lunr's internals, especially how documents are scored. Because of this, it's probably best that you hold off on submitting any changes; I've yet to fully flesh out how scoring should work. I'll definitely keep you updated on this feature here, though, and some help with testing would be really useful. Are you keen to help?

MihaiValentin commented 10 years ago

Hi Oliver,

For the moment I've solved my problem by computing a union in addition to the intersection, but I'm looking forward to trying out the new version as soon as it supports this out of the box.

I'll be sure to give you feedback on this approach after testing it on my search data and observing the results.

braceandbracket commented 9 years ago

Hey, I'm new to lunr.js and, well, search in general, so I'm looking for some examples of using unions, as MihaiValentin did, to achieve this result until the rework is complete.

Thanks!

olivernn commented 9 years ago

@braceandbracket the simplest thing that might work here would be to combine multiple search results, e.g.

index.search('foo').concat(index.search('bar'))

You would have to watch out for documents that were in both search results and combine them somehow, but you get the idea.
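A sketch of that merge step, assuming each result carries lunr's `ref` and `score` properties. Summing the scores of documents that appear in more than one result list is one reasonable choice, since it rewards documents that match more of the queries; `mergeResults` is an illustrative helper, not a lunr function.

```javascript
// Merge several lunr-style result arrays ([{ ref, score }, ...]) into one,
// combining documents that appear in more than one list by summing scores.
function mergeResults(...resultLists) {
  const byRef = new Map();
  for (const results of resultLists) {
    for (const { ref, score } of results) {
      byRef.set(ref, (byRef.get(ref) || 0) + score);
    }
  }
  // Highest combined score first, matching the order index.search returns.
  return [...byRef]
    .map(([ref, score]) => ({ ref, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage with a lunr index:
// const results = mergeResults(index.search('foo'), index.search('bar'));
```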

Perhaps @MihaiValentin could give you some tips on how his implementation worked.

httpdigest commented 7 years ago

Maybe someone can help me with this too, because I think my use case is related. I would like to implement a search with multiple tokens that returns only the documents containing every search token in at least one of their fields. So if I have a document with the text fields {a: "hello", b: "world"} and I search for hello world, that document should be returned. By default, lunr.js and elasticlunr currently only return documents that match all of the search terms within a single field. I've tried the bool search option, but to no avail.
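To pin down the requested semantics ("every query token occurs in at least one field of the document"), here is a brute-force sketch over plain objects. It is not a lunr or elasticlunr feature: `tokenize` and `matchesAllTokensAcrossFields` are hypothetical helpers, and the ASCII lowercase tokenizer is an assumption that ignores lunr's stemming, stop words and scoring.

```javascript
// Naive tokenizer (assumption): lowercase, split on anything not a-z or 0-9.
function tokenize(text) {
  return String(text).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// True if every query token appears in at least one of the listed fields,
// i.e. an AND over tokens of an OR over fields.
function matchesAllTokensAcrossFields(doc, query, fields) {
  const docTokens = new Set(
    fields.flatMap((field) => (doc[field] == null ? [] : tokenize(doc[field])))
  );
  return tokenize(query).every((token) => docTokens.has(token));
}

// Usage:
// matchesAllTokensAcrossFields({ a: "hello", b: "world" }, "hello world", ["a", "b"]) is true
// matchesAllTokensAcrossFields({ a: "hello", b: "world" }, "hello there", ["a", "b"]) is false
```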

olivernn commented 7 years ago

This is an old issue, but Lunr 2.x does now support combining query terms with OR.