MihaiValentin closed this issue 7 years ago.
You've stumbled upon one of the big tradeoffs of building information retrieval systems: precision vs. recall. Getting this balance right across many varied applications is a challenge. As you mention, lunr currently only scores and returns documents that contain all of the query terms, which means that queries with multiple terms can return few results, if any. This is biased towards precision rather than recall; there are almost certainly documents that contain only a subset of the query terms and are still relevant to the search.
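To make the tradeoff concrete, here is a small sketch that computes precision (the share of returned documents that are relevant) and recall (the share of relevant documents that are returned). The document refs and the "relevant" set are made-up toy data, not output from lunr; they only show why an all-terms (AND) search tends to score high on precision and low on recall.

```javascript
// Precision and recall for one query, given hypothetical sets of document refs.
// By convention here, an empty set yields 0 rather than NaN.
function precisionRecall(retrieved, relevant) {
  const retrievedSet = new Set(retrieved);
  const relevantSet = new Set(relevant);
  let hits = 0;
  for (const ref of retrievedSet) {
    if (relevantSet.has(ref)) hits += 1;
  }
  return {
    // Share of returned documents that are relevant.
    precision: retrievedSet.size === 0 ? 0 : hits / retrievedSet.size,
    // Share of relevant documents that were returned.
    recall: relevantSet.size === 0 ? 0 : hits / relevantSet.size,
  };
}

// Made-up example: four documents are actually relevant to the query.
const relevant = ['1', '2', '3', '4'];
// AND (every term required): few results, all relevant.
const andResults = ['1'];
// OR (any term): more relevant documents, plus some noise.
const orResults = ['1', '2', '3', '4', '5', '6'];

console.log(precisionRecall(andResults, relevant)); // { precision: 1, recall: 0.25 }
console.log(precisionRecall(orResults, relevant)); // precision ~0.67, recall 1
```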
What I'd like to see lunr do is score all documents that match at least one of the query terms. Documents that do not match all of the terms would get a (much) lower score, so they should not prevent documents with many or all of the query terms from being the top results.
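A toy version of that idea might look like the sketch below. It is not lunr's scoring: the tokenizer, the idf formula and the "coordination factor" (the fraction of query terms a document contains) are all assumptions chosen for illustration. Every document with at least one query term is returned, and because a full match has both the larger idf sum and a coordination factor of 1, it always outscores a partial match.

```javascript
// Toy "match any term" scorer: every document containing at least one
// query term is returned, but partial matches are down-weighted.
// This is NOT lunr's scoring; tokenize() and the idf formula are assumptions.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function searchAny(docs, query) {
  const terms = [...new Set(tokenize(query))];
  const docTerms = docs.map((doc) => new Set(tokenize(doc.text)));

  // Inverse document frequency: rarer terms contribute more to the score.
  const idf = {};
  for (const term of terms) {
    const df = docTerms.filter((set) => set.has(term)).length;
    idf[term] = df === 0 ? 0 : Math.log(1 + docs.length / df);
  }

  const results = [];
  docs.forEach((doc, i) => {
    const matched = terms.filter((term) => docTerms[i].has(term));
    if (matched.length === 0) return; // no overlap with the query
    const idfSum = matched.reduce((sum, term) => sum + idf[term], 0);
    // Coordination factor: fraction of query terms this document contains.
    const coord = matched.length / terms.length;
    results.push({ ref: doc.ref, score: coord * idfSum, matched });
  });

  // Highest score first.
  return results.sort((a, b) => b.score - a.score);
}

const docs = [
  { ref: 'a', text: 'hello world' },
  { ref: 'b', text: 'hello there' },
  { ref: 'c', text: 'goodbye' },
];

console.log(searchAny(docs, 'hello world').map((r) => r.ref)); // [ 'a', 'b' ]
```

Document "b" is still returned even though it lacks "world", but it ranks below "a", which contains both terms.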
Personally, I think it would be better for this to happen automatically rather than having the user specify which kind of search (union or intersection) to run, and I think this should be possible.
I am currently reworking some of lunr's internals, especially how documents are scored. Because of this, it's probably best that you hold off on submitting any changes; I've yet to fully flesh out how scoring should work. I'll definitely keep you updated on the feature here, though. Having some help with testing would be really useful, if you're keen?
Hi Oliver,
For the moment I've implemented a union in addition to the intersection, which solves my problem, but I'm looking forward to trying out the new version as soon as it supports union out of the box.
I'll be sure to give you feedback on this approach after testing it on my search data and observing the results.
Hey, I'm new to lunr.js and, well... to search in general, so I'm looking for some examples of using unions, as MihaiValentin did, to get this result until the rework is complete.
Thanks!
@braceandbracket the simplest thing that might work here would be to combine multiple search results, e.g.
index.search('foo').concat(index.search('bar'))
You would have to watch out for documents that appear in both result sets and combine them somehow, but you get the idea.
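One way to handle those duplicates is sketched below: collect results by ref and add the scores of documents found by more than one search, so they rise to the top. Summing is only one possible choice (taking the maximum is another), and the input arrays are hypothetical stand-ins for what index.search returns; the sketch only relies on each result having a ref and a score.

```javascript
// Merge result lists from separate searches, e.g.
// index.search('foo') and index.search('bar'), combining duplicate refs.
// Only ref and score are kept; any other fields on the results are dropped.
function mergeResults(...resultLists) {
  const byRef = new Map();
  for (const results of resultLists) {
    for (const { ref, score } of results) {
      const existing = byRef.get(ref);
      if (existing) {
        // Seen in an earlier list: add the scores and count the extra hit.
        existing.score += score;
        existing.hits += 1;
      } else {
        byRef.set(ref, { ref, score, hits: 1 });
      }
    }
  }
  // Highest combined score first.
  return [...byRef.values()].sort((a, b) => b.score - a.score);
}

// Hypothetical stand-ins for index.search('foo') and index.search('bar').
const fooResults = [{ ref: '1', score: 0.9 }, { ref: '2', score: 0.4 }];
const barResults = [{ ref: '2', score: 0.7 }, { ref: '3', score: 0.5 }];

console.log(mergeResults(fooResults, barResults).map((r) => r.ref)); // [ '2', '1', '3' ]
```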
Perhaps @MihaiValentin could give you some tips on how his implementation worked.
Maybe someone can help me with this too, because I think my use case is related.
What I would like to implement is a search with multiple search tokens that returns only those documents containing all of the search tokens, spread across any of the document fields. For example, if I have a document with the text fields {a: "hello", b: "world"} and I search for hello world, that document should be returned.
By default, lunrjs and elasticlunr currently only return documents that match all of the search terms within a single document field.
I've tried the bool search option, but to no avail.
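The matching rule being asked for can be pinned down with a plain JavaScript sketch: pool the tokens from all fields of a document, then require every query term to be in that pool. It does not use the lunr or elasticlunr APIs, and the whitespace/punctuation tokenizer is an assumption. In Lunr 2.x, marking each term as required with a leading + (for example +hello +world) may give similar behaviour, since terms are matched against all fields by default; check the documentation for the version you use.

```javascript
// Desired semantics: a document matches when every query term appears in at
// least one of the listed fields (not necessarily all in the same field).
// Plain JavaScript over an array of objects; no lunr or elasticlunr API is used.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function matchesAllTermsAcrossFields(doc, fields, query) {
  // Pool the tokens from every field into one set for this document.
  const docTokens = new Set();
  for (const field of fields) {
    for (const token of tokenize(String(doc[field] ?? ''))) {
      docTokens.add(token);
    }
  }
  const terms = tokenize(query);
  // An empty query matches nothing rather than everything.
  return terms.length > 0 && terms.every((term) => docTokens.has(term));
}

function filterDocs(docs, fields, query) {
  return docs.filter((doc) => matchesAllTermsAcrossFields(doc, fields, query));
}

const docs = [
  { id: 1, a: 'hello', b: 'world' },
  { id: 2, a: 'hello', b: 'there' },
  { id: 3, a: 'hello world', b: '' },
];

console.log(filterDocs(docs, ['a', 'b'], 'hello world').map((d) => d.id)); // [ 1, 3 ]
```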
This is an old issue, but Lunr 2.x does now support combining query terms with OR.
When searching for multiple terms, only the documents containing all of the terms are currently included in the results. This is fine, but there are use cases where multiple keywords don't return many results (sometimes none at all), and it would be nice to add the documents that matched the individual terms after the intersected documents.
This could be configured as part of the .search method by passing an additional argument to the search query:
index.search('hello world', {union: true})
Internally, this would do what it does now, but at the end it would also compute the union of the individual terms' results and append it after the intersected results. The unioned results could also be marked with an additional property, so that developers know they come from the union and can present them differently if needed.
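The proposed behaviour could be prototyped outside lunr with something like the sketch below. The {union: true} option is only a proposal, so this works on hypothetical per-term result lists (one list per query term, refs unique within each list). Documents that matched every term come first; the rest follow, flagged with fromUnion: true, even when their score is higher.

```javascript
// Sketch of the proposed {union: true} behaviour (hypothetical option, not
// an existing lunr API): documents matching every term come first, followed
// by documents matching only some terms, flagged with fromUnion: true.
// Assumes one result list per query term, with unique refs within each list.
function intersectThenUnion(perTermResults) {
  const byRef = new Map();
  for (const results of perTermResults) {
    for (const { ref, score } of results) {
      const entry = byRef.get(ref) ?? { ref, score: 0, termsMatched: 0 };
      entry.score += score;
      entry.termsMatched += 1;
      byRef.set(ref, entry);
    }
  }

  const total = perTermResults.length;
  const byScore = (a, b) => b.score - a.score;
  const entries = [...byRef.values()];

  // Intersection: documents found in every per-term list.
  const intersected = entries
    .filter((e) => e.termsMatched === total)
    .sort(byScore)
    .map(({ ref, score }) => ({ ref, score, fromUnion: false }));

  // Union remainder: appended after the intersection regardless of score.
  const unionOnly = entries
    .filter((e) => e.termsMatched < total)
    .sort(byScore)
    .map(({ ref, score }) => ({ ref, score, fromUnion: true }));

  return intersected.concat(unionOnly);
}

// Hypothetical per-term results, e.g. from searching 'hello' and 'world' separately.
const helloResults = [{ ref: '1', score: 0.5 }, { ref: '2', score: 0.3 }];
const worldResults = [{ ref: '1', score: 0.6 }, { ref: '3', score: 1.5 }];

const combined = intersectThenUnion([helloResults, worldResults]);
console.log(combined.map((r) => `${r.ref}:${r.fromUnion}`)); // [ '1:false', '3:true', '2:true' ]
```

Keeping the two groups separate, rather than blending them into one ranking, is what lets a UI render the unioned results under their own heading.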
Is this something lunr would be interested in? What do you think? If so, I can put together a PR proposal for this.