mastercoms closed this issue 1 year ago.
Hi @mastercoms,
yes, the `fields` search option should work with query expression trees too:
```javascript
import MiniSearch from 'minisearch'

const ms = new MiniSearch({ fields: ['title', 'author', 'text'] })

const docs = [
  { id: 1, title: 'Let It Be', author: 'The Beatles', text: 'When I find myself in times of...' },
  { id: 2, title: 'Do I Wanna Know?', author: 'Arctic Monkeys', text: 'Have you got color in...' },
  { id: 3, title: 'The Times are A-Changing', author: 'Bob Dylan', text: 'Come gather round people...' },
  { id: 4, title: 'One Love (People get Ready)', author: 'Bob Marley', text: 'One love, one heart...' },
]

ms.addAll(docs)

// Simple example, "people" is in the text of document 3 and in the title of document 4,
// but we restrict the search to the title field:
ms.search({ fields: ['title'], queries: ['people'] })
// => [{ id: 4, ...}]

// The query above is identical to:
ms.search('people', { fields: ['title'] })

// More interesting example with deeper query expression tree:
ms.search({
  queries: [
    { fields: ['title'], queries: ['times'] },
    { fields: ['author'], queries: ['arctic'] }
  ]
})
// => [{ id: 2, ... }, { id: 3, ... }]

// Another example, where both subqueries have results, but no document satisfies both,
// so combining with AND yields no result:
ms.search({
  queries: [
    { fields: ['title'], queries: ['people'] },
    { fields: ['text'], queries: ['people'] }
  ],
  combineWith: 'AND'
})
// => []

// Contrast it with the following query, which instead yields two results:
ms.search('people', { fields: ['title', 'text'], combineWith: 'AND' })
// => [{ id: 3, ... }, { id: 4, ... }]
```
Do you have a specific example where this does not work? If you can provide an example to reproduce the problem I will look into it.
Ok, thank you for the response! Maybe I'm just confused by the score? Do results have to match every node at least somewhat, or does each node just contribute to the score? In most cases it works, but in a few cases results show up that don't match the entire tree.
How could I demonstrate this properly? Would it work to provide a link to the document JSON and the query object?
Hi @mastercoms,
by default, unless you specify the `combineWith` option, different subqueries are combined with `OR`. Therefore, by default, you will get results for each document that matches at least one subquery, but those that match more than one subquery will generally score higher. This maximizes recall (the proportion of relevant documents that are returned), at the expense of precision (the proportion of returned documents that are relevant). Since scoring should sort better matches first, this is often a good default approach.
If you instead specify `combineWith: 'AND'` at the root, or in the parent of the nodes you want to combine, then the subqueries will be combined with `AND`: the results will only include the documents that match all subqueries, and any document that fails to match even one part of the tree will be omitted. With `AND` you get higher precision at the expense of recall. Depending on the use case, you might prefer one or the other.
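As a rough illustration of the difference, the sketch below merges the results of several subqueries. It is a simplified model, not MiniSearch's actual implementation: it assumes each subquery produces a `Map` from document id to score, that `OR` keeps the union and `AND` the intersection, and that a document matched by several subqueries gets the sum of its scores. The name `combineResults` and the scores are made up for illustration.

```javascript
// Simplified model (not MiniSearch's real code): each subquery result is a
// Map from document id to score.
function combineResults(resultSets, combineWith = 'OR') {
  const [first, ...rest] = resultSets
  let combined = new Map(first)
  for (const results of rest) {
    if (combineWith === 'AND') {
      // AND: keep only documents present in both sets, summing their scores
      const next = new Map()
      for (const [id, score] of results) {
        if (combined.has(id)) next.set(id, combined.get(id) + score)
      }
      combined = next
    } else {
      // OR: keep every document, summing scores of documents matched more than once
      for (const [id, score] of results) {
        combined.set(id, (combined.get(id) ?? 0) + score)
      }
    }
  }
  // Sort best matches first, as a search result list would be
  return [...combined]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }))
}

// Made-up scores: "people" in the title field, "bob" in the author field
const titlePeople = new Map([[4, 3]])
const authorBob = new Map([[3, 2], [4, 2]])

console.log(combineResults([titlePeople, authorBob], 'OR'))
// → [{ id: 4, score: 5 }, { id: 3, score: 2 }]
console.log(combineResults([titlePeople, authorBob], 'AND'))
// → [{ id: 4, score: 5 }]
```

In this model, document 4 gets the same score under both operators; `AND` only changes which documents survive.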
Note that the value of an option is inherited by all subtrees, unless a subtree specifies a different value for the same option. Therefore, if you specify `combineWith: 'AND'` at the root, you are basically changing the default for the whole tree (which is often what you want).
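The inheritance rule can be pictured as a recursive walk over the tree. The `resolveLeaves` helper below is hypothetical (MiniSearch resolves options internally while executing the query); it only collects, for every string leaf, the options that would apply to it, assuming each node's own options are merged over the ones inherited from its parent.

```javascript
// Hypothetical helper, not part of MiniSearch: shows how options set on a node
// flow down to its subqueries unless a subquery overrides them.
function resolveLeaves(node, inherited = {}) {
  if (typeof node === 'string') {
    // A plain string leaf uses whatever options it inherited
    return [{ query: node, options: inherited }]
  }
  const { queries, ...ownOptions } = node
  // Options on this node override the inherited ones for its whole subtree
  const options = { ...inherited, ...ownOptions }
  return queries.flatMap((child) => resolveLeaves(child, options))
}

const inheritanceTree = {
  combineWith: 'AND',
  queries: [
    { fields: ['title'], queries: ['people'] },
    // This subtree overrides combineWith, so the terms "bob" and "marley"
    // would be combined with OR instead of the root's AND
    { fields: ['author'], combineWith: 'OR', queries: ['bob marley'] }
  ]
}

for (const leaf of resolveLeaves(inheritanceTree)) {
  console.log(leaf.query, leaf.options)
}
// "people" gets combineWith 'AND' (inherited from the root) and fields ['title']
// "bob marley" gets combineWith 'OR' (its own override) and fields ['author']
```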
Referring to the same data as my comment above, this finds all documents that contain "people" in the `title` field or "bob" in the `author` field:
```javascript
ms.search({
  queries: [
    { fields: ['title'], queries: ['people'] },
    { fields: ['author'], queries: ['bob'] }
  ]
})
// => [{ id: 4, ... }, { id: 3, ... }]
```
Both documents 4 and 3 are returned, but 4 scores higher because it matches both subqueries, while 3 only matches the one for the `author` field.
Instead, this finds all documents that contain "people" in the `title` field and "bob" in the `author` field:
```javascript
ms.search({
  queries: [
    { fields: ['title'], queries: ['people'] },
    { fields: ['author'], queries: ['bob'] }
  ],
  combineWith: 'AND'
})
// => [{ id: 4, ... }]
```
In this case, only document 4 is returned, because it is the only one that has both "people" in the `title` field and "bob" in the `author` field. The score for document 4 is the same as above, but 3 is discarded completely.
If we imagine a query language where we can put the query string in quotes, followed by the field after a colon, and use `AND` and `OR` to combine queries, the first example would be `"people":title OR "bob":author`, and the second example would be `"people":title AND "bob":author`.
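A parser for that imaginary syntax could be as small as the hypothetical `parseFieldQuery` below, which turns the string into a query expression tree of the shape used in the examples above. It is only a sketch: it assumes a flat expression using a single operator, double-quoted terms and one field per term, and it rejects anything else.

```javascript
// Hypothetical parser for the imaginary syntax; not a MiniSearch feature.
// Supports only flat expressions with a single operator, e.g.
//   "people":title OR "bob":author
function parseFieldQuery(input) {
  const operator = / AND /.test(input) ? 'AND' : 'OR'
  const parts = input.split(` ${operator} `)
  const queries = parts.map((part) => {
    // Each part must look like "term":field
    const match = /^"([^"]*)":(\w+)$/.exec(part.trim())
    if (!match) throw new Error(`Cannot parse: ${part}`)
    const [, text, field] = match
    return { fields: [field], queries: [text] }
  })
  return { combineWith: operator, queries }
}

console.log(JSON.stringify(parseFieldQuery('"people":title AND "bob":author')))
// → {"combineWith":"AND","queries":[{"fields":["title"],"queries":["people"]},{"fields":["author"],"queries":["bob"]}]}
```

With the index from the first comment, `ms.search(parseFieldQuery('"people":title AND "bob":author'))` would then be equivalent to the second example. Mixing `AND` and `OR` in one expression throws here; supporting it would need operator precedence and nested nodes.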
If that's not resolving your issue, and you can share a link to the document JSON and the query, I can definitely have a look.
Thanks so much for the thorough explanation! I looked everything over, and I'm pretty sure my search syntax and `combineWith` are correct, but I might be wrong. Below are the query strings and query objects, along with the results, so you can verify them. The search website is available here, along with the relevant source code here, in case you need that too.
https://comfig.app/huds/search.json

Query string: `7hud`

Parsed query object:

```json
"7hud"
```

Search options:

```json
{
  "boost": {
    "name": 5,
    "author": 2
  },
  "prefix": true
}
```
So without any field queries, we have both 7hud and PeachHUD returned.
Query string: `name:7hud`

Parsed query object:

```json
{
  "combineWith": "AND",
  "queries": [
    {
      "fields": ["name"],
      "queries": ["7hud"],
      "combineWith": "OR"
    }
  ]
}
```

Search options:

```json
{
  "boost": {
    "name": 5,
    "author": 2
  },
  "prefix": true
}
```
So with the field query, we have just 7hud returned, as expected. So seems to be working. However...
Query string: `name:Hypnotize`

Parsed query object:

```json
{
  "combineWith": "AND",
  "queries": [
    {
      "fields": ["name"],
      "queries": ["hypnotize"],
      "combineWith": "OR"
    }
  ]
}
```

Search options:

```json
{
  "boost": {
    "name": 5,
    "author": 2
  },
  "prefix": true
}
```
For some reason, it also returns results where the `author` field matches the query. Even if I also specify the `author` field, just in case, it still does this:
Query string: `name:Hypnotize author:Hypnotize`

Parsed query object:

```json
{
  "combineWith": "AND",
  "queries": [
    {
      "fields": ["name"],
      "queries": ["hypnotize"],
      "combineWith": "OR"
    },
    {
      "fields": ["author"],
      "queries": ["hypnotize"],
      "combineWith": "OR"
    }
  ]
}
```

Search options:

```json
{
  "boost": {
    "name": 5,
    "author": 2
  },
  "prefix": true
}
```
Thanks a lot for the detailed example. I think I know what’s going on, and it’s definitely something that can be fixed in MiniSearch.
The problem is that the current logic ends up including in the search all fields that are either in the `fields` option or in the `boost` option. That’s a bug, which I will fix as soon as I can work on it: the `boost` option should have no effect on the considered `fields`. Even though boosting fields that are not considered for search is useless, it should not interfere with the results.
In the meantime, if it’s acceptable to temporarily remove the boosts from the search options, the issue should disappear. Alternatively, you could remove the boosting from the search options and add it to each query, but only for the considered fields.
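The second workaround can be automated with a small tree transformation. The `scopeBoost` helper below is hypothetical: it copies the query tree and gives each node that sets `fields` only the boosts for those fields, so the global `boost` search option can be dropped. It assumes, as suggested above, that `boost` is honored on individual nodes of the query tree.

```javascript
// Hypothetical helper for the workaround: instead of a global `boost` search
// option, give each query node only the boosts for the fields it searches.
function scopeBoost(node, boost) {
  if (typeof node === 'string') return node
  const scoped = { ...node }
  if (Array.isArray(node.fields)) {
    // Keep only boosts for fields this node actually searches
    const ownBoost = Object.fromEntries(
      Object.entries(boost).filter(([field]) => node.fields.includes(field))
    )
    if (Object.keys(ownBoost).length > 0) scoped.boost = ownBoost
  }
  // Recurse into subqueries; the original tree is left untouched
  scoped.queries = node.queries.map((child) => scopeBoost(child, boost))
  return scoped
}

// The parsed `name:Hypnotize` query from the report
const userQuery = {
  combineWith: 'AND',
  queries: [{ fields: ['name'], queries: ['hypnotize'], combineWith: 'OR' }]
}
const scopedQuery = scopeBoost(userQuery, { name: 5, author: 2 })
console.log(JSON.stringify(scopedQuery))
// The "name" node now carries boost { name: 5 } and no author boost.
// It would then be searched without a global boost, e.g.:
// ms.search(scopedQuery, { prefix: true })
```

Nodes without `fields` get no boost of their own and simply inherit from their parent. For plain string queries without field restrictions, keeping the global `boost` should be harmless, since all fields are searched anyway.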
Thanks again for reporting this, I will comment here once the issue is resolved.
Yep that workaround works! Thanks so much by the way, you have been exceedingly helpful!
Happy to help :) I just merged #200, which fixes the issue you isolated (and another small related one I found in the same code path), and released version `v6.0.1` on NPM. You should now be able to upgrade and use the `boost` option as in your examples above without running into the issue.
Thanks again for the detailed issue report!
Wow, thank you so much!
I was wondering because it doesn't seem like it's working for me. I'm trying to implement search syntax within specific fields for the user. Wanted to isolate the problem.