alex-au-922 closed this issue 5 months ago
I agree with you that `parse_query` covers a lot of ground, using the tantivy query language. With my maintainer hat on, I see that as less code in tantivy-py, compared to adding extra explicit query types. However, the request does come up a fair bit, so I was wondering whether you could describe a specific use case here?
Sure. To give us common ground for the discussion, consider the following Elasticsearch query:
```json
{
  "query": {
    "bool": {
      "must": [
        {
          "dis_max": {
            "queries": [
              {
                "match": {
                  "title": {
                    "query": "sea whale",
                    "boost": 2
                  }
                }
              },
              {
                "match": {
                  "body": {
                    "query": "white dog",
                    "boost": 1.5
                  }
                }
              }
            ],
            "tie_breaker": 0.3
          }
        }
      ]
    }
  }
}
```
The current `parse_query` method cannot construct this query, because the tantivy query language currently cannot express other query types such as regex or disjunction max queries. However, this functionality is available through Rust's `BooleanQuery` and PyLucene's equivalent method.
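For readers less familiar with `dis_max`: as documented for Elasticsearch and Lucene, a disjunction max query scores a document as its best sub-query score plus `tie_breaker` times the sum of the other matching sub-query scores. The arithmetic can be sketched as below. The function name `dis_max_score` and the sample scores are invented for illustration and are not part of any library.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine sub-query scores the way a disjunction max query does.

    sub_scores: scores of the sub-queries that matched the document.
    Returns the best score plus tie_breaker times the remaining scores.
    """
    if not sub_scores:
        return 0.0  # no sub-query matched
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# Hypothetical per-field scores for one document (boosts already applied).
title_score = 2.0
body_score = 1.0
combined = dis_max_score([title_score, body_score], tie_breaker=0.3)
```

With `tie_breaker=0.0` only the best field counts. Raising it lets documents that match in both `title` and `body` rank above documents that match in only one.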
For `tantivy-py`, we might consider the following function signature:
```python
class Query:
    ...
    @staticmethod
    def boolean_query(subqueries: Iterator[tuple[Occur, Query]]) -> Query:
        ...
```
This requires introducing the `Occur` enum from the `tantivy` Rust package.
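As a rough illustration of what the three `Occur` variants mean for matching, here is a pure-Python stand-in that works on sets of document ids. It is not tantivy-py code. The variant spelling follows this thread, `boolean_match` is a hypothetical helper, and treating a query made only of exclusions as matching nothing is an assumption of this sketch.

```python
from enum import Enum


class Occur(Enum):
    # Mirrors the three variants of tantivy's Rust `Occur` enum.
    MUST = "must"
    SHOULD = "should"
    MUST_NOT = "must_not"


def boolean_match(clauses):
    """Return the doc ids matched by a list of (Occur, set_of_doc_ids) clauses."""
    must = [set(docs) for occur, docs in clauses if occur is Occur.MUST]
    should = [set(docs) for occur, docs in clauses if occur is Occur.SHOULD]
    must_not = [set(docs) for occur, docs in clauses if occur is Occur.MUST_NOT]

    if must:
        # Every MUST clause has to match; SHOULD clauses would only affect scoring.
        matched = set.intersection(*must)
    elif should:
        # Without MUST clauses, at least one SHOULD clause has to match.
        matched = set.union(*should)
    else:
        # Assumption of this sketch: exclusions alone match nothing.
        matched = set()

    for docs in must_not:
        matched -= docs
    return matched


matched = boolean_match([
    (Occur.MUST, {1, 2, 3}),
    (Occur.SHOULD, {3, 4}),
    (Occur.MUST_NOT, {2}),
])
```

Here the `MUST` clause restricts the result to documents 1, 2 and 3, and the `MUST_NOT` clause then removes document 2.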
The Elasticsearch query above can then be written as:
```python
Query.boolean_query(
    [
        (
            Occur.MUST,
            Query.dis_max_query(
                [
                    Query.phrase_query("title", "sea whale", boost=2),
                    Query.phrase_query("body", "white dog", boost=1.5)
                ],
                tie_breaker=0.3
            )
        )
    ]
)
```
which provides 3 benefits to developers:
Thanks for taking the time to write it out. You've explained it well 👍🏼
We are currently tracking progress on wrapping these query types in this comment in #20. I see `BooleanQuery` is already there, along with the disjunction max and the regex query.
Added pull request for the implementation #243
PR has been merged, thanks!
The existing boolean query feature can be achieved through `index.parse_query`, as long as we type the correct characters, such as `+` and `-` for `must` and `must_not` respectively.

However, there could be cases where users would like to create their inner query dynamically, or where, for the sake of readability, they would like a container for their other query types such as `FuzzyTermQuery` and `PhraseQuery`.

Currently the Rust `tantivy` package allows creating the boolean query from the struct `tantivy::query::BooleanQuery`. Will `tantivy-py` also have a `boolean_query` staticmethod for the `Query` class?
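To make the dynamic-construction case concrete: with only `index.parse_query` available, callers have to assemble the query string themselves, roughly as in the sketch below. `build_query_string` is a hypothetical helper written for this discussion, not part of tantivy-py, and it assumes each clause is already a well-formed, escaped fragment of the tantivy query language.

```python
def build_query_string(clauses):
    """Join (occur, clause) pairs into a query string for parse_query.

    occur is one of "must", "must_not" or "should"; clause is an already
    formatted fragment such as 'title:"sea whale"'.
    """
    prefixes = {"must": "+", "must_not": "-", "should": ""}
    return " ".join(prefixes[occur] + clause for occur, clause in clauses)


query_string = build_query_string([
    ("must", 'title:"sea whale"'),
    ("must_not", "body:dog"),
    ("should", "body:white"),
])
# query_string == '+title:"sea whale" -body:dog body:white'
```

The resulting string could then be passed to `index.parse_query`. This works for term and phrase clauses, but the caller stays responsible for quoting and escaping, and there is no way to nest query types that the query language cannot express.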