sutsr closed this issue 4 days ago
I think @martin-gaievski has more context on the hybrid query.
@sutsr the behavior you're seeing in 2.13 is expected: we have removed the possibility of nesting or wrapping a hybrid query inside another query. The change was introduced in 2.12, in https://github.com/opensearch-project/neural-search/pull/498. The main reason is the way a hybrid query currently runs: normalization is done as post-processing over all shard-level results. A regular query computes its scores at the shard level, and those scores would be changed by the hybrid query anyway.
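To make the post-processing step more concrete, here is a minimal Python sketch of l2 normalization followed by a weighted arithmetic mean, applied to scores that have already been gathered from all shards. It is an illustration only, not the plugin's code: the function names, the sample scores, and the choice to treat a document missing from a sub-query as score 0 are all assumptions.

```python
import math


def l2_normalize(scores):
    """Divide each score by the L2 norm of all scores of one sub-query."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: s / norm for doc, s in scores.items()}


def combine_weighted_mean(per_query_scores, weights):
    """Weighted arithmetic mean over sub-queries.

    Assumption for this sketch: a document that a sub-query did not
    return contributes a score of 0 for that sub-query.
    """
    docs = set().union(*per_query_scores)
    total = sum(weights)
    return {
        doc: sum(w * q.get(doc, 0.0) for q, w in zip(per_query_scores, weights)) / total
        for doc in docs
    }


# Hypothetical raw scores collected from all shards, one dict per sub-query.
match_scores = {"doc1": 3.0, "doc2": 4.0}
neural_scores = {"doc2": 0.6, "doc3": 0.8}

normalized = [l2_normalize(match_scores), l2_normalize(neural_scores)]
combined = combine_weighted_mean(normalized, [0.7, 0.3])
ranking = sorted(combined, key=combined.get, reverse=True)
```

The point of the sketch is that the final score of a document depends on the scores of every other document returned by each sub-query, so it can only be computed once all shard-level results are in. A shard-level score from an outer query would be overwritten by this step.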
Thanks for the clarification and explanation @martin-gaievski. Can you offer some guidance on how best to rework queries along the lines of my example queries?
Should a boolean query inside post_filter be used to replace any queries previously combined as in my examples (using bool wrapping hybrid or queries inside request_processors)? Essentially I am trying to provide result ranking from the hybrid query with knock-out exclusion on non-ranking-related fields.
e.g.
POST /hybrid_search_index1/_search
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "family"
            }
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "family",
              "model_id": "model_id_here",
              "k": 5
            }
          }
        }
      ]
    }
  },
  "search_pipeline": {
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "l2"
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [
                0.7,
                0.3
              ]
            }
          }
        }
      }
    ]
  },
  "post_filter": {
    "bool": {
      "filter": [
        {
          "range": {
            "publish_date": {
              "lte": "2024-01-01"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "is_flagged": true
          }
        }
      ]
    }
  }
}
@sutsr right, post filter should solve your use case; it is not considered a wrapper for the hybrid query. As long as you don't need an exact number of results and are OK with post-processing of the main query results, post filter is a good choice.
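For anyone reworking existing request bodies, the rewrite can be sketched in Python as below. This is a rough sketch under assumptions: the helper name unwrap_hybrid is made up, and it only handles one hypothetical shape, a bool query whose single must clause is the hybrid query plus optional filter and must_not clauses. Anything else is rejected rather than guessed at.

```python
import copy


def unwrap_hybrid(body):
    """Move a bool-wrapped hybrid clause to the top level and turn the
    remaining filter / must_not clauses into a post_filter."""
    bool_query = body.get("query", {}).get("bool")
    if bool_query is None:
        return copy.deepcopy(body)  # nothing to rewrite
    if "post_filter" in body:
        raise ValueError("body already has a post_filter; merge it by hand")

    must = bool_query.get("must", [])
    must = must if isinstance(must, list) else [must]
    hybrid = [clause for clause in must if "hybrid" in clause]
    others = [clause for clause in must if "hybrid" not in clause]
    if len(hybrid) != 1 or others or "should" in bool_query:
        raise ValueError("only a bool with a single hybrid 'must' clause is handled")

    # Keep every other top-level key (size, search_pipeline, ...) unchanged.
    rewritten = copy.deepcopy({k: v for k, v in body.items() if k != "query"})
    rewritten["query"] = copy.deepcopy(hybrid[0])
    post = {k: copy.deepcopy(bool_query[k]) for k in ("filter", "must_not") if k in bool_query}
    if post:
        rewritten["post_filter"] = {"bool": post}
    return rewritten


# Hypothetical pre-2.12 style body: hybrid wrapped in a bool with an exclusion.
old_body = {
    "query": {
        "bool": {
            "must": [{"hybrid": {"queries": [{"match": {"text": {"query": "family"}}}]}}],
            "must_not": [{"term": {"is_flagged": True}}],
        }
    },
    "size": 10,
}
new_body = unwrap_hybrid(old_body)
```

Here new_body keeps the hybrid query as the top-level query and carries the exclusion in post_filter. The caveat above still applies: the filter runs on the results of the main query, so fewer hits than requested may come back.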
What is the bug?
I have two query types that ran fine on OpenSearch 2.11 but, after upgrading to 2.13, fail with an illegal argument exception:
hybrid query must be a top level query and cannot be wrapped into other queries
The two query types are:
Request processor used to combine a criterion with a hybrid search
Top level boolean used to combine an exclusion criterion with the results of a hybrid search
How can one reproduce the bug?
Attempt to combine a hybrid query with a search pipeline request processor or use it as part of a boolean query in the fashion shown by the example queries.
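A client-side check along the following lines can flag request bodies of the second kind before they are sent. It is only a sketch: find_nested_hybrid is a hypothetical helper, not part of OpenSearch or any client library, and it simply walks the query looking for a hybrid key below the top level. It would also flag a field that happens to be named hybrid, and it cannot see any wrapping that a search pipeline request processor applies on the server.

```python
def find_nested_hybrid(node, path="query"):
    """Return the paths of every 'hybrid' key that is not the top-level query."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "hybrid" and path != "query":
                found.append(child)
            found.extend(find_nested_hybrid(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_nested_hybrid(item, f"{path}[{index}]"))
    return found


# Top-level hybrid: nothing to report.
top_level = {"hybrid": {"queries": [{"match": {"text": {"query": "family"}}}]}}

# Hybrid wrapped in a bool: the shape that 2.12 and later reject.
wrapped = {"bool": {"must": [{"hybrid": {"queries": []}}]}}

top_level_hits = find_nested_hybrid(top_level)
wrapped_hits = find_nested_hybrid(wrapped)
```

Pass the value of the request's query key, not the whole body, so that the path check for the top level works as intended.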
The index used for these queries has fields of the following types:
search_as_you_type
knn_vector
strict_year_month_day
boolean
What is the expected behavior?
In the first instance, I expect the hybrid query to run, only returning results that meet the additional criterion from the request processor.
In the second instance I expect the query to run and return results that meet both conditions.
What is your host/environment?
AWS Managed OpenSearch
Do you have any screenshots?
Not applicable
Do you have any additional context?
The model used for neural search is an external model calling out to OpenAI's ada-002 embedding API configured as per the blueprint