Open iyoung opened 1 week ago
@iyoung thank you for reporting this scenario. I need some additional information from you: what is the mapping for your index, what is the index configuration (number of nodes, primary shards, and replicas), and how many documents do you have? Do you expect the failing query to return search hits, and if so, approximately how many?
I tried the following scenario, and it works fine on my side:
create index with knn vector field
PUT /index-test
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      },
      "field1": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
ingest several documents with vectors and text fields:
POST /index-test/_bulk?refresh
{"index":{}}
{"field1": 2,"vector": [0.4, 0.5, 0.2],"title": "basic", "name": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .", "category": "novel", "price": 20}
{"index":{}}
{ "name": "I brought home the trophy", "category": "story", "price": 20, "field1": 10,"vector": [0.2, 0.2, 0.3],"title": "java"}
{"index":{}}
{"field1": 50,"vector": [4.2, 5.5, 8.9],"name": "Why would he go to all that effort for a free pack of ranch dressing?", "category": "story", "price": 10 }
{"index":{}}
{"vector": [0.3, 0.12, 3.3],"title": "python","name": "In the next 40-50 years I plan on opening up my own business.","category": "poem","price": 100}
{"index":{}}
{ "field1": 100,"vector": [0.2, 0.2, 0.3],"title": "java", "name": "Does he have a big family?", "category": "biography", "price": 70}
{"index":{}}
{"name": "She is my younger sister","category": "workbook","price": 25}
run search with hybrid query
GET /index-test/_search
{
  "size": 50,
  "track_total_hits": true,
  "query": {
    "hybrid": {
      "queries": [
        {
          "knn": {
            "vector": {
              "vector": [0.15, 0.3, 1.1],
              "min_score": 0.2
            }
          }
        },
        {
          "query_string": {
            "fields": ["title^2", "name^3"],
            "query": "\"small\"",
            "default_operator": "AND"
          }
        }
      ]
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "vector"
          }
        }
      ]
    }
  },
  "search_pipeline": {
    "description": "Inline post processor for hybrid search",
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "min_max"
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [0.5, 0.5]
            }
          }
        }
      }
    ]
  }
}
I tried multiple search words and different values of min_score.
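For readers following along, here is a rough sketch of what the `min_max` normalization and weighted `arithmetic_mean` combination configured in the search pipeline above compute. It is a simplified illustration in plain Python, not the plugin's actual code: the function names are made up for this example, a document missing from a subquery is assumed to contribute a score of 0, and the case where all scores are equal is assumed to normalize to 1.0.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one subquery's scores to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a single distinct score normalizes to 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_arithmetic_mean(
    per_subquery: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Weighted mean of normalized scores across subqueries, per document."""
    if len(per_subquery) != len(weights):
        raise ValueError("one weight is required per subquery")
    total_weight = sum(weights)
    doc_ids = set().union(*per_subquery)
    return {
        doc_id: sum(
            # Assumption: a document a subquery did not match scores 0 there.
            w * scores.get(doc_id, 0.0)
            for scores, w in zip(per_subquery, weights)
        )
        / total_weight
        for doc_id in doc_ids
    }


def hybrid_scores(
    raw_per_subquery: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Normalize each subquery's scores, then combine them."""
    normalized = [min_max_normalize(s) for s in raw_per_subquery]
    return combine_arithmetic_mean(normalized, weights)
```

With weights `[0.5, 0.5]`, a document matched by only one of the two subqueries can reach at most half of the maximum combined score under these assumptions.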
I see the same behaviour with my query after upgrading from 2.13 to 2.15. It is exactly as described by @iyoung and also very similar to the description in https://github.com/opensearch-project/neural-search/issues/497
Especially this:
Honestly, it's very hard to reproduce the bug. It only happens for Hybrid search. I observe a pattern that queries with more than one word tend to be more likely to have this error than simple queries. Queries that failed are like "horror movies", "teen mom", "news radio". I also observed that when I changed the index data, some queries started working, and other queries started failing.
The issue happens randomly and can be reproduced only for several minutes or hours. After that, I cannot reproduce it with exactly the same query (index data changes probably affect this).
Setup: 2 nodes, 1 primary shard, 1 replica shard, ~600k documents (~13GB), hnsw, faiss.
The query is mostly the same as the topic starter's query, but with three subqueries (text search + 2 knn). There is also no min_score for knn (because it doesn't exist in 2.13); instead, the knn queries in the subqueries are wrapped in function_score with their own min_score.
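To make the shape of that query easier to picture, here is a minimal sketch that builds a request body of this kind as a Python dict. It is a guess at the structure from the description only: the helper names, field names, `k` values, and thresholds are all hypothetical placeholders, not the actual query.

```python
from typing import Any, Dict, List


def knn_with_min_score(
    field: str, vector: List[float], k: int, min_score: float
) -> Dict[str, Any]:
    """Wrap a knn query in function_score so a min_score can be applied."""
    return {
        "function_score": {
            "query": {"knn": {field: {"vector": vector, "k": k}}},
            "min_score": min_score,
        }
    }


def build_hybrid_body(
    text: str, text_fields: List[str], knn_clauses: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """One text subquery followed by the given knn subqueries."""
    text_clause = {
        "query_string": {
            "query": text,
            "fields": text_fields,
            "default_operator": "AND",
        }
    }
    return {"query": {"hybrid": {"queries": [text_clause, *knn_clauses]}}}


# Hypothetical field names and values, for illustration only.
body = build_hybrid_body(
    text='"news radio"',
    text_fields=["title^2", "name^3"],
    knn_clauses=[
        knn_with_min_score("title_vector", [0.1, 0.2, 0.3], k=50, min_score=0.4),
        knn_with_min_score("body_vector", [0.3, 0.2, 0.1], k=50, min_score=0.4),
    ],
)
```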
Any ideas?
Thank you. I am in contact with the OpenSearch managed service team within AWS about this issue and am trying to replicate it with a smaller index. The index we're using has around 2 million documents in it. Once I have a more concrete way to reproduce this, I will post an update. Thank you for replying.
What is the bug?
Running a hybrid query that sets a min score below 0.5 on the vector side and provides a query text for the lexical search fails for certain searches (possibly related to the number of matches) and results in the following response:
How can one reproduce the bug?
This is the structure of the query I am using, which always throws the exception.
What is the expected behaviour?
Search results returned
What is your host/environment?
AWS managed OpenSearch 2.15
Do you have any additional context?
Raising the radial search threshold, by increasing the min score on the vector search to a value between 0.5 and 1, stops the error. Changing the query term or the fields included, and therefore the matched items, also avoids it.
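As a rough illustration of how much the threshold change narrows the vector side, the sketch below converts a min score into the largest squared distance that still matches. It assumes the `l2` space type, where the k-NN score is documented as `1 / (1 + d)` with `d` the squared Euclidean distance; the space type of the affected index is not stated here, so treat the numbers as indicative only. The function names are made up for this example.

```python
from typing import Sequence


def l2_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Score under the assumed l2 translation: 1 / (1 + squared distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_distance)


def max_squared_distance(min_score: float) -> float:
    """Largest squared distance whose score still reaches min_score."""
    if not 0.0 < min_score <= 1.0:
        raise ValueError("min_score must be in (0, 1] for the l2 space")
    return 1.0 / min_score - 1.0


for threshold in (0.2, 0.5, 1.0):
    print(threshold, max_squared_distance(threshold))
```

Under this assumption, moving the min score from 0.2 to 0.5 shrinks the allowed squared distance from 4 to 1, so far fewer vector matches reach the hybrid scoring step.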
Numerous exact phrase search terms cause this issue for us, such as "group hiking" and "cold weather".
In the error logs I get: -