Open krishy91 opened 1 month ago
Although this issue might have to be resolved directly in NestedHelper, I wanted to get others' opinions on it and on how to go about it. It directly affects knn search and hence Neural Search (for nested documents).
@heemin32 could you take a look at this?
I am also finding that must_not does not work.
Create index
PUT /knn
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "nested_field": {
        "type": "nested",
        "properties": {
          "my_vector1": {
            "type": "knn_vector",
            "dimension": 3,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "faiss",
              "parameters": {
                "ef_construction": 128,
                "m": 24
              }
            }
          }
        }
      }
    }
  }
}
Index documents
PUT /_bulk?refresh=true
{ "index": { "_index": "knn", "_id": "1" } }
{"nested_field":[{"my_vector1":[1,1,1]},{"my_vector1":[2,2,2]},{"my_vector1":[3,3,3]}], "parking": "false"}
{ "index": { "_index": "knn", "_id": "2" } }
{"nested_field":[{"my_vector1":[10,10,10]},{"my_vector1":[11,11,11]},{"my_vector1":[12,12,12]}], "parking": "true"}
{ "index": { "_index": "knn", "_id": "3" } }
{"nested_field":[{"my_vector1":[1,1,1], "parking": "false"},{"my_vector1":[2,2,2]},{"my_vector1":[3,3,3]}]}
{ "index": { "_index": "knn", "_id": "4" } }
{"nested_field":[{"my_vector1":[10,10,10], "parking": "true"},{"my_vector1":[11,11,11]},{"my_vector1":[12,12,12]}]}
Query using must_not
GET knn/_search
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "knn": {
          "nested_field.my_vector1": {
            "vector": [1, 1, 1],
            "k": 2,
            "filter": {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "parking": "false"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
This query should exclude the document with id 1, but it does not.
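To make the expectation concrete, here is a small plain-Python sketch of which documents should survive the must_not filter. It assumes the filter is evaluated against parent-level fields only (so a `parking` value inside `nested_field` is ignored). The `DOCS` dictionary and both function names are made up for illustration; they mirror the bulk request above and do not reproduce OpenSearch's actual execution.

```python
# Parent-level view of the four documents indexed above.
# Docs 3 and 4 carry "parking" only inside a nested object, not on the parent.
DOCS = {
    "1": {"parking": "false", "nested_field": [{"my_vector1": [1, 1, 1]}]},
    "2": {"parking": "true", "nested_field": [{"my_vector1": [10, 10, 10]}]},
    "3": {"nested_field": [{"my_vector1": [1, 1, 1], "parking": "false"}]},
    "4": {"nested_field": [{"my_vector1": [10, 10, 10], "parking": "true"}]},
}


def parent_passes_must_not(doc, field="parking", value="false"):
    """Return True if the parent document is NOT excluded by must_not term."""
    # Assumption: only the parent's own field is inspected.
    return doc.get(field) != value


def expected_candidates(docs):
    """Ids of documents that should remain eligible for the knn search."""
    return sorted(doc_id for doc_id, doc in docs.items() if parent_passes_must_not(doc))


print(expected_candidates(DOCS))
```

Under that assumption, document 1 is the only one removed from the candidate set before nearest neighbours are selected.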
What is the bug?
When a document stores vectors in nested documents and we run a nested knn query with filters on the parent document's fields, the filters work only for specific Query types (such as TermQuery). If, for example, a PhraseQuery is specified as a filter, the knn query returns no results at all. Several other Query types (such as exists and range) fail in the same way.
How can one reproduce the bug? Steps to reproduce the behavior:
What is the expected behavior?
Even when the filters specify a PhraseQuery, a range query, etc., the filters should be applied and any matching results should be returned.
What is your host/environment?
Do you have any additional context?
On analysis, we found that @navneet1v added the functionality to support applying filters on parent documents here: https://github.com/opensearch-project/k-NN/issues/1356
The code uses the NestedHelper.mightMatchNestedDocs method to determine whether the filter is applied to the parent document or the nested document. Unfortunately, mightMatchNestedDocs checks for specific Query types individually to see whether they contain a "field" and whether that field is present in the parent or the nested doc. This list of Query types is not complete. Many commonly used Query types that have a "field" are missing, such as Phrase query and Range query.
https://github.com/opensearch-project/OpenSearch/blob/f1c98a4da0cf6583212eecc9ed8ebc3cd426a918/server/src/main/java/org/opensearch/index/search/NestedHelper.java#L65
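The pattern described above can be modelled in a few lines of Python. This is a simplified sketch, not the actual Java code: the query classes, the `description` and `price` fields, and both function names are hypothetical. It also assumes that unrecognised query types fall through to a conservative "might match nested docs" answer. The second function shows one possible direction (inspecting the field regardless of query type) purely for contrast; it is not a fix proposed in this thread.

```python
from dataclasses import dataclass
from typing import Any, Set


@dataclass
class TermQuery:
    field: str
    value: str


@dataclass
class PhraseQuery:
    field: str
    phrase: str


@dataclass
class RangeQuery:
    field: str
    gte: Any
    lte: Any


NESTED_PATHS = {"nested_field"}


def field_is_nested(field: str, nested_paths: Set[str]) -> bool:
    """True if the field lives under one of the nested object paths."""
    return any(field.startswith(path + ".") for path in nested_paths)


def might_match_nested_docs_by_type(query: Any, nested_paths: Set[str]) -> bool:
    """Type-by-type check: only explicitly listed query types are inspected."""
    if isinstance(query, TermQuery):
        return field_is_nested(query.field, nested_paths)
    # Assumed fallback: an unlisted type is treated as possibly matching
    # nested docs, so a parent-level filter is misclassified.
    return True


def might_match_nested_docs_by_field(query: Any, nested_paths: Set[str]) -> bool:
    """Hypothetical alternative: inspect the field of any field-bearing query."""
    field = getattr(query, "field", None)
    if field is None:
        return True  # no field information, stay conservative
    return field_is_nested(field, nested_paths)


parent_filters = [
    TermQuery("parking", "false"),
    PhraseQuery("description", "free parking"),
    RangeQuery("price", 1, 5),
]
for q in parent_filters:
    print(type(q).__name__,
          might_match_nested_docs_by_type(q, NESTED_PATHS),
          might_match_nested_docs_by_field(q, NESTED_PATHS))
```

In this model all three filters target parent fields, yet only the term filter is recognised as a parent filter by the type-by-type check. That matches the symptom reported here, where phrase and range filters return no results.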