opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
143 stars 108 forks source link

[BUG] Indices that include knn_vector field fail when search includes "fields" parameter #1633

Open jmazanec15 opened 3 months ago

jmazanec15 commented 3 months ago

Description

Right now, for indices that contain a knn_vector, we are not able to support the fields option in queries. This is because in KNNVectorFieldType, we do not implement ValuesFetcher. We should implement it similar to the geo types that handle arrays directly.

curl -X GET "localhost:9200/target_index/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "fields": ["*"],
>   "query": {
>     "match_all": {}
>   }
> }
> '

{
  "error" : {
    "root_cause" : [
      {
        "type" : "unsupported_operation_exception",
        "reason" : "KNN Vector do not support fields search"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "target_index",
        "node" : "grr7rDoPSwKXqQcpNrcbJQ",
        "reason" : {
          "type" : "unsupported_operation_exception",
          "reason" : "KNN Vector do not support fields search"
        }
      }
    ],
    "caused_by" : {
      "type" : "unsupported_operation_exception",
      "reason" : "KNN Vector do not support fields search",
      "caused_by" : {
        "type" : "unsupported_operation_exception",
        "reason" : "KNN Vector do not support fields search"
      }
    }
  },
  "status" : 500
}

Repro steps

curl -X PUT "localhost:9200/target_index" -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0,
    "index.knn": true
  },
  "mappings": {
       "properties": {
       "target_field": {
           "type": "knn_vector",
           "dimension": 2,
           "method": {
               "name":"hnsw",
               "engine":"nmslib",
               "space_type": "l2",
               "parameters":{
                 "m":20,
                 "ef_construction": 23
               }
           }
      }
   }
  }
}
'

curl -X PUT "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "target_index" } }
{ "target_field": [1.5, 5.5]}
{ "index": { "_index": "target_index" } }
{ "target_field": [2.5, 3.5]}
{ "index": { "_index": "target_index" } }
{ "target_field": [4.5, 5.5]}
{ "index": { "_index": "target_index" } }
{ "target_field": [1.5, 5.5]}
{ "index": { "_index": "target_index" } }
{ "target_field": [1.5, 5.5]}
{ "index": { "_index": "target_index" } }
{ "target_field": [2.5, 3.5]}
'

curl localhost:9200/target_index/_refresh?pretty
curl -X GET "localhost:9200/target_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "fields": ["*"],
  "query": {
    "match_all": {}
  }
}
'
OliverLiebmann commented 2 months ago

This is not a bug, you are trying to use efficient knn filtering, but used the nmslib engine, which does not support it. If you change the engine to lucene or faiss it should work as expected.

Here is the documentation for it: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/ image

jmazanec15 commented 2 months ago

@OliverLiebmann this is a bug - it doesnt have anything to do with filtering - more so use of fields in the search payload. See:

curl -X GET "localhost:9200/target_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "fields": ["*"],
  "query": {
    "match_all": {}
  }
}
'
jmazanec15 commented 2 months ago

See this comment as well: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/MappedFieldType.java#L128-L134

samuel-oci commented 1 month ago

@jmazanec15 feel free to assign to me I understand the required fix.

jmazanec15 commented 1 month ago

Thanks @samuel-oci ! Assigned

jmazanec15 commented 1 week ago

@samuel-oci are you taking a look at this one?