opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[FEATURE] Element access in KNNVectorScriptDocValues? #2233

Open dhmw opened 6 days ago

dhmw commented 6 days ago

Is your feature request related to a problem?

The vector element access is not implemented here: https://github.com/opensearch-project/k-NN/blob/eb0a3c7454cb33346f135161beb21f46f43b8457/src/main/java/org/opensearch/knn/index/KNNVectorScriptDocValues.java#L70

Is there a good reason for this?

What solution would you like?

It would be nice to be able to access individual vector elements in scripts, for example to pre-filter documents based on a threshold on a single vector element. This is useful when the vector encodes per-feature classification scores and we want to exclude documents from the k-NN search when a particular feature is present.

e.g.


GET vectors-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "vectors.my-vector": {
              "vector": [ ... values ... ],
              "k": 10
            }
          }
        }
      ],
      "filter": [
        {
          "script": {
            "script": {
              "source": "return doc['vectors.my-vector'][12] < 0.05"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": "desc"
    }
  ],
  "size": 10
}

What alternatives have you considered?

Currently, we would have to copy the relevant vector elements into additional indexed fields on each document and use a regular range query on those fields, but this requires storing redundant information in the document.
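
A rough sketch of that workaround, only to illustrate the redundancy involved. The field name `vectors.my-vector-e12` and both helper functions are hypothetical and not part of the k-NN plugin; the sketch also assumes the extra field is mapped as a plain `float`. It only builds request bodies as Python dicts and does not talk to a cluster.

```python
# Hypothetical names for illustration; only VECTOR_FIELD and the element
# index come from the query in this issue.
VECTOR_FIELD = "vectors.my-vector"
ELEMENT_INDEX = 12
SHADOW_FIELD = "vectors.my-vector-e12"  # assumed redundant float field


def with_shadow_field(doc_vector):
    """Build a document body that stores element 12 a second time as a plain float."""
    return {
        "vectors": {
            "my-vector": doc_vector,
            "my-vector-e12": doc_vector[ELEMENT_INDEX],
        }
    }


def build_query(query_vector, k=10, threshold=0.05):
    """Same bool query as above, with the script filter replaced by a range filter."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"knn": {VECTOR_FIELD: {"vector": query_vector, "k": k}}}
                ],
                "filter": [
                    {"range": {SHADOW_FIELD: {"lt": threshold}}}
                ],
            }
        },
        "sort": [{"_score": "desc"}],
        "size": k,
    }
```

Every element we want to filter on would need its own extra field like this, which is the redundancy that direct element access in scripts would avoid.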

navneet1v commented 5 days ago

Thanks for creating the GH issue. This will be a valuable enhancement to the existing script-score-based search for embeddings.