opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[FEATURE] Providing better experience for doing Exact Search without Script Score Query #1079

Open navneet1v opened 1 year ago

navneet1v commented 1 year ago

Is your feature request related to a problem? Currently, a user who wants to run an exact search has to use a script score query. Ref: https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/ This is not intuitive and adds an extra step, script compilation. Scripts also have a reputation for posing security concerns when run in multi-tenant OpenSearch environments.
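
For context, here is a minimal sketch of the request body that the script score approach requires today, written as a Python dict. It follows the knn_score script documented at the link above. The helper name exact_knn_script_score_body is hypothetical, and the field name and vector are borrowed from the example further down this thread.

```python
import json


def exact_knn_script_score_body(field, query_vector, k, space_type="l2"):
    """Build a script_score request body for exact k-NN (hypothetical helper)."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # The inner query selects which documents get scored;
                # match_all scores every document in the index.
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }


body = exact_knn_script_score_body("my_vector2", [2.0, 3.0, 5.0, 6.0], k=2)
print(json.dumps(body, indent=2))
```

The nesting (script_score, then script, then params) is the part the issue describes as unintuitive compared with a plain knn clause.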

What solution would you like? Provide the exact search feature in the k-NN query clause itself. Because k-NN vectors are stored as doc values in the segment, the query execution code can iterate over a segment's doc values to perform the exact search.
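
The idea of iterating over a segment's stored vectors can be sketched as a brute-force scan. This is an illustration only, not the plugin's code: the list of (doc_id, vector) pairs stands in for doc values iteration, L2 distance is an assumed space type, and the function names are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(segment_vectors, query, k):
    """Return the k closest (doc_id, distance) pairs by scanning every vector.

    segment_vectors is an iterable of (doc_id, vector) pairs; it is a
    stand-in for iterating the vector doc values of one segment.
    """
    scored = ((l2_distance(vec, query), doc_id) for doc_id, vec in segment_vectors)
    # nsmallest keeps only k candidates in memory while scanning.
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


segment = [
    (0, [1.0, 2.0, 3.0, 4.0]),
    (1, [2.0, 3.0, 5.0, 6.0]),
    (2, [9.0, 9.0, 9.0, 9.0]),
    (3, [2.0, 3.0, 5.0, 7.0]),
]
print(exact_knn(segment, [2.0, 3.0, 5.0, 6.0], k=2))  # [(1, 0.0), (3, 1.0)]
```

Every vector is visited once, so the cost grows linearly with the number of documents; that is the trade-off against ANN structures.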

What alternatives have you considered? NA

Do you have any additional context? NA

jmazanec15 commented 1 year ago

I like it. Something like this?

GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "exact": true
      }
    }
  }
}

Then default exact to false for backward compatibility (bwc)?

navneet1v commented 1 year ago

@jmazanec15 this is in line with the thought I had when creating this issue. Do we see any other alternative here?

I see one, which is creating a new query clause, but that's not a good option, so I never added it. Let's stick to this one unless someone has a better idea.

@vamshin @heemin32 any thoughts here?

navneet1v commented 4 months ago

Thinking about this more, I think we can take another option, one that is already used for text fields. We can use the index parameter (similar to store) on the field, set to false, to indicate that the vectors should not be indexed. Indexing vectors here means creating the k-NN data structures.

Example:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "index": false
      }
    }
  }
}

We can then pass this value as an attribute on VectorField and read it in PerFieldCodec to select a new plain codec that just stores the vectors.

This approach will work for all the engines.
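
As a rough illustration of that per-field decision, here is a language-neutral sketch in Python. It is not the plugin's Java code. The function names and the format labels are hypothetical, and it assumes index defaults to true when absent so that existing mappings keep building k-NN structures.

```python
def wants_knn_structures(field_mapping):
    """Return True if k-NN data structures should be built for this field.

    Assumption: a missing "index" parameter means True, which preserves
    the behaviour of mappings created before the parameter existed.
    """
    return bool(field_mapping.get("index", True))


def select_vector_format(field_mapping):
    """Pick a storage format label for one knn_vector field.

    The labels are placeholders; in the proposal this choice would be
    made inside the per-field codec based on a field attribute.
    """
    if wants_knn_structures(field_mapping):
        return "ann-format"
    return "plain-vectors-format"


my_vector1 = {"type": "knn_vector", "dimension": 3, "index": False}
print(select_vector_format(my_vector1))  # plain-vectors-format
```

Because the decision depends only on the field's own mapping, it does not need to know which engine would otherwise have built the structures.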

cc: @vamshin, @jmazanec15, @luyuncheng what are your thoughts?

jmazanec15 commented 4 months ago

I think that makes sense. So this would fall back to brute force knn search?

navneet1v commented 4 months ago

> I think that makes sense. So this would fall back to brute force knn search?

Yes that is correct.

jmazanec15 commented 3 months ago

@navneet1v do you think we could do "method": false instead? Right now, method specifies the structure for ANN search. So if we set it to false, it would make sense that we do not want to do ANN search.

navneet1v commented 3 months ago

> @navneet1v do you think instead we can do "method": false? Right now, method specifies the structure for ANN search. So, if we set to false, it would make sense that we do not want to do ANN search.

Actually, we can do method: false, but in OpenSearch the way we say that something is not indexed is index: false. People can still search using doc values. Similarly, here users can search via VectorValues. I see that as a more seamless experience.

Another thing is that method: false doesn't work with LegacyFieldMapper.

jmazanec15 commented 3 months ago

Got it. Then I think we will need to ensure that method, model_id, and index: false are all mutually exclusive.
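
A minimal sketch of such a mutual-exclusivity check, assuming the mapping arrives as a plain dict. The function name and error message are hypothetical, and the real check would live in the plugin's field mapper rather than in Python.

```python
def validate_knn_vector_mapping(field_mapping):
    """Reject mappings that combine method, model_id, and index: false."""
    candidates = (
        ("method", "method" in field_mapping),
        ("model_id", "model_id" in field_mapping),
        # Only an explicit false counts; a missing or true index is fine.
        ("index: false", field_mapping.get("index", True) is False),
    )
    chosen = [name for name, present in candidates if present]
    if len(chosen) > 1:
        raise ValueError(
            "mutually exclusive parameters set together: " + ", ".join(chosen)
        )


# Accepted: only index: false is set.
validate_knn_vector_mapping({"type": "knn_vector", "dimension": 3, "index": False})

# Rejected: index: false combined with a method definition.
try:
    validate_knn_vector_mapping(
        {"type": "knn_vector", "dimension": 3, "index": False, "method": {"name": "hnsw"}}
    )
except ValueError as err:
    print(err)
```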

navneet1v commented 3 months ago

Yes, that is correct. index: false only governs whether we create the vector data structures or not. It has that simple job.