opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.
https://opendistro.github.io/
Apache License 2.0
277 stars 55 forks source link

KNNVectorFieldMapper unable to get algorithm params from settings when built from merge #288

Closed jmazanec15 closed 3 years ago

jmazanec15 commented 3 years ago

When the KNNVectorFieldMapper is built from a mapper merge, the mapper is unable to read the algorithm parameters that are stored as index settings. Because of this, it falls back to default settings. Because spaceType is one of the parameters, this can lead to incorrect graphs being built for an index. For example, a user may create a cosinesimil index and the graphs will be built using the l2 case.

Relevant lines of code: 1 https://github.com/opendistro-for-elasticsearch/k-NN/blob/master/src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNVectorFieldMapper.java#L124-L163

  1. https://github.com/opendistro-for-elasticsearch/k-NN/blob/master/src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNVectorFieldMapper.java#L349
  2. https://github.com/elastic/elasticsearch/blob/7.10/server/src/main/java/org/elasticsearch/index/mapper/ParametrizedFieldMapper.java#L104

For basic use cases where the mapping does not change after index creation, this bug will not make any impact. However, this bug may impact indices with dynamic mappings.

Reproduction steps:

curl -X PUT "localhost:9200/myindex" -H 'Content-Type: application/json' -d'
{
  "settings" : {
      "index": {
        "knn": "true",
        "knn.space_type": "cosinesimil"
      }
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 2
      },
      "metadata" : {
        "type" : "object",
        "dynamic" : true
      }
    }
  }
}
'

curl -X POST "localhost:9200/myindex/_doc" -H 'Content-Type: application/json' -d'
{
  "metadata": {"test": "object"},
  "embedding":  [1, 2]
}
'

curl -X POST "localhost:9200/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "size" : 1,
  "query": {
    "knn": {
      "embedding": {
        "vector": [3, 4],
        "k": 1
      }
    }
  }
}
'

This change should resolve the problem. In the future, we should switch these algorithm parameters from index settings to mapping parameters (tracking this in #282).

Impacted ODFE Version:

  1. 1.12
  2. 1.11
  3. 1.10

Related issues:

  1. 282

  2. 287

  3. https://discuss.opendistrocommunity.dev/t/cosine-similarity-formula/4390/12