opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.
https://opendistro.github.io/
Apache License 2.0

indexed vector not always included in the search results #152

Closed marti-1 closed 4 years ago

marti-1 commented 4 years ago

Hi,

I have the following index definition:

{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "l2"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "hist_hash": {
          "type": "knn_vector",
          "dimension": 144
        },
        "filename": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

When I search with a vector that is already indexed, the results don't always include the document that holds that vector. Is this supposed to happen?
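A self-match check of this kind can be sketched roughly as follows. The `knn` query body follows the Open Distro k-NN plugin's query format for the `hist_hash` field defined above; the helper names, the all-zero query vector, and the sample response are assumptions made up for illustration and do not come from the actual index.

```python
import json


def build_knn_query(field, vector, k):
    """Build a search body using the k-NN plugin's `knn` query clause."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def hit_filenames(response):
    """Return the `filename` of every hit in a search response dict."""
    return [hit["_source"]["filename"] for hit in response["hits"]["hits"]]


def contains_self(response, filename):
    """True if the document identified by `filename` appears in the hits."""
    return filename in hit_filenames(response)


# Hypothetical 144-dimensional query vector (all zeros, illustration only).
query_vector = [0.0] * 144
body = build_knn_query("hist_hash", query_vector, k=5)

# Hypothetical response in which the queried document itself is missing.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"filename": "b.png"}},
            {"_source": {"filename": "c.png"}},
        ]
    }
}

print(json.dumps(body["query"]["knn"]["hist_hash"]["k"]))
print(contains_self(sample_response, "a.png"))
```

The body would be sent to the index's `_search` endpoint; the check on `filename` is just one way to identify the document, and `_id` would work equally well.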

Thanks!

vamshin commented 4 years ago

Hi @marti-1,

Could you please share some example docs with their vectors, along with the query and its output?

vamshin commented 4 years ago

This is possibly caused by https://github.com/opendistro-for-elasticsearch/k-NN/issues/154

marti-1 commented 4 years ago

Hi @vamshin, I have read a couple of papers about the HNSW algorithm, and from what I understood it can get stuck in local optima, so technically the document being searched for might not be encountered, right?

jmazanec15 commented 4 years ago

Hi @marti-1, because the HNSW algorithm returns the approximate k nearest neighbors of a query, it is possible that the document being searched for will not be included in the search results. However, this should not happen very frequently.
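To make the distinction concrete, here is a minimal sketch of an exact (brute-force) k-NN search under L2 distance, plus a recall measure for comparing an approximate result against it. With exact search, an indexed vector used as the query always has distance 0 and therefore ranks first; an approximate search gives no such guarantee. The toy 2-dimensional vectors, the document ids, and the "approximate" result list are invented for illustration and are not output from the plugin.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(vectors, query, k):
    """Return the ids of the k vectors closest to `query`, nearest first.

    `vectors` maps a document id to its vector. Every vector is scanned,
    so the result is exact rather than approximate.
    """
    ranked = sorted(vectors, key=lambda doc_id: l2_distance(vectors[doc_id], query))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result found."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Toy data (hypothetical): four 2-dimensional vectors.
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 3.0],
    "d": [5.0, 5.0],
}

# Query with the vector already stored for document "b".
exact = exact_knn(vectors, vectors["b"], k=2)

# Hypothetical approximate result that missed the queried document itself.
approx = ["a", "c"]

print(exact)
print(recall_at_k(approx, exact))
```

Running a comparison like this over a sample of indexed vectors would show how often the self-match is missed, which helps separate expected approximation error from a bug.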

What environment are you running the k-NN plugin in (Docker, RPM, DEB, AMI)? If it is RPM, DEB or AMI, I am guessing it is related to #154. We fixed this issue in ODFE 1.9, which will be released soon. Additionally, we are planning to patch 1.8 with this fix.

jmazanec15 commented 4 years ago

My mistake, @marti-1: it will impact Docker as well.

marti-1 commented 4 years ago

Hi @jmazanec15, sorry for the late reply. I installed manually using the Debian packages (https://opendistro.github.io/for-elasticsearch-docs/docs/install/deb/). The following is the elasticsearch-oss build information:

{
  "name" : "ubuntu1804.localdomain",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "bc93xKtmQrKaeQCZGltKvA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Output from curl -XGET https://localhost:9200/_cat/plugins?v -u admin:admin --insecure:

name                   component                       version
ubuntu1804.localdomain opendistro-anomaly-detection    1.8.0.0
ubuntu1804.localdomain opendistro-job-scheduler        1.8.0.0
ubuntu1804.localdomain opendistro-knn                  1.8.0.0
ubuntu1804.localdomain opendistro_alerting             1.8.0.0
ubuntu1804.localdomain opendistro_index_management     1.8.0.0
ubuntu1804.localdomain opendistro_performance_analyzer 1.8.0.0
ubuntu1804.localdomain opendistro_security             1.8.0.0
ubuntu1804.localdomain opendistro_sql                  1.8.0.0

From your previous reply, it looks like I need to upgrade the plugin to version 1.9?

jmazanec15 commented 4 years ago

Hi @marti-1, sorry for the delay. Yes, upgrading to 1.9 should resolve this.

Closing. For more discussion on this, please refer to #154