opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Fixing the bug when a segment has no vector field present for disk based vector search #2281

Closed: navneet1v closed this 2 days ago

navneet1v commented 2 days ago

Description

Fixes a bug in disk-based vector search when a segment has no vector field.

The added check ensures that disk-based vector search does not crash when some segments contain no vector field.

What's the fix:

I made changes that not only fix the bug but also speed up DiskANN queries in certain cases.

  1. During the rescore pass of disk-based vector search, only segments that contain documents to be rescored are now visited. Previously, every segment received a rescore call even when it had no documents to rescore. Skipping those segments speeds up the query and also fixes the bug, because a segment with no vector field has no documents to rescore and is therefore never asked to rescore.
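The rescore change above can be sketched in Java as follows. This is a minimal model under stated assumptions, not the plugin's code: Candidate, SegmentRescorer, groupBySegment, and rescore are hypothetical names, and a segment is identified by an integer ordinal. First-pass hits are grouped by segment, and only segments that contributed hits are handed to the rescorer, so a segment with no vector field (simulated here by a rescorer that throws for segment 1) is never touched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the rescore pass; names are illustrative, not the k-NN plugin's classes.
final class SegmentAwareRescore {

    /** A first-pass hit: source segment ordinal, segment-local doc id, and score. */
    record Candidate(int segmentOrd, int docId, float score) {}

    /** Hypothetical exact scorer, invoked once per segment with that segment's doc ids. */
    interface SegmentRescorer {
        float[] rescoreSegment(int segmentOrd, int[] docIds);
    }

    /** Groups first-pass candidates by segment, preserving first-seen order. */
    static Map<Integer, List<Candidate>> groupBySegment(List<Candidate> candidates) {
        Map<Integer, List<Candidate>> bySegment = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            bySegment.computeIfAbsent(c.segmentOrd(), ord -> new ArrayList<>()).add(c);
        }
        return bySegment;
    }

    /** Rescores only the segments that contributed candidates, then keeps the top k. */
    static List<Candidate> rescore(List<Candidate> candidates, SegmentRescorer rescorer, int k) {
        List<Candidate> rescored = new ArrayList<>();
        for (Map.Entry<Integer, List<Candidate>> entry : groupBySegment(candidates).entrySet()) {
            int[] docIds = entry.getValue().stream().mapToInt(Candidate::docId).toArray();
            float[] exact = rescorer.rescoreSegment(entry.getKey(), docIds);
            for (int i = 0; i < docIds.length; i++) {
                rescored.add(new Candidate(entry.getKey(), docIds[i], exact[i]));
            }
        }
        // Highest exact score first.
        rescored.sort(Comparator.comparingDouble(Candidate::score).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(k, rescored.size())));
    }

    public static void main(String[] args) {
        // Segment 0 holds vectors; segment 1 (like "_1" in the test below) has no vector field,
        // so this fake rescorer throws if it is ever asked to rescore segment 1.
        SegmentRescorer rescorer = (segmentOrd, docIds) -> {
            if (segmentOrd == 1) {
                throw new IllegalStateException("segment 1 has no vector field");
            }
            float[] scores = new float[docIds.length];
            for (int i = 0; i < docIds.length; i++) {
                scores[i] = 1.0f / (1 + docIds[i]); // stand-in for an exact, full-precision score
            }
            return scores;
        };
        List<Candidate> firstPass = List.of(
            new Candidate(0, 3, 0.2f), new Candidate(0, 1, 0.1f), new Candidate(0, 2, 0.3f));
        for (Candidate c : rescore(firstPass, rescorer, 2)) {
            System.out.println(c);
        }
    }
}
```

Grouping first also means per-segment setup (for example, opening that segment's full-precision vectors) happens once per segment rather than once per document.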

Dev Testing

Create Index

PUT my-knn-index-61
{
    "settings": {
        "index": {
            "knn": true,
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "refresh_interval": "1s"
        }
    },
    "mappings": {
        "properties": {
            "my_vector1": {
                "type": "knn_vector",
                "dimension": 8,
                "mode": "on_disk",
                "compression_level": "32x"
            }
        }
    }
}
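With "mode": "on_disk" and "compression_level": "32x", each 32-bit float dimension is stored as a single bit in the quantized index, which is why a full-precision rescore pass follows the first pass. The Java sketch below illustrates that 32x ratio for an 8-dimensional vector like the ones in this test. It assumes a simple sign-based 1-bit quantizer (threshold at zero) and Hamming distance for the cheap first pass; the plugin's actual quantizer may choose thresholds differently, and OneBitQuantizer is a hypothetical name.

```java
// Simplified 1-bit quantizer (assumption: threshold at zero); not the k-NN plugin's implementation.
final class OneBitQuantizer {

    /** Packs one bit per dimension, most significant bit first; a bit is set when the value is positive. */
    static byte[] quantize(float[] vector) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }

    /** Hamming distance between two packed bit vectors, as a cheap approximate first-pass distance. */
    static int hamming(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        float[] vector = {-6.78f, 5.34f, -8.12f, 6.78f, -4.12f, 7.89f, -3.45f, 8.34f};
        byte[] bits = quantize(vector);
        int fullBytes = vector.length * Float.BYTES; // 8 dimensions x 4 bytes
        System.out.printf("full precision: %d bytes, quantized: %d byte(s), ratio %dx%n",
            fullBytes, bits.length, fullBytes / bits.length);
    }
}
```

Because the quantized form throws away everything but one bit per dimension, first-pass scores are only approximate; the rescore pass recomputes scores for the surviving candidates from the original float vectors.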

Ingest 5 documents

PUT _bulk?refresh=true
{ "index": { "_index": "my-knn-index-61", "_id": "17" } }
{ "my_vector1": [-6.78, 5.34, -8.12, 6.78, -4.12, 7.89, -3.45, 8.34] }
{ "index": { "_index": "my-knn-index-61", "_id": "74" } }
{ "my_vector1": [7.34, -6.45, 5.12, -7.78, 6.89, -4.34, 8.12, -5.67] }
{ "index": { "_index": "my-knn-index-61", "_id": "5" } }
{ "my_vector1": [-5.78, 7.12, -6.45, 8.34, -4.12, 7.89, -6.78, 5.34] }
{ "index": { "_index": "my-knn-index-61", "_id": "17644" } }
{ "my_vector1": [6.45, -8.34, 5.67, -7.89, 3.12, -6.78, 8.45, -4.12] }
{ "index": { "_index": "my-knn-index-61", "_id": "177322" } }
{ "my_vector1": [-7.12, 6.78, -4.56, 8.34, -5.67, 7.12, -3.34, 6.45] }

Delete a document

DELETE my-knn-index-61/_doc/17

Segments

GET _cat/segments/my-knn-index-61?format=json

Response

[
    {
        "index": "my-knn-index-61",
        "shard": "0",
        "prirep": "p",
        "ip": "127.0.0.1",
        "segment": "_0",
        "generation": "0",
        "docs.count": "5",
        "docs.deleted": "0",
        "size": "4.3kb",
        "size.memory": "0",
        "committed": "false",
        "searchable": "true",
        "version": "9.12.0",
        "compound": "true"
    },
    {
        "index": "my-knn-index-61",
        "shard": "0",
        "prirep": "p",
        "ip": "127.0.0.1",
        "segment": "_1",
        "generation": "1",
        "docs.count": "0",
        "docs.deleted": "1",
        "size": "3.1kb",
        "size.memory": "0",
        "committed": "false",
        "searchable": "true",
        "version": "9.12.0",
        "compound": "true"
    }
]

Search again (this search previously failed with an error)

GET my-knn-index-61/_search
{
    "query":{
        "knn":{
            "my_vector1": {
                "vector": [1,1,1,1,1,1,1,1],
                "k": 10
            }
        }
    }
}

Response

{
    "took": 61,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 0.0031684334,
        "hits": [
            {
                "_index": "my-knn-index-61",
                "_id": "177",
                "_score": 0.0031684334,
                "_source": {
                    "my_vector1": [
                        -7.12,
                        6.78,
                        -4.56,
                        8.34,
                        -5.67,
                        7.12,
                        -3.34,
                        6.45
                    ]
                }
            },
            {
                "_index": "my-knn-index-61",
                "_id": "173",
                "_score": 0.0029043476,
                "_source": {
                    "my_vector1": [
                        -6.78,
                        5.34,
                        -8.12,
                        6.78,
                        -4.12,
                        7.89,
                        -3.45,
                        8.34
                    ]
                }
            },
            {
                "_index": "my-knn-index-61",
                "_id": "175",
                "_score": 0.002883079,
                "_source": {
                    "my_vector1": [
                        -5.78,
                        7.12,
                        -6.45,
                        8.34,
                        -4.12,
                        7.89,
                        -6.78,
                        5.34
                    ]
                }
            },
            {
                "_index": "my-knn-index-61",
                "_id": "174",
                "_score": 0.002864083,
                "_source": {
                    "my_vector1": [
                        7.34,
                        -6.45,
                        5.12,
                        -7.78,
                        6.89,
                        -4.34,
                        8.12,
                        -5.67
                    ]
                }
            },
            {
                "_index": "my-knn-index-61",
                "_id": "176",
                "_score": 0.0027358374,
                "_source": {
                    "my_vector1": [
                        6.45,
                        -8.34,
                        5.67,
                        -7.89,
                        3.12,
                        -6.78,
                        8.45,
                        -4.12
                    ]
                }
            }
        ]
    }
}
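The scores in this response are consistent with the l2 space type, for which the k-NN plugin reports score = 1 / (1 + d), with d the squared Euclidean distance between the query and the stored full-precision vector. For example, for _id 177 the squared distance to [1, 1, 1, 1, 1, 1, 1, 1] is about 314.61, giving 1 / 315.61 ≈ 0.0031684. That the scores match exact distances suggests they come from the full-precision rescore rather than the 1-bit first pass. The Java sketch below reproduces the computation; L2ScoreCheck is a hypothetical helper, and the space type is inferred from the scores rather than stated in the mapping.

```java
// Assumption: l2 space type, score = 1 / (1 + d) where d is the squared Euclidean distance.
final class L2ScoreCheck {

    /** Sum of squared per-dimension differences (squared L2 distance). */
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Converts squared L2 distance into a score where closer vectors score higher. */
    static double l2Score(float[] query, float[] vector) {
        return 1.0 / (1.0 + squaredL2(query, vector));
    }

    public static void main(String[] args) {
        float[] query = {1, 1, 1, 1, 1, 1, 1, 1};
        float[] id177 = {-7.12f, 6.78f, -4.56f, 8.34f, -5.67f, 7.12f, -3.34f, 6.45f};
        float[] id173 = {-6.78f, 5.34f, -8.12f, 6.78f, -4.12f, 7.89f, -3.45f, 8.34f};
        System.out.println(l2Score(query, id177)); // compare with the "_score" of _id 177
        System.out.println(l2Score(query, id173)); // compare with the "_score" of _id 173
    }
}
```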

Related Issues

Ref: https://github.com/opensearch-project/k-NN/pull/2278

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

navneet1v commented 2 days ago

Since this problem was caused by the FieldInfo being null, I will raise a separate PR to fix that so that we don't run into this issue in the future.