opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
142 stars 108 forks source link

[FEATURE] Enable setting ef_search parameter as part of k-NN Query #1537

Open navneet1v opened 4 months ago

navneet1v commented 4 months ago

Is your feature request related to a problem? While using HNSW algorithm currently there is no support for sending the ef_search parameter as part of query. The ask here is to support functionality of setting the ef_search in the k-NN query. Even though the downstream libraries like Faiss and Nmslib has support for this.

Faiss: https://github.com/facebookresearch/faiss/blob/dafdff110489db7587b169a0afee8470f220d295/faiss/impl/HNSW.h#L46

What solution would you like? While doing a k-NN query as a user I should be able to provide the ef_search value.

What alternatives have you considered? Currently in Nmslib I can update this value per index by updating the index settings. But this functionality is not there for faiss.

Do you have any additional context? NA

jmazanec15 commented 4 months ago

@navneet1v does nmslib support passing this at search time? I dont think it does.

jmazanec15 commented 4 months ago

Also, IVF would benefit from this as well for the nprobe parameter. The nprobe should be easier to do once we have https://github.com/opensearch-project/k-NN/pull/1527 checked in as well. We can then just say for the nprobe parameter passed in at training is just the default.

navneet1v commented 4 months ago

@navneet1v does nmslib support passing this at search time? I dont think it does.

The way it works is you need to unload the graphs and then load again with new values. I would say its not true query parameter for nmslib. But we should move towards making the change in nmslib to avoid making ef_search as a graph parameter.

shatejas commented 2 months ago

Picking this up

shatejas commented 2 months ago

[RFC] Adding efSearch as query parameter for kNN

Overview

The efSearch parameter is utilized in the HNSW (Hierarchical Navigable Small World) algorithm to facilitate the exploration of more neighboring candidates, which in turn enhances the potential recall. A higher value for efSearch can often result in improved recall.

Opensearch k-NN plugin offers three engines options for HNSW-based kNN: NMSLIB, FAISS and Lucene. Currently customers can specify the efSearch option for NMSLIB and FAISS implementations as an index setting. This setting is established during index creation process .

FAISS ef_search value cannot be changed after the index is created. NMSLIB value can be changed. In the case of the Lucene implementation, the efSearch parameter is currently tied to the k parameter.

The objective of the feature is to enable efSearch as a query parameter across all three engines. This documents outlines the challenges associated with the goal, along with potential solutions and recommendations.

API Contract changes

An additional parameter support will be added in the query ef_search which will take an integer value indicating number of neighbors to explore. The value will be valid if its greater than 0;

example query

{
  "query": {
    "knn": {
      "faiss_vector": {
        "vector": [2, 3, 5],
        "k": 2,
        "ef_search":5
      }
    }
  }
}

Engine support

Opensearch will maintain support for index settings for efSearch in FAISS and NMSLIB engines. The setting will serve as defaults. However, if the query time efSearch parameter is provided, it will take precedence over the index settings.

FAISS

The FAISS engine is the simplest among the three. We can forward the efSearch parameter to the FAISS engine to get the intended results.

NMSLIB

NMSLIB implementation lacks support for efSearch as a query parameter. Instead, It is stored as a member variable of the index, which makes the implementation not-thread-safe with respect to query time efSearch.

The k-NN plugin caches the native memory address when it loads the NMSLIB (and FAISS) index. A simplistic approach would be to load a separate index with a distinct efSearch value from the query. The efSearch value will be appended as a the cache key for the native memory address. However, this approach can potentially turn into a memory bottleneck and can lead to OOM issues for customers if large number of queries with different efSearch parameters are executed. The memory bottleneck will be particularly more evident if there are multiple indexes in a node.

Approach 1: Adding support for efSearch in NMSLIB [Recommended]

The approach entails adding support for efSearch query parameter in the NMSLIB implementation of HNSW. It mirrors FAISS’ implementation, where the efSearch query parameter value supersedes the default index setting. The implementation will in turn make NMSLIB thread safe for query time efSearch.

Given that KNNQuery is utilized for algorithms beyond HNSW in NMSLIB, we introduce HnswKNNQuery which inherits KNNQuery . This can accommodate HNSW specific parameters such as efSearch. Consequently efSearch value will be utilized

psuedo code:

extractEfSearch(KNNQuery* query) {
    if (//check to see if query type is HnswKNNQuery and if efsearch value is present in the query) {
       return query.efsearch
    }
    return ef_; //ef_ is a member variable in index
}

The discussion has been initiated in the this github feature request and is awaiting feedback.

Approach 2: Synchronize on Index Allocation Object

Currently the system permits multiple threads to query the same index simultaneously. This approach will block multiple threads from querying the same index. An index allocation object, stored in cache, which holds the native memory address of the index graph. By synchronizing the call to query method of the library on the index allocation object acts as a work around to the thread safety problem. Note that, in this approach, we will manipulate the member variable of efSearch from the query and set it back to the index setting. As a consequence, it can lead to contention in the system if large number of queries are executed at once.

It's also worth noting that implementing this approach impacts the FAISS' implementation. Therefore, a refactor is necessary to ensure that FAISS querying remains unaffected, making sure we don't block threads for FAISS'.

Pros and cons

Approach Pros Con
1. Adding support for efSearch in nmslib A simple change in NMSLIB
2. Synchronize on index Does not require changes in NMSLIB Requires querying to be single threaded for a particular index, making it a potential bottleneck

Lucene

The Lucene implementation currently uses k as the equivalent of efSearch. It does not have a mechanism to differentiate between the k and efSearch parameters.

Lucene extracts the top similar documents for KNN by overriding the rewrite phase of the query. The rewrite of the query is managed in the class AbstractKnnVectorQuery . As a part of rewrite, AbstractKnnVectorQuery iterates over each segment in parallel and retrieves the top k docs, exploring k neighbors as candidates from each segment. It utilizes TopDocsKnnCollector to collect results from each segment, subsequently merging the top k documents from the segment results.

Approach 1: Add support for efSearch parameter in Lucene [Recommended]

The approach requires introducing a parameter efSearch, alongside k in AbstractKnnVectorQuery . This parameter then needs to passed on to TopDocsKnnCollector while its being initialized. This makes sure that the underlying HNSW implementation utilizes efSearch to explore the candidates similarities. Meanwhile AbstractKnnVectorQuery continues to utilize the k value to return the results. The k value will be used to merge the top efSearch docs for each segment.

The approach requires us to submit a pull request which add efSearch as a parameter in lucene. A previous attempt was made for it in this pull request for a previous implementation of HNSW. The pull request was closed in favor of parallel work to optimize multi-segment HNSW algorithm.

Approach 2: Add custom queries

This introduces four new queries which will inherit KnnVectorFloatQuery , KnnVectorByteQuery , DiversifyingChildrenFloatKnnVectorQuery and DiversifyingChildrenFloatKnnVectorQuery . It introduces additional member variable for efSearch and override the implementation to merge top k documents.

psuedo code example

public class EfSearchKnnVectorFloatQuery extends KnnVectorFloatQuery {
    private int final efSearch;
    private int final k;

    public EfSearchKnnVectorFloatQuery(String field, float[] target, 
        int k, int efSearch, Query filter) {

        super(field, efSearch, filter);
        this.topk = k;
    }

    protected TopDocs mergeLeafResults(TopDocs[] perLeafResults) {
        return TopDocs.merge(this.k, perLeafResults);
    } 
}

Pros and cons

Approach Pros Con
1. Add support for efSearch parameter in lucene Built in support
2: Add custom queries Code not dependent on lucene support Creates an additional hierarchy to work around lucene implementation. Possibly added work in future if lucene introduces new query types

efSearch Bounds

While efSearch value can be used to get a better recall, the cost can be higher latencies if the parameter value is recklessly large and can turn into a literal k-NN instead of approximate, especially if the value of efSearch is not adjusted per segment. The current bound for k is 10000 in opensearch k-NN plugin, similarly a recommendation is to put a upper bound on the efSearch value, the upper bound value needs to be decided based on testing.

Note: After a discussion we deceide not to go ahead with bounds as there are no bounds on current ef_search values while indexing


Appendix

OOM issues calculations

Assumptions

Memory: 128 GB out of which 32GB is used by heap and 90GB as native memory 10 million float vectors in a shard with 1024 dimensions. with 1 index Size of vectors: 4 bytes (float) 1024 10 million = 40GB. considering each query has a different efSearch, it can handle only 2 requests with different efSearch values before the node runs into out of memory issue


Lucene update

After a discussion, we concluded that we will use max(efsearch, k) to get shard level results and let size attribute in the query dictate the final set of results being returned in the response

navneet1v commented 2 months ago

@shatejas thanks for putting up the RFC.

For this feature I have created this feature branch: https://github.com/opensearch-project/k-NN/tree/feature/ef-search. The branch cut out of main branch just now.

junqiu-lei commented 1 month ago

For radial search in Lucene engine HNSW, there is no the ef_search parameter being used, the only similar parameter is traversal_threshold, but the value meaning is different from ef_search, which makes it impossible to transfer. At this initial release we plan to block the ef_search in lucene radial search request. At some point in the future we can think about exposing the traversal_threshold api in lucene radial search.

navneet1v commented 1 month ago

For radial search in Lucene engine HNSW, there is no the ef_search parameter being used, the only similar parameter is traversal_threshold, but the value meaning is different from ef_search, which makes it impossible to transfer. At this initial release we plan to block the ef_search in lucene radial search request. At some point in the future we can think about exposing the traversal_threshold api in lucene radial search.

Thanks @junqiu-lei for doing the deep-dive and providing the information.