opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Add changes for AVX-512 support in k-NN. #2110

Closed akashsha1 closed 1 week ago

akashsha1 commented 1 week ago

Description

This change adds AVX512 support to k-NN, using AVX512 SIMD instructions to speed up vector search and indexing in Faiss.
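As a rough illustration of what "AVX512 support" depends on at runtime, the sketch below reads the CPU feature flags from `/proc/cpuinfo`-style text and picks a SIMD level. The required flag set (`avx512f`, `avx512cd`, `avx512vl`, `avx512dq`, `avx512bw`) and the fallback order are assumptions made for this example, not something stated in this PR; the helper names are hypothetical.

```python
# Assumed flag set for an AVX512 build; the PR text does not list these.
AVX512_FLAGS = {"avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_simd_level(flags):
    """Pick the widest SIMD level the flags allow, falling back step by step."""
    if AVX512_FLAGS <= flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux only
            print(pick_simd_level(cpu_flags(handle.read())))
    except OSError:
        print("generic")
```

On a host such as the r7i instances used in the benchmark discussed in this thread, one would expect this kind of check to report `avx512`, but that depends on the flags the hypervisor exposes.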

Related Issues

Resolves #2056

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

assanedi commented 1 week ago

The benchmark was run using opensearch-benchmark with the Cohere dataset (768 dimensions). Here are the configuration details for indexing:

{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "target_index_body": "indices/faiss-index.json",
  "target_index_primary_shards": 4,
  "target_index_replica_shards": 1,
  "target_index_dimension": 768,
  "target_index_space_type": "innerproduct",
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_path": "/mnt/nvme1/documents-1m.hdf5",
  "target_index_bulk_indexing_clients": 20,
  "target_index_max_num_segments": 1,
  "hnsw_ef_search": 256,
  "hnsw_ef_construction": 256
}
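The contents of `indices/faiss-index.json` are not shown in this thread. As a hedged sketch, the function below builds an index body that would be consistent with the indexing parameters above, assuming a standard k-NN `knn_vector` mapping with the Faiss HNSW method. The function name and the exact layout are assumptions for illustration only.

```python
import json


def build_index_body(params):
    """Build a k-NN index body from benchmark-style parameters (illustrative)."""
    return {
        "settings": {
            "index": {
                "knn": True,
                "number_of_shards": params["target_index_primary_shards"],
                "number_of_replicas": params["target_index_replica_shards"],
            }
        },
        "mappings": {
            "properties": {
                params["target_field_name"]: {
                    "type": "knn_vector",
                    "dimension": params["target_index_dimension"],
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": params["target_index_space_type"],
                        # Assumed placement of the HNSW knobs.
                        "parameters": {
                            "ef_construction": params["hnsw_ef_construction"],
                            "ef_search": params["hnsw_ef_search"],
                        },
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    benchmark_params = {
        "target_field_name": "target_field",
        "target_index_primary_shards": 4,
        "target_index_replica_shards": 1,
        "target_index_dimension": 768,
        "target_index_space_type": "innerproduct",
        "hnsw_ef_search": 256,
        "hnsw_ef_construction": 256,
    }
    print(json.dumps(build_index_body(benchmark_params), indent=2))
```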

Here are the configuration details for search:

{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "query_k": 100,
  "query_body": {
    "docvalue_fields": ["_id"],
    "stored_fields": "_none_"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_path": "/mnt/nvme1/queries-1m-100k.hdf5",
  "query_count": 30000,
  "search_clients": 20
}
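To make the search parameters concrete, here is a minimal sketch of the request body that one query from this configuration would roughly correspond to, assuming the standard k-NN `knn` query clause. The helper name is hypothetical, and the exact body that opensearch-benchmark sends may differ.

```python
def build_knn_query(field_name, query_vector, k, extra_body=None):
    """Build a k-NN search body for one query vector (illustrative)."""
    body = {
        "size": k,
        "query": {
            "knn": {
                field_name: {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }
    # Merge options such as docvalue_fields / stored_fields from "query_body".
    body.update(extra_body or {})
    return body


if __name__ == "__main__":
    query = build_knn_query(
        "target_field",
        [0.0] * 768,  # placeholder vector; real queries come from the HDF5 file
        100,
        {"docvalue_fields": ["_id"], "stored_fields": "_none_"},
    )
    print(query["size"], len(query["query"]["knn"]["target_field"]["vector"]))
```

Returning only `_id` through doc values and skipping stored fields keeps the fetch phase small, so the measured latency is dominated by the vector search itself.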

Before the search, a force merge down to a single segment (max_num_segments=1) is executed via the API.

The OpenSearch cluster was deployed with 2 data nodes (r7i.2xlarge), 1 replica, and 4 shards. With this setup, AVX512 shows a 15% improvement over AVX2 on indexing and 7% on search, as shown below:

image

naveentatikonda commented 1 week ago

@assanedi Can you also please add the other configuration details, such as the indexing clients, query clients, ef_construction, ef_search, etc.?

naveentatikonda commented 1 week ago

"target_index_bulk_indexing_clients": 20, "target_index_max_num_segments": 10, "hnsw_ef_search": 256, "hnsw_ef_construction": 256

@assanedi Wasn't max_num_segments set to 1 during the force merge?

assanedi commented 1 week ago

Yes, I ran the forcemerge API. Here is the result:

curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1
{"_shards":{"total":8,"successful":8,"failed":0}}
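The shard counts in that response can be cross-checked against the cluster layout: 4 primary shards with 1 replica each gives 8 shard copies. The sketch below performs that check on the response text. The helper names are hypothetical, and the check assumes the target index is the only index touched by the request.

```python
import json


def expected_shard_copies(primary_shards, replicas):
    """Each primary shard has itself plus `replicas` copies."""
    return primary_shards * (1 + replicas)


def forcemerge_covered_all(response_text, primary_shards, replicas):
    """True if the force merge succeeded on every expected shard copy."""
    shards = json.loads(response_text)["_shards"]
    expected = expected_shard_copies(primary_shards, replicas)
    return (
        shards["failed"] == 0
        and shards["successful"] == shards["total"] == expected
    )


if __name__ == "__main__":
    response = '{"_shards":{"total":8,"successful":8,"failed":0}}'
    print(forcemerge_covered_all(response, primary_shards=4, replicas=1))
```

A passing check only says the request reached every shard copy; it does not by itself confirm the resulting segment count per shard.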

naveentatikonda commented 1 week ago

Yes, but the configuration you posted lists target_index_max_num_segments as 10 instead of 1.

naveentatikonda commented 1 week ago

For FP32, we don't need to make any changes in Faiss, since it relies on auto-vectorization to achieve the optimization with AVX512. For scalar quantization, however, Intel has raised a PR to Faiss that is still under review: https://github.com/facebookresearch/faiss/pull/3853
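One way to see why the two cases differ: with FP32 the distance loop operates directly on floats, while a scalar-quantized vector has to be decoded component by component before it can enter the same arithmetic. The sketch below illustrates that extra decode step in plain Python, assuming a 16-bit float (fp16) encoding purely for illustration. It is not Faiss code, and the function names are hypothetical.

```python
import struct


def encode_fp16(vector):
    """Pack floats as little-endian IEEE 754 half-precision values."""
    return struct.pack("<%de" % len(vector), *vector)


def decode_fp16(blob):
    """Unpack half-precision values back to Python floats."""
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))


def inner_product_fp16(query, blob):
    """Inner product between an FP32-style query and an fp16-encoded vector.

    The decode happens before every multiply-add, which is the step that
    needs dedicated SIMD code rather than plain auto-vectorization.
    """
    return sum(q * x for q, x in zip(query, decode_fp16(blob)))


if __name__ == "__main__":
    stored = encode_fp16([0.5, -2.0, 1.0])
    print(len(stored), inner_product_fp16([2.0, 1.0, 4.0], stored))
```

The encoded form uses 2 bytes per component instead of 4, at the cost of precision for values that fp16 cannot represent exactly.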

assanedi commented 1 week ago

I have updated the configuration details.