unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Bug: On the SIFT1M test set, the speed of Usearch is much slower than HNSWlib #439

Open rjzhb opened 4 weeks ago

rjzhb commented 4 weeks ago

Describe the bug

On the SIFT1M test set, USearch is much slower than hnswlib, while hnswlib and Faiss perform about the same.

Steps to reproduce

benchmark dataset: SIFT1M; language: C++; comparison: USearch vs hnswlib; ef_construction: 200; M: 16; ef_search: 200 to 800 (step: 100)

result: https://github-production-user-asset-6210df.s3.amazonaws.com/105226542/335069356-ca67b231-ae7c-41ba-919f-5ddea85cde38.png (signed link, has since expired)

Expected behavior

USearch should be nearly 10 times faster than hnswlib at the same recall.

I don't know if any special configuration is needed, but this is how I mainly set up usearch:

unum::usearch::index_dense_config_t config;
config.expansion_add = ef_construction;
config.connectivity_base = M;
unum::usearch::metric_punned_t
    metric(dimension, unum::usearch::metric_kind_t::l2sq_k, unum::usearch::scalar_kind_t::f32_k);
unum::usearch::index_dense_t usearch_index = unum::usearch::index_dense_t::make(metric, config);

I mainly use the following two interfaces to build and persist the index:

usearch_index.add(......);

usearch_index.save(......);

Then these two interfaces for testing:

usearch_index.change_expansion_search(......);
auto result = usearch_index.search(......);

USearch version

v2.12.0

Operating System

Ubuntu 22.04

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

rj986215159@outlook.com


ashvardanian commented 4 weeks ago

Hi! I don't expect the performance to be great for 1M points, at that scale you may not want to use indexing at all. But for clarity, are you running on 1 thread? What kind of machine are you using?
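For a sense of scale, a brute-force exact scan is often competitive at around 1M points. A minimal sketch in plain C++ (no usearch involved; names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force exact top-k under squared-L2 distance.
// O(n * dim) per query, no index structure needed.
std::vector<std::size_t> exact_top_k(const float *base, std::size_t n, std::size_t dim,
                                     const float *query, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> scored(n);
    for (std::size_t i = 0; i < n; ++i) {
        float d = 0;
        for (std::size_t j = 0; j < dim; ++j) {
            float diff = base[i * dim + j] - query[j];
            d += diff * diff;
        }
        scored[i] = {d, i};
    }
    // Partial sort: only the k best entries need to be ordered.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
    std::vector<std::size_t> ids(k);
    for (std::size_t i = 0; i < k; ++i)
        ids[i] = scored[i].second;
    return ids;
}
```

On SIFT1M that is 128 multiply-adds per point times 1M points per query, which modern SIMD hardware handles quickly, hence the suggestion that indexing may not pay off at this scale.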

rjzhb commented 4 weeks ago

Hi! I don't expect the performance to be great for 1M points, at that scale you may not want to use indexing at all. But for clarity, are you running on 1 thread? What kind of machine are you using?

CPU: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz 2.59 GHz Memory: 16.0 GiB OS: Ubuntu 22.04 LTS

I used a single thread. Compared to single-threading, I think multi-threading would have lower latency, but QPS would decrease.

And I don't think 1 million points is too few: for HNSW, even at 1 million, a single search already involves several thousand distance computations.

ashvardanian commented 4 weeks ago

Did you use 1 thread in both cases, for hnswlib and usearch? Did you use identical expansion factors for both engines? Sadly, the link seems broken and I can’t see what’s there.

rjzhb commented 3 weeks ago

Did you use 1 thread in both cases, for hnswlib and usearch? Did you use identical expansion factors for both engines? Sadly, the link seems broken and I can’t see what’s there.

Of course, you can see my code:

#include "base_profiler.h"
#include "helper.h"
#include "index_dense.hpp"
#include "index_plugins.hpp"
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

static const char *sift1M_base = "/benchmark_dataset/sift1M/sift_base.fvecs";
static const char *sift1M_query = "/benchmark_dataset/sift1M/sift_query.fvecs";
static const char *sift1M_groundtruth = "/benchmark_dataset/sift1M/sift_groundtruth.ivecs";

static const char *deep10M_base = "/benchmark_dataset/deep10M/deep10M_base.fvecs";
static const char *deep10M_query = "/benchmark_dataset/deep10M/deep10M_query.fvecs";
static const char *deep10M_groundtruth = "/benchmark_dataset/deep10M/deep10M_groundtruth.ivecs";

static const char *hnsw_index_l2_name_prefix = "/benchmark_save_index/usearch_index/";
static const char *hnsw_index_l2_name_suffix = "/usearch_index.save";

template<typename T>
void output_recall_top_k(const unsigned top_k, std::vector<std::vector<unum::usearch::index_dense_t::vector_key_t>> &results_pq,
                         const std::vector<std::unordered_set<T>> &groundtruth_1,
                         const std::vector<std::unordered_set<T>> &groundtruth_10,
                         const std::vector<std::unordered_set<T>> &groundtruth_100) {
  if (top_k != 100) {
    std::cerr << "top_k can only be 100" << std::endl;
    exit(-1);
  }
  size_t gt_cnt_1 = 0, gt_cnt_10 = 0, gt_cnt_100 = 0;
  std::vector<std::vector<T>> results(results_pq.size());
  for (unsigned i = 0; i < results_pq.size(); i++) {
    for (unsigned j = 0; j < results_pq[i].size(); j++) {
      results[i].push_back(results_pq[i][j]);
    }
  }
  for (unsigned i = 0; i < results.size(); i++) {
    for (size_t cnt = 0; cnt < 100; ++cnt) {
      auto id = results[i][cnt];
      if (cnt < 1 && groundtruth_1[i].contains(id)) {
        ++gt_cnt_1;
      }
      if (cnt < 10 && groundtruth_10[i].contains(id)) {
        ++gt_cnt_10;
      }
      if (cnt < 100 && groundtruth_100[i].contains(id)) {
        ++gt_cnt_100;
      }
    }
  }
  std::cout << "R@1:   " << std::fixed << std::setprecision(3) << float(gt_cnt_1) / float(results.size() * 1)
            << std::endl;
  std::cout << "R@10:  " << std::fixed << std::setprecision(3) << float(gt_cnt_10) / float(results.size() * 10)
            << std::endl;
  std::cout << "R@100: " << std::fixed << std::setprecision(3) << float(gt_cnt_100) / float(results.size() * 100)
            << std::endl;
}

template<typename T>
std::unique_ptr<T[]> load_data(const std::string &filename, size_t &num,
                               int &dim) {  // load .fvecs/.ivecs data: each record is a 4-byte dim header + dim 4-byte values
  std::ifstream in(filename, std::ios::binary);
  if (!in.is_open()) {
    std::cout << "open file error" << std::endl;
    exit(-1);
  }
  in.read((char *)&dim, 4);
  in.seekg(0, std::ios::end);
  auto ss = in.tellg();
  num = ((size_t)ss) / (dim + 1) / 4;
  auto data = std::make_unique<T[]>(num * dim);

  in.seekg(0, std::ios::beg);
  for (size_t i = 0; i < num; i++) {
    in.seekg(4, std::ios::cur);
    in.read((char *)(data.get() + i * dim), dim * 4);
  }
  in.close();
  return data;
}

int main() {
  const char *homeDir = getenv("HOME");
  std::string base_file = homeDir;
  std::string query_file = homeDir;
  std::string groundtruth_file = homeDir;
  std::string data_type;
  size_t dimension = {};
  size_t base_counts = {};
  size_t M = 16;
  size_t ef_construction = 200;
  int ef_search_begin = 100;
  int ef_search_end = 800;
  int ef_search_step = 100;
  // let user choose to test sift1M or deep10M
  enum test_type {
    invalid = 0, sift1M = 1, deep10M = 2
  };
  std::cout << "please choose to test sift1M or deep10M, input 1 or 2" << std::endl;
  std::cout << "1. sift1M" << std::endl;
  std::cout << "2. deep10M" << std::endl;
  int choose = 0;
  std::cin >> choose;
  switch (choose) {
    case test_type::sift1M: {
      data_type = "sift1M";
      dimension = 128;
      base_counts = 1'000'000;
      base_file += sift1M_base;
      query_file += sift1M_query;
      groundtruth_file += sift1M_groundtruth;
      break;
    }
    case test_type::deep10M: {
      data_type = "deep10M";
      dimension = 96;
      base_counts = 10'000'000;
      base_file += deep10M_base;
      query_file += deep10M_query;
      groundtruth_file += deep10M_groundtruth;
      break;
    }
    default:std::cout << "invalid input, exit." << std::endl;
      return -1;
  }
  // let user choose M, ef_construction, ef_search
  std::cout
      << "please input M, ef_construction, ef_search_begin, ef_search_end, ef_search_step. input 0 to use default values"
      << std::endl;
  int M_input = 0, ef_construction_input = 0, ef_search_begin_input = 0, ef_search_end_input = 0,
      ef_search_step_input = 0;
  std::cin >> M_input >> ef_construction_input >> ef_search_begin_input >> ef_search_end_input
           >> ef_search_step_input;
  if (M_input > 0) {
    M = M_input;
  }
  if (ef_construction_input > 0) {
    ef_construction = ef_construction_input;
  }
  if (ef_search_begin_input > 0) {
    ef_search_begin = ef_search_begin_input;
  }
  if (ef_search_end_input > 0) {
    ef_search_end = ef_search_end_input;
  }
  if (ef_search_step_input > 0) {
    ef_search_step = ef_search_step_input;
  }
  std::string build_parameter = ".M." + std::to_string(M) + ".ef_construction." + std::to_string(ef_construction);
  std::string hnsw_index_l2_name = std::string(homeDir) + std::string(hnsw_index_l2_name_prefix) + data_type +
      std::string(hnsw_index_l2_name_suffix) + build_parameter;

  unum::usearch::index_dense_config_t config;
  config.expansion_add = ef_construction;
  config.connectivity_base = M;
  unum::usearch::metric_punned_t
      metric(dimension, unum::usearch::metric_kind_t::l2sq_k, unum::usearch::scalar_kind_t::f32_k);
  unum::usearch::index_dense_t usearch_index = unum::usearch::index_dense_t::make(metric, config);

  std::ifstream f(hnsw_index_l2_name);
  if (f.good()) {
    // Found index file
    std::cout << "Found index file ... " << std::endl;
    auto result = usearch_index.load(hnsw_index_l2_name.c_str());
    if (!result) {
      std::cout << "Failed to load index. Error." << "\n";
      exit(1);
    }
  } else {
    BaseProfiler profiler;
    {
      size_t embedding_count;
      std::unique_ptr<float[]> input_embeddings;
      {
        int dim;
        BaseProfiler profiler_in;
        profiler_in.Begin();
        input_embeddings = load_data<float>(base_file, embedding_count, dim);
        profiler_in.End();
        assert(dimension == dim || !"embedding dimension wrong");
        assert(base_counts == embedding_count || !"embedding count wrong");
        std::cout << "Load base data: " << profiler_in.ElapsedToString() << std::endl;
        //output embedding_count, dimension
        std::cout << "embedding_count: " << embedding_count << std::endl;
        std::cout << "dimension: " << dim << std::endl;
      }
      usearch_index.reserve(embedding_count);

      profiler.Begin();
      // insert data into index
      for (size_t idx = 0; idx < embedding_count; ++idx) {
        usearch_index.add(idx, input_embeddings.get() + idx * dimension);
        if (idx % 10000 == 0) {
          std::cout << idx << ", " << get_current_rss() / (1024 * 1024) << " MB, "
                    << profiler.ElapsedToString()
                    << std::endl;
        }
      }
      profiler.End();
      std::cout << "before clearing input vector, memory cost: "
                << get_current_rss() / (1024 * 1024) << " MiB" << std::endl;
    }
    std::cout << "Insert data cost: " << profiler.ElapsedToString() << " memory cost: "
              << get_current_rss() / (1024 * 1024) << " MiB" << std::endl;
    usearch_index.save(hnsw_index_l2_name.c_str());
  }

  size_t number_of_queries;
  std::unique_ptr<float[]> queries;
  {
    int dim;
    BaseProfiler profiler;
    profiler.Begin();
    queries = load_data<float>(query_file, number_of_queries, dim);
    profiler.End();
    assert(dimension == dim || !"query does not have same dimension as train set");
    std::cout << "Load query data: " << profiler.ElapsedToString() << std::endl;
  }

  int top_k;                                           // nb of results per query in the GT
  std::vector<std::unordered_set<int>> ground_truth_sets_1, ground_truth_sets_10,
      ground_truth_sets_100; // number_of_queries * top_k matrix of ground-truth nearest-neighbors
  {
    // load ground-truth and convert int to long
    size_t nq2;
    BaseProfiler profiler;
    profiler.Begin();

    auto gt_int = load_data<int>(groundtruth_file, nq2, top_k);
    assert(nq2 == number_of_queries || !"incorrect nb of ground truth entries");

    ground_truth_sets_1.resize(number_of_queries);
    ground_truth_sets_10.resize(number_of_queries);
    ground_truth_sets_100.resize(number_of_queries);
    for (size_t i = 0; i < number_of_queries; ++i) {
      for (int j = 0; j < top_k; ++j) {
        auto gt = gt_int[i * top_k + j];
        if (j < 1) {
          ground_truth_sets_1[i].insert(gt);
        }
        if (j < 10) {
          ground_truth_sets_10[i].insert(gt);
        }
        if (j < 100) {
          ground_truth_sets_100[i].insert(gt);
        }
      }
    }
    profiler.End();
    std::cout << "Load ground truth: " << profiler.ElapsedToString() << std::endl;
  }
  for (int ef = ef_search_begin; ef <= ef_search_end; ef += ef_search_step) {
    usearch_index.change_expansion_search(ef);
    BaseProfiler profiler;
    profiler.Begin();
    auto t0 = std::chrono::high_resolution_clock::now();
    std::vector<std::vector<unum::usearch::index_dense_t::vector_key_t>> results;
    results.resize(number_of_queries);
    for (size_t id = 0; id < number_of_queries; ++id) {
      auto result = usearch_index.search(queries.get() + id * dimension, top_k);
      results[id].resize(top_k);
      result.dump_to(results[id].data());
    }
    auto t_val = std::chrono::duration_cast<std::chrono::duration<double>>(
        std::chrono::high_resolution_clock::now() - t0).count();
    profiler.End();
    std::cout << "\nef_search = " << ef << ", Spend: " << profiler.ElapsedToString() << std::endl;
    std::cout << "time = " << std::fixed << std::setprecision(3) << t_val << " s" << std::endl;
    std::cout << "QPS = " << std::fixed << std::setprecision(3) << float(number_of_queries) / t_val
              << std::endl;
    output_recall_top_k(top_k, results, ground_truth_sets_1, ground_truth_sets_10, ground_truth_sets_100);
  }

  return 0;
}
rjzhb commented 3 weeks ago

And this is hnswlib code, you will find that it is almost same as usearch code:


#include "base_profiler.h"
#include "helper.h"
#include "hnswlib/hnswlib.h"
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <memory>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>

static const char *sift1M_base = "/benchmark_dataset/sift1M/sift_base.fvecs";
static const char *sift1M_query = "/benchmark_dataset/sift1M/sift_query.fvecs";
static const char *sift1M_groundtruth = "/benchmark_dataset/sift1M/sift_groundtruth.ivecs";

static const char *deep10M_base = "/benchmark_dataset/deep10M/deep10M_base.fvecs";
static const char *deep10M_query = "/benchmark_dataset/deep10M/deep10M_query.fvecs";
static const char *deep10M_groundtruth = "/benchmark_dataset/deep10M/deep10M_groundtruth.ivecs";

static const char *hnsw_index_l2_name_prefix = "/benchmark_save_index/hnswlib_index/";
static const char *hnsw_index_l2_name_suffix = "/hnswlib_index.save";

template<typename T1, typename T2, typename T3>
void output_recall_top_k(const unsigned top_k, std::vector<std::priority_queue<std::pair<T1, T2>>> &results_pq,
                         const std::vector<std::unordered_set<T3>> &groundtruth_1,
                         const std::vector<std::unordered_set<T3>> &groundtruth_10,
                         const std::vector<std::unordered_set<T3>> &groundtruth_100) {
    if (top_k != 100) {
        std::cerr << "top_k can only be 100" << std::endl;
        exit(-1);
    }
    size_t gt_cnt_1 = 0, gt_cnt_10 = 0, gt_cnt_100 = 0;
    std::vector<std::vector<T3>> results(results_pq.size());
    for (unsigned i = 0; i < results_pq.size(); i++) {
        while (!results_pq[i].empty()) {
            results[i].push_back(results_pq[i].top().second);
            results_pq[i].pop();
        }
        std::reverse(results[i].begin(), results[i].end());
    }
    for (unsigned i = 0; i < results.size(); i++) {
        for (size_t cnt = 0; cnt < 100; ++cnt) {
            auto id = results[i][cnt];
            if (cnt < 1 && groundtruth_1[i].contains(id)) {
                ++gt_cnt_1;
            }
            if (cnt < 10 && groundtruth_10[i].contains(id)) {
                ++gt_cnt_10;
            }
            if (cnt < 100 && groundtruth_100[i].contains(id)) {
                ++gt_cnt_100;
            }
        }
    }
    std::cout << "R@1:   " << std::fixed << std::setprecision(3) << float(gt_cnt_1) / float(results.size() * 1)
              << std::endl;
    std::cout << "R@10:  " << std::fixed << std::setprecision(3) << float(gt_cnt_10) / float(results.size() * 10)
              << std::endl;
    std::cout << "R@100: " << std::fixed << std::setprecision(3) << float(gt_cnt_100) / float(results.size() * 100)
              << std::endl;
}

template<typename T>
std::unique_ptr<T[]> load_data(const std::string &filename, size_t &num,
                               int &dim) {  // load .fvecs/.ivecs data: each record is a 4-byte dim header + dim 4-byte values
    std::ifstream in(filename, std::ios::binary);
    if (!in.is_open()) {
        std::cout << "open file error" << std::endl;
        exit(-1);
    }
    in.read((char *) &dim, 4);
    in.seekg(0, std::ios::end);
    auto ss = in.tellg();
    num = ((size_t) ss) / (dim + 1) / 4;
    auto data = std::make_unique<T[]>(num * dim);

    in.seekg(0, std::ios::beg);
    for (size_t i = 0; i < num; i++) {
        in.seekg(4, std::ios::cur);
        in.read((char *) (data.get() + i * dim), dim * 4);
    }
    in.close();
    return data;
}

int main() {
    const char *homeDir = getenv("HOME");
    std::string base_file = homeDir;
    std::string query_file = homeDir;
    std::string groundtruth_file = homeDir;
    std::string data_type;
    size_t dimension = {};
    size_t base_counts = {};
    size_t M = 16;
    size_t ef_construction = 200;
    int ef_search_begin = 100;
    int ef_search_end = 800;
    int ef_search_step = 100;
    // let user choose to test sift1M or deep10M
    enum test_type {
        invalid = 0, sift1M = 1, deep10M = 2
    };
    std::cout << "please choose to test sift1M or deep10M, input 1 or 2" << std::endl;
    std::cout << "1. sift1M" << std::endl;
    std::cout << "2. deep10M" << std::endl;
    int choose = 0;
    std::cin >> choose;
    switch (choose) {
        case test_type::sift1M: {
            data_type = "sift1M";
            dimension = 128;
            base_counts = 1'000'000;
            base_file += sift1M_base;
            query_file += sift1M_query;
            groundtruth_file += sift1M_groundtruth;
            break;
        }
        case test_type::deep10M: {
            data_type = "deep10M";
            dimension = 96;
            base_counts = 10'000'000;
            base_file += deep10M_base;
            query_file += deep10M_query;
            groundtruth_file += deep10M_groundtruth;
            break;
        }
        default:
            std::cout << "invalid input, exit." << std::endl;
            return -1;
    }
    // let user choose M, ef_construction, ef_search
    std::cout
            << "please input M, ef_construction, ef_search_begin, ef_search_end, ef_search_step. input 0 to use default values"
            << std::endl;
    int M_input = 0, ef_construction_input = 0, ef_search_begin_input = 0, ef_search_end_input = 0, ef_search_step_input = 0;
    std::cin >> M_input >> ef_construction_input >> ef_search_begin_input >> ef_search_end_input
             >> ef_search_step_input;
    if (M_input > 0) {
        M = M_input;
    }
    if (ef_construction_input > 0) {
        ef_construction = ef_construction_input;
    }
    if (ef_search_begin_input > 0) {
        ef_search_begin = ef_search_begin_input;
    }
    if (ef_search_end_input > 0) {
        ef_search_end = ef_search_end_input;
    }
    if (ef_search_step_input > 0) {
        ef_search_step = ef_search_step_input;
    }
    std::string build_parameter = ".M." + std::to_string(M) + ".ef_construction." + std::to_string(ef_construction);
    std::string hnsw_index_l2_name = std::string(homeDir) + std::string(hnsw_index_l2_name_prefix) + data_type +
                                     std::string(hnsw_index_l2_name_suffix) + build_parameter;

    hnswlib::L2Space l2space(dimension);
    std::unique_ptr<hnswlib::HierarchicalNSW<float>> hnsw_index;

    std::ifstream f(hnsw_index_l2_name);
    if (f.good()) {
        // Found index file
        std::cout << "Found index file ... " << std::endl;
        hnsw_index = std::make_unique<hnswlib::HierarchicalNSW<float>>(&l2space, hnsw_index_l2_name);
    } else {
        BaseProfiler profiler;
        {
            size_t embedding_count;
            std::unique_ptr<float[]> input_embeddings;
            {
                int dim;
                BaseProfiler profiler_in;
                profiler_in.Begin();
                input_embeddings = load_data<float>(base_file, embedding_count, dim);
                profiler_in.End();
                assert(dimension == dim || !"embedding dimension wrong");
                assert(base_counts == embedding_count || !"embedding count wrong");
                std::cout << "Load base data: " << profiler_in.ElapsedToString() << std::endl;
                //output embedding_count, dimension
                std::cout << "embedding_count: " << embedding_count << std::endl;
                std::cout << "dimension: " << dim << std::endl;
            }
            hnsw_index = std::make_unique<hnswlib::HierarchicalNSW<float>>(&l2space, embedding_count, M,
                                                                           ef_construction);

            profiler.Begin();
            // insert data into index
            for (size_t idx = 0; idx < embedding_count; ++idx) {
                hnsw_index->addPoint(input_embeddings.get() + idx * dimension, idx);
                if (idx % 10000 == 0) {
                    std::cout << idx << ", " << get_current_rss() / (1024 * 1024) << " MB, "
                              << profiler.ElapsedToString()
                              << std::endl;
                }
            }
            profiler.End();
            std::cout << "before clearing input vector, memory cost: "
                      << get_current_rss() / (1024 * 1024) << " MiB" << std::endl;
        }
        std::cout << "Insert data cost: " << profiler.ElapsedToString() << " memory cost: "
                  << get_current_rss() / (1024 * 1024) << " MiB" << std::endl;
        hnsw_index->saveIndex(hnsw_index_l2_name);
    }

    size_t number_of_queries;
    std::unique_ptr<float[]> queries;
    {
        int dim;
        BaseProfiler profiler;
        profiler.Begin();
        queries = load_data<float>(query_file, number_of_queries, dim);
        profiler.End();
        assert(dimension == dim || !"query does not have same dimension as train set");
        std::cout << "Load query data: " << profiler.ElapsedToString() << std::endl;
    }

    int top_k;                                           // nb of results per query in the GT
    std::vector<std::unordered_set<int>> ground_truth_sets_1, ground_truth_sets_10, ground_truth_sets_100; // number_of_queries * top_k matrix of ground-truth nearest-neighbors
    {
        // load ground-truth and convert int to long
        size_t nq2;
        BaseProfiler profiler;
        profiler.Begin();

        auto gt_int = load_data<int>(groundtruth_file, nq2, top_k);
        assert(nq2 == number_of_queries || !"incorrect nb of ground truth entries");

        ground_truth_sets_1.resize(number_of_queries);
        ground_truth_sets_10.resize(number_of_queries);
        ground_truth_sets_100.resize(number_of_queries);
        for (size_t i = 0; i < number_of_queries; ++i) {
            for (int j = 0; j < top_k; ++j) {
                auto gt = gt_int[i * top_k + j];
                if (j < 1) {
                    ground_truth_sets_1[i].insert(gt);
                }
                if (j < 10) {
                    ground_truth_sets_10[i].insert(gt);
                }
                if (j < 100) {
                    ground_truth_sets_100[i].insert(gt);
                }
            }
        }
        profiler.End();
        std::cout << "Load ground truth: " << profiler.ElapsedToString() << std::endl;
    }
    for (int ef = ef_search_begin; ef <= ef_search_end; ef += ef_search_step) {
        hnsw_index->setEf(ef);
        BaseProfiler profiler;
        profiler.Begin();
        auto t0 = std::chrono::high_resolution_clock::now();
        std::vector<std::priority_queue<std::pair<float, size_t>>> results;
        results.reserve(number_of_queries);
        for (size_t id = 0; id < number_of_queries; ++id) {
            auto result = hnsw_index->searchKnn(queries.get() + id * dimension, top_k);
            results.push_back(std::move(result));
        }
        auto t_val = std::chrono::duration_cast<std::chrono::duration<double>>(
                std::chrono::high_resolution_clock::now() - t0).count();
        profiler.End();
        std::cout << "\nef_search = " << ef << ", Spend: " << profiler.ElapsedToString() << std::endl;
        std::cout << "time = " << std::fixed << std::setprecision(3) << t_val << " s" << std::endl;
        std::cout << "QPS = " << std::fixed << std::setprecision(3) << float(number_of_queries) / t_val
                  << std::endl;
        output_recall_top_k(top_k, results, ground_truth_sets_1, ground_truth_sets_10, ground_truth_sets_100);
    }

    return 0;
}
rjzhb commented 3 weeks ago

So is there any suggestion?

ashvardanian commented 3 weeks ago

Thanks for sharing the code @rjzhb, it's very helpful!

Most of our users are at billion scale and beyond. Most of them don't use f32 and generally run 100+ threads, where concurrency and the cost of synchronization are higher priorities. Would be great if you have feedback or can contribute there!

So we don't actively optimize for the million-scale use-case and have a big ongoing release in prep - #402. We can come back to this later, once we're done with the more urgent cases. Let's keep this open for now 🤗

rjzhb commented 3 weeks ago

Thanks for sharing the code @rjzhb, it's very helpful!

Most of our users are at billion scale and beyond. Most of them don't use f32 and generally run 100+ threads, where concurrency and the cost of synchronization are higher priorities. Would be great if you have feedback or can contribute there!

So we don't actively optimize for the million-scale use-case and have a big ongoing release in prep - #402. We can come back to this later, once we're done with the more urgent cases. Let's keep this open for now 🤗

Thanks. It is helpful!

rjzhb commented 3 weeks ago

Thanks for sharing the code @rjzhb, it's very helpful!

Most of our users are at billion scale and beyond. Most of them don't use f32 and generally run 100+ threads, where concurrency and the cost of synchronization are higher priorities. Would be great if you have feedback or can contribute there!

So we don't actively optimize for the million-scale use-case and have a big ongoing release in prep - #402. We can come back to this later, once we're done with the more urgent cases. Let's keep this open for now 🤗

I am planning to test this soon. Do you have any recommended parameter configurations? For example, how many threads should I use, and should I use f32 or something else? Thank you so much.

ashvardanian commented 3 weeks ago

Generally, during indexing, people use all available threads. For large-scale indexing, machines with 100+ cores are recommended. If that's a new CPU, f16 should be a good option. If your embeddings are trained in a quantization-aware scheme - i8 and b1 are even better 🤗
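Concretely, switching the benchmark above to half precision is mostly a one-line change. A sketch, assuming the same `config` and `dimension` as in the code earlier in the thread (inputs can still be passed as f32):

```cpp
// Same setup as the benchmark above, but storing vectors as f16.
// Vectors can still be passed in as f32; usearch quantizes on insertion.
// i8_k (and b1x8_k for binary metrics) follow the same pattern, assuming
// the embeddings tolerate that quantization.
unum::usearch::metric_punned_t metric(
    dimension,
    unum::usearch::metric_kind_t::l2sq_k,
    unum::usearch::scalar_kind_t::f16_k); // was f32_k
unum::usearch::index_dense_t usearch_index =
    unum::usearch::index_dense_t::make(metric, config);
```

Halving the bytes per scalar also roughly halves the memory traffic per distance computation, which is usually where the search spends its time.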

rjzhb commented 2 weeks ago

Generally, during indexing, people use all available threads. For large-scale indexing, machines with 100+ cores are recommended. If that's a new CPU, f16 should be a good option. If your embeddings are trained in a quantization-aware scheme - i8 and b1 are even better 🤗

Thanks! But how could I use multiple threads? I mean, is there a particular way the index must be opened or configured in usearch?

ashvardanian commented 2 weeks ago

Here are the docs: https://github.com/unum-cloud/usearch/blob/main/cpp/README.md

rjzhb commented 2 weeks ago

Here are the docs: https://github.com/unum-cloud/usearch/blob/main/cpp/README.md

I know, but it seems the docs only show examples of multi-threaded indexing, not multi-threaded search.

ashvardanian commented 2 weeks ago

Multi-threaded search looks almost identical. Just try and consider contributing extended docs once you succeed 🤗 You can also check tests and benchmarks for examples.

rjzhb commented 1 week ago

Multi-threaded search looks almost identical. Just try and consider contributing extended docs once you succeed 🤗 You can also check tests and benchmarks for examples.

Do you think this is OK?

#pragma omp parallel for
for (size_t id = 0; id < number_of_queries; ++id) {
  auto result = usearch_index.search(queries.get() + id * dimension, top_k);
}
ashvardanian commented 1 week ago

Yes, that should work. Feel free to add to docs.
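For completeness, the same fan-out can be written with std::thread instead of OpenMP. A sketch: `run_query` is a hypothetical stand-in for the `usearch_index.search(...)` call above, and each thread should write only into its own pre-sized `results[id]` slot so no locking is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Split query ids [0, n) across hardware threads and call run_query on each.
// In the benchmark, run_query would wrap usearch_index.search(...) and dump
// the hits into a pre-allocated results[id], so threads never share a slot.
void parallel_queries(std::size_t n, const std::function<void(std::size_t)> &run_query) {
    std::size_t n_threads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < n_threads; ++t) {
        pool.emplace_back([&, t] {
            // Strided partition: thread t handles ids t, t + n_threads, ...
            for (std::size_t id = t; id < n; id += n_threads)
                run_query(id);
        });
    }
    for (auto &th : pool)
        th.join();
}
```

When measuring QPS this way, time the whole `parallel_queries` call rather than individual searches, since per-query latency and aggregate throughput diverge under concurrency.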