jean-pierreBoth / hnswlib-rs

Rust implementation of the HNSW algorithm (Malkov-Yashunin)
Other
157 stars 22 forks source link

hnsw-rs

This crate provides a Rust implementation of the paper by Yu.A. Malkov and D.A Yashunin:

"Efficient and Robust approximate nearest neighbours using Hierarchical Navigable Small World Graphs" (2016,2018) arxiv

Functionalities

The crate is built on top of the anndists and can use the following distances:

The hnsw implementation provides:

Implementation

The graph construction and searches are multithreaded with the parking_lot crate (See parallel_insert_data and parallel_search_neighbours functions and also examples files). Distances are provided by the crate anndists, see Building.

Building

Simd

Two features activate simd in the crate anndists :

Julia interface

By default the crate is a standalone project and builds a static libray and executable. To be used with the companion Julia package it is necessary to build a dynamic library. This can be done by just uncommenting (i.e get rid of the #) in file Cargo.toml the line:

#crate-type = ["cdylib"]

and rerun the command: cargo build --release.

This will generate a .so file in the target/release directory.

Algorithm and Input Parameters

The algorithm stores points in layers (at most 16), and a graph is constructed to enable a search from less densely populated levels to most densely populated levels by constructing links from less dense layers to the most dense layer (level 0).

Roughly the algorithm goes along runs as follows:

Upon insertion, the level l of a new point is sampled with an exponential law, limiting the number of levels to 16, so that level 0 is the most densely populated layer, upper layers being exponentially less populated as level increases.
The nearest neighbour of the point is searched in lookup tables from the upper level to the level just above its layer (l), so we should arrive near the new point at its level at a relatively low cost. Then the max_nb_connection nearest neighbours are searched in neighbours of neighbours table (with a reverse updating of tables) recursively from its layer l down to the most populated level 0.

The scale parameter of the exponential law depends on the maximum number of connection possible for a point (parameter max_nb_connection) to others.
Explicitly the scale parameter is chosen as : scale=1/ln(max_nb_connection).

The main parameters occuring in constructing the graph or in searching are:

Benchmarks and Examples (examples)

Some examples are taken from the ann-benchmarks site and recall rates and request/s are given in comments in the examples files for some input parameters. The annhdf5 module implements reading the standardized data files of the ann-benchmarks site, just download the necessary benchmark data files and modify path in sources accordingly. Then run: cargo build --release --features="simdeez_f" --examples .
It is possible in these examples to change from parallel searches to serial searches to check for speeds or modify parameters to see the impact on performance.

With a i9-13900HX 24 cores laptop we get the following results:

  1. fashion-mnist-784-euclidean : search requests run at 62000 req/s with a recall rate of 0.977
  2. ann-glove-25-angular : search for the first 100 neighbours run with recall 0.979 at 12000 req/s
  3. sift1m benchmark: (1 million points in 128 dimension) search requests for the 10 first neighbours runs at 15000 req/s with a recall rate of 0.9907 or at 8300 req/s with a recall rate of 0.9959, depending on the parameters.

Moreover a tiny crate bigann gives results on the first 10 Million points of the BIGANN benchmark and can used to play with parameters on this data. Results give a recall between 0.92 and 0.99 depending on number of requests and parameters.

Some lines extracted from this Mnist benchmark show how it works for f32 and L2 norm

    //  reading data
    let anndata = AnnBenchmarkData::new(fname).unwrap();
    let nb_elem = anndata.train_data.len();
    let max_nb_connection = 24;
    let nb_layer = 16.min((nb_elem as f32).ln().trunc() as usize);
    let ef_c = 400;
    // allocating network
    let mut hnsw =  Hnsw::<f32, DistL2>::new(max_nb_connection, nb_elem, nb_layer, ef_c, DistL2{});
    hnsw.set_extend_candidates(false);
    // parallel insertion of train data
    let data_for_par_insertion = anndata.train_data.iter().map( |x| (&x.0, x.1)).collect();
    hnsw.parallel_insert(&data_for_par_insertion);
    //
    hnsw.dump_layer_info();
    //  Now the bench with 10 neighbours
    let mut knn_neighbours_for_tests = Vec::<Vec<Neighbour>>::with_capacity(nb_elem);
    hnsw.set_searching_mode(true);
    let knbn = 10;
    let ef_c = max_nb_connection;
    // search 10 nearest neighbours for test data
    knn_neighbours_for_tests = hnsw.parallel_search(&anndata.test_data, knbn, ef_c);
    ....

Contributions

Sannsyn contributed to Drop implementation and FilterT trait. Petter Egesund added the DistLevenshtein distance.

Evolutions are described here

License

Licensed under either of

at your option.