jlmelville / rcpphnsw

Rcpp bindings for the approximate nearest neighbors library hnswlib
GNU General Public License v3.0

`random_seed` in the `hnsw_build` (and `hnsw_knn`) #23

Open BERENZ opened 2 months ago

BERENZ commented 2 months ago

hnswlib's `init_index` includes a `random_seed` parameter, which is missing from RcppHNSW.

init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100, allow_replace_deleted = False)

Would it be possible to add this parameter to `hnsw_build` (and `hnsw_knn`) to give full control over how the index is built?

jlmelville commented 2 months ago

@BERENZ the random seed is now exposed on the class interface. It's not going into the `hnsw_build` or `hnsw_knn` interface, because you can call `set.seed` before calling those functions (internally, they now use the R RNG to seed the hnsw build step). Two things to note:

  1. If you use more than one thread, there is no guarantee of reproducibility even if you call `set.seed`.
  2. The hnswlib random seed is 64-bit, but I only use the 32-bit integer space available in R. I doubt that matters much.
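
As a rough illustration of the seeding scheme described above (this is not RcppHNSW's actual code), here is a minimal Python sketch using only the standard library. `build_index`, `draw_build_seed`, and `seeded_build` are hypothetical names invented for this example, and the "index" is just a list of random levels standing in for a real single-threaded HNSW build. The caller seeds a session-level RNG (the role `set.seed` plays in R), the wrapper draws a seed from it that fits in a 32-bit integer, and that value seeds the build step.

```python
import random

INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed integer, as in R


def draw_build_seed(session_rng):
    """Draw a non-negative seed that fits in a 32-bit signed integer."""
    return session_rng.randint(0, INT32_MAX)


def build_index(n_items, seed):
    """Hypothetical stand-in for a single-threaded build step.

    Assigns each item a random level from an RNG seeded with `seed`,
    so the result depends only on `n_items` and `seed`.
    """
    build_rng = random.Random(seed)
    return [int(build_rng.expovariate(1.0)) for _ in range(n_items)]


def seeded_build(session_seed, n_items=8):
    """Seed the session RNG, derive a build seed from it, then build."""
    session_rng = random.Random(session_seed)  # plays the role of set.seed()
    return build_index(n_items, draw_build_seed(session_rng))


if __name__ == "__main__":
    # Same session seed, same derived build seed, same result.
    print(seeded_build(42) == seeded_build(42))  # True
```

The sketch only covers the single-threaded case. With several threads the order in which items are inserted can vary between runs, which is why a fixed seed alone does not guarantee an identical index.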

I am not planning a new CRAN submission soon, so if this is still of relevance to you, you will have to install from GitHub.