nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0
4.11k stars 607 forks source link

[ENH] Dynamic ef_search #572

Closed atroyn closed 3 days ago

atroyn commented 3 days ago

ef_search is a parameter consumed at query time which determines how many edges of the HNSW graph are traversed to find approximate nearest neighbors. Setting this hyperparameter allows HNSW to trade recall for speed at query time.

The way this was implemented by HNSWlib had some bad design decisions. It was a property of each index, changed by calling set_ef on the index; besides being cumbersome, it also creates a data race in concurrent execution where multiple threads want to execute queries with different ef_search.

Additionally, it was not written out with the index, and when the index was loaded again, it was set to 10, creating an unncessary footgun.

This PR fixes all of the above. It:

Making this work required us to go up to C++ 17 and change the mac OS target to be 10.12 or later. These are both ancient, and we compile this for our users anyway.

Unfortunately, the formatter was really aggressive so there's a lot of changes which are not code changes. I will annotate the actual changes with comments.

atroyn commented 3 days ago

Ignore this, we'll contribute it upstream later.