ef_search is a query-time parameter that controls how much of the HNSW graph is explored when searching for approximate nearest neighbors (it bounds the candidate list kept during the search). Tuning it lets HNSW trade recall for speed at query time.
HNSWlib's implementation of this had several design problems. ef_search was a property of each index, changed by calling set_ef on the index; besides being cumbersome, this created a data race under concurrent execution when multiple threads wanted to run queries with different ef_search values.
Additionally, it was not written out with the index, so when the index was loaded again it was silently reset to 10, creating an unnecessary footgun.
This PR fixes all of the above. It:
changes the index parameter name to ef_search_default_ to make its purpose clear, and renames the functions that set it accordingly.
adds a shared mutex guarding the default: writes take it exclusively, while reads can proceed in parallel.
adds an argument to the query path so each query can set ef_search independently. It is passed by value, so it lives on the stack of the function call rather than on the heap. If it is not passed, we read the index default.
updates all tests and examples
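For reviewers, the locking and per-query override described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: IndexSketch, effective_ef, usage_example, and the std::optional parameter are hypothetical names chosen for the example; only ef_search_default_ comes from the change itself. It assumes C++17 for std::shared_mutex and std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Hypothetical stand-in for an HNSW index; only the ef_search plumbing is shown.
class IndexSketch {
public:
    explicit IndexSketch(std::size_t ef_search_default)
        : ef_search_default_(ef_search_default) {}

    // Writer path: takes the mutex exclusively, so it waits for active readers.
    void set_ef_search_default(std::size_t ef) {
        std::unique_lock<std::shared_mutex> lock(ef_mutex_);
        ef_search_default_ = ef;
    }

    // Reader path: a shared lock, so many queries can read the default at once.
    std::size_t ef_search_default() const {
        std::shared_lock<std::shared_mutex> lock(ef_mutex_);
        return ef_search_default_;
    }

    // Returns the ef a query would use. The override is passed by value, so each
    // call has its own copy and never touches shared index state.
    std::size_t effective_ef(
        std::optional<std::size_t> ef_search = std::nullopt) const {
        return ef_search ? *ef_search : ef_search_default();
    }

private:
    mutable std::shared_mutex ef_mutex_;  // guards ef_search_default_
    std::size_t ef_search_default_;
};

// Usage: a per-query value wins; otherwise the index default applies.
inline void usage_example() {
    IndexSketch index(10);
    assert(index.effective_ef() == 10);      // falls back to the default
    assert(index.effective_ef(200) == 200);  // per-query override
    index.set_ef_search_default(50);
    assert(index.effective_ef() == 50);      // new default is visible
}
```

Because the per-query value never writes to the index, two threads can search with different ef_search values without synchronizing with each other; only changes to the default take the exclusive lock.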
Making this work required moving to C++17 and raising the macOS deployment target to 10.12 or later. Both are old baselines, and we compile this for our users anyway.
Unfortunately, the formatter was very aggressive, so much of the diff is formatting-only. I will annotate the actual changes with comments.