nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Questions about the code in detail #176

Open chasingegg opened 4 years ago

chasingegg commented 4 years ago

I have read the code in detail, and I have some questions about a few corner cases. I hope you can help me. Thanks!

yurymalkov commented 4 years ago

Hi @chasingegg ,

Thank you for looking through the code. And sorry for the slow response (I was moving with my family to another country). As far as I remember, the +1 is supposed to handle the problem with prefetching inside the loop that you've described. It is definitely not essential.

Prefetching outside of valid memory should not affect the outcome of the function. It can potentially hurt performance (https://stackoverflow.com/questions/40128151/what-happens-if-an-invalid-address-is-prefetched), but I haven't observed any effect in practice.

The methods searchBaseLayerST and searchBaseLayer should indeed be the same (I still plan to unite them into one templated method), so this is more of a bug.

Actually, most (but not all) of the prefetches have only a minor effect on performance, which is not easy to measure.

chasingegg commented 4 years ago

Hi, do you think intra-query parallelism is feasible? It seems like it would be a wonderful feature, but none of the graph-based algorithms appear to have implemented it.

yurymalkov commented 4 years ago

Hi @chasingegg

I think it is feasible, but it comes at some QPS loss compared to data-parallelism. The main problem with intra-query parallelism is that most graph algorithms use lists and queues associated with each query.

And if multiple threads access those shared structures, every read/write needs a lock, which costs performance compared to data-parallelism. However, I never measured the actual performance loss in practice.

Another way to do this is to have independent parallel searches for the same element and then unite the results. We did that in our work on NSW with a simpler search method (SISAP 2012) - it scaled ok with the number of cores, but the simpler search method was less efficient. It also was easy to implement because we used random enter points in the graph. For HNSW you can do the same by adding several versions of upper level graphs (with little overhead), or, probably, you can some stochasticity (e.g. like relaxed dropout in neural networks) to have different results for search trials for the same query.