Open chasingegg opened 4 years ago
Hi @chasingegg ,
Thank you for looking trough the code. And sorry for long response (moving with family to another country). As far as I remember, +1
is supposed to handle the problem with prefetching inside the loop that you've described. It is definitely not essential.
Prefetching outside of the valid memory should not affect the outcome of the function. It can potentially hurt performance (https://stackoverflow.com/questions/40128151/what-happens-if-an-invalid-address-is-prefetched), but I didn't observed any
Methods searchBaseLayerST
and searchBaseLayer
are indeed should be the same (still plan to unite them in one templated method), so it is more a of a bug.
Actually, most (but not all) of the prefetches have minor effect on the performance which is not so easy to measure.
Hi, do you think it is feasible to do inner-one-query parallelism, because it should be a wonderful feature but seems none of the graph-based algorithms have done this?
Hi @chasingegg
I think it is feasible, but comes at some QPS loss compared to data-parallelism. The main problem with inner-query parallelism is that most graph algorithms use list and queues associated with each query.
And if there are multiple threads accessing the collection structures, this leads to having locks every time a thread reads/writes and, thus, performance loss compared to data-parallelism. However, I never measured actual performance loss in practice.
Another way to do this is to have independent parallel searches for the same element and then unite the results. We did that in our work on NSW with a simpler search method (SISAP 2012) - it scaled ok with the number of cores, but the simpler search method was less efficient. It also was easy to implement because we used random enter points in the graph. For HNSW you can do the same by adding several versions of upper level graphs (with little overhead), or, probably, you can some stochasticity (e.g. like relaxed dropout in neural networks) to have different results for search trials for the same query.
I have read the code in detail, and I got some questions about some corner cases, I hope you guys could help me. Thanks!
linkLists_[cur_c] = (char *) malloc(size_links_per_element_ * curlevel + 1);
why we need a '+1' here?size_links_per_element_
stands for the space allocated for each level, so size_links_perelement * curlevel is enough, no need for a '+1'?prefetch
command, I have several questions, when we use addPoint function to build index, it invokes searchBaseLayer function, at 198-199 line,when use searchKnn function to find knn, it invokes searchBaseLayerST function, at 276-277 line,
Actually, the first line of two methods is the same(to fetch the next neighbor's feature vector), but why the second line is different?
prefetch
in a for loop, would it cause the index-out-of-boundary problem at the last position of the for loop since it always tries to fetch the next region of the data array?