nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0
4.27k stars 629 forks source link

Question--time complexity for high LID dataset #390

Open jianshu93 opened 2 years ago

jianshu93 commented 2 years ago

Hello hnswlib Team,

I am wondering for high Local Intrinsic dimension dataset, complexity for database building is still O(N*log(N))? Another question I have is, from a HNSW graph structure, we can extract top k best neighbors for each database element very efficiently right (from the bottom layer)? and compare to k NN graph (O(dn^t), 1<t<2)?

Thanks,

Jianshu

yurymalkov commented 2 years ago

Hi @jianshu93,

  1. This is discussed in HNSW paper. The O(N*log(N)) is asymptotic scaling under (IMO) reasonable assumptions. That means that to observe this scaling on high LID datasets one might have to increase N to astronomical values to observe the expected scaling, similar to other algorithms with logarithmic scalability (e.g. kd-tree).
  2. Yeah. HNSW has an approximation of relative neighborhood graph which is very similar to the kNN graph (and better for routing), so extracting nearest neighbor should be a cheap operation there.