nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Question about recall performance #208

Open wuwenjunwwj opened 4 years ago

wuwenjunwwj commented 4 years ago

I built an index with hnswlib in inner product space (normalized data). The data dimension is 128, M = 60, ef_construction = 400. I used random (normalized) vectors to test recall and got the results below:

topK    recall
1       40%
100     70%
500     88%
5000    97%

Is this reasonable? How can I improve top-1 and top-100 recall in this situation?

yurymalkov commented 4 years ago

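Hi @wuwenjunwwj, Yes, this might be reasonable. When you increase the number of neighbors (K), you also increase the ef search parameter, since ef has to be at least K. That is likely why recall goes up at larger K. To improve recall at small K, set ef explicitly: for example, try ef=1000 and test at K=100.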
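
For anyone who wants to check this on their own data, here is a rough sketch of an ef sweep at fixed K, scored against brute-force inner-product ground truth. The index parameters (inner product space, dim 128, M=60, ef_construction=400) come from the question. The dataset size, query count, seed, and list of ef values are assumptions picked for illustration, and `recall_at_k` and `sweep_ef` are hypothetical helper names, not part of hnswlib.

```python
def recall_at_k(found, truth):
    """Fraction of true top-K ids that appear in the returned ids, over all queries."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    total = sum(len(t) for t in truth)
    return hits / total


def sweep_ef(n=20000, dim=128, k=100, efs=(100, 200, 500, 1000),
             n_queries=100, seed=0):
    """Build an index on random normalized vectors and report recall@k per ef.

    n, n_queries, efs and seed are illustrative assumptions, not values
    taken from the thread.
    """
    # Third-party imports live here so recall_at_k stays dependency-free.
    import numpy as np
    import hnswlib

    rng = np.random.default_rng(seed)

    # Random unit-length vectors, as described in the question.
    data = rng.standard_normal((n, dim)).astype(np.float32)
    data /= np.linalg.norm(data, axis=1, keepdims=True)
    queries = rng.standard_normal((n_queries, dim)).astype(np.float32)
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)

    # Exact top-k by inner product (brute force) as ground truth.
    truth = np.argsort(-(queries @ data.T), axis=1)[:, :k]

    index = hnswlib.Index(space="ip", dim=dim)
    index.init_index(max_elements=n, ef_construction=400, M=60)
    index.add_items(data, np.arange(n))

    results = {}
    for ef in efs:
        index.set_ef(ef)  # query-time parameter; keep ef >= k
        labels, _ = index.knn_query(queries, k=k)
        results[ef] = recall_at_k(labels.tolist(), truth.tolist())
    return results
```

Calling `sweep_ef(k=100)` returns a dict mapping each ef to recall@100; calling `sweep_ef(k=1, efs=(10, 100, 1000))` does the same for top-1. The exact numbers depend on the data, so no expected values are shown here.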