Question about recall performance

nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors

https://github.com/nmslib/hnswlib

Apache License 2.0

Question about recall performance #208

Open wuwenjunwwj opened 4 years ago

wuwenjunwwj commented 4 years ago

I built an index with hnswlib in inner product space (normalized data). The data dimension is 128, M = 60, ef_construction = 400. I used random (normalized) vectors to test recall and got the results below:

topK    recall
1       40%
100     70%
500     88%
5000    97%

Is this reasonable? How can I improve top-1 and top-100 recall in this situation?

yurymalkov commented 4 years ago

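Hi @wuwenjunwwj, Yes, this might be reasonable. When you increase the number of neighbors (K), you also increase the ef search parameter (ef>K), so the search explores more candidates, which is why recall rises with K. To improve recall at small K, raise ef explicitly: you can try setting, e.g., ef=1000 and testing at K=100.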
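
For anyone who wants to try the suggestion above, here is a minimal sketch of an ef sweep at a fixed K. The helper names (`recall_at_k`, `sweep_ef`, `demo`), the dataset size, the number of queries, and the list of ef values are all assumptions made for illustration; only dim=128, M=60, ef_construction=400, and the inner product space come from the question. Ground truth is computed by brute force with numpy, which is feasible only for small datasets.

```python
def recall_at_k(found, truth):
    """Fraction of true neighbors present in the returned lists, over all queries."""
    hits = sum(len(set(map(int, f)) & set(map(int, t))) for f, t in zip(found, truth))
    total = sum(len(t) for t in truth)
    return hits / total


def sweep_ef(index, queries, truth, k, ef_values):
    """Return {ef: recall@k} for any index exposing set_ef() and knn_query()."""
    results = {}
    for ef in ef_values:
        index.set_ef(ef)                       # query-time search width
        labels, _ = index.knn_query(queries, k=k)
        results[ef] = recall_at_k(labels, truth)
    return results


def demo(n=20000, dim=128, n_queries=100, k=100):
    """Hypothetical setup mirroring the question; requires numpy and hnswlib."""
    import numpy as np
    import hnswlib

    rng = np.random.default_rng(0)
    data = rng.standard_normal((n, dim)).astype(np.float32)
    data /= np.linalg.norm(data, axis=1, keepdims=True)        # normalize rows
    queries = rng.standard_normal((n_queries, dim)).astype(np.float32)
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)

    index = hnswlib.Index(space="ip", dim=dim)
    index.init_index(max_elements=n, ef_construction=400, M=60)
    index.add_items(data, np.arange(n))

    # Exact top-k by inner product, used as ground truth.
    truth = np.argsort(-(queries @ data.T), axis=1)[:, :k]
    return sweep_ef(index, queries, truth, k, [100, 200, 500, 1000])
```

Calling `demo()` returns a dict mapping each ef value to the measured recall at K=100, so you can see how much extra ef is needed for the recall you want. The actual numbers depend on the data and are not shown here.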