nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Looking for suggestions to optimize index building time #491

Closed: jiangzhihui closed this issue 11 months ago

jiangzhihui commented 11 months ago

Hi @yurymalkov, we are building an HNSW index of ~10M vectors in our production environment, and it takes around 1.5 hours. Is there any evaluation of index construction time for different dataset sizes and parameters? I'm not sure whether 1.5 hours is normal or whether there is room to improve.

Btw, in our setting M is 32 and ef_construction is 128.

Thanks!

yurymalkov commented 11 months ago

hi @jiangzhihui,

There are some timings in Fig. 9 of the paper (https://arxiv.org/pdf/1603.09320.pdf) for reference.

Note that a lot depends on the hardware: mostly the number of threads and, to a lesser extent, the type of CPU. Dimensionality, both internal and external, plays a big role as well. If the dimensionality is too large, compressing the vectors might speed up both construction and querying.
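One way to check how much the thread count matters on a given machine is to time a build on a small sample and extrapolate. The snippet below is only a rough sketch under stated assumptions: `time_build` and `extrapolate_seconds` are hypothetical helper names, the random sample and its 64 dimensions are placeholders for real data, and the N log N scaling used for the estimate is an approximation, not a measured result.

```python
import math
import time


def time_build(data, M=32, ef_construction=128, num_threads=-1, space="l2"):
    """Build an HNSW index over `data` (float32 array, shape [n, dim]).

    Returns the seconds spent inserting the vectors.
    """
    import hnswlib  # imported lazily so the estimator below has no dependencies

    n, dim = data.shape
    index = hnswlib.Index(space=space, dim=dim)
    index.init_index(max_elements=n, ef_construction=ef_construction, M=M)
    start = time.perf_counter()
    index.add_items(data, num_threads=num_threads)  # -1 means use all cores
    return time.perf_counter() - start


def extrapolate_seconds(sample_seconds, sample_size, full_size):
    """Scale a sample timing to a larger dataset.

    Assumption: build cost grows roughly like N * log(N).
    """
    scale = (full_size * math.log(full_size)) / (sample_size * math.log(sample_size))
    return sample_seconds * scale


def demo():
    import numpy as np

    rng = np.random.default_rng(0)
    # Placeholder data: 10k random vectors with a hypothetical dimension of 64.
    sample = rng.random((10_000, 64), dtype=np.float32)
    for threads in (1, -1):
        secs = time_build(sample, num_threads=threads)
        est = extrapolate_seconds(secs, len(sample), 10_000_000)
        print(f"num_threads={threads}: {secs:.2f}s on sample, "
              f"~{est / 3600:.2f}h estimated for 10M vectors")


if __name__ == "__main__":
    try:
        demo()
    except ImportError:
        print("demo needs numpy and hnswlib installed")
```

Running the same comparison on a sample of the real vectors, before and after reducing their dimensionality, would show whether compression is worth it for a particular dataset.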

jiangzhihui commented 11 months ago

Thanks @yurymalkov for the reference and suggestions! It's quite helpful!