nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

higher qps with similar recall on larger datasets #575

Open TyangJN opened 2 weeks ago

TyangJN commented 2 weeks ago

@yurymalkov Thanks for your great work.

I'd be really grateful if you could tell me:

I ran HNSW on the 1M SIFT dataset with M=32, efc=200, efs=256, which gives good QPS and recall.

But when it comes to the 15M SIFT dataset, the QPS decreases rapidly.

Which parameters give the best QPS for larger datasets, and how can I optimize the parameter settings for larger datasets? Thanks a lot.
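One common way to explore the recall/QPS trade-off is to sweep the query-time ef (efs above) and record both numbers for each value. The following is a minimal sketch of such a sweep, not a recommended configuration: `recall_at_k` and `sweep_ef` are hypothetical helper names, and `set_ef` and `search` are caller-supplied callables, so nothing here is tied to a particular index API.

```python
import time


def recall_at_k(found, truth, k):
    """Fraction of the true top-k neighbor ids that appear in the returned top-k."""
    hits = 0
    for f, t in zip(found, truth):
        hits += len(set(f[:k]) & set(t[:k]))
    return hits / (len(truth) * k)


def sweep_ef(set_ef, search, queries, truth, ef_values, k=10):
    """Return a list of (ef, recall, qps) tuples, one per ef value.

    set_ef and search are hypothetical callables supplied by the caller:
    set_ef(ef) sets the query-time ef, and search(queries, k) returns one
    sequence of neighbor ids per query.
    """
    rows = []
    for ef in ef_values:
        set_ef(ef)
        start = time.perf_counter()
        found = search(queries, k)
        # Guard against a zero interval when the search is trivially fast.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rows.append((ef, recall_at_k(found, truth, k), len(queries) / elapsed))
    return rows
```

With the hnswlib Python bindings, one would presumably pass `index.set_ef` as `set_ef` and something like `lambda q, k: index.knn_query(q, k=k)[0]` as `search`, with `truth` taken from the dataset's exact ground-truth file. Comparing the resulting table at 1M and at 15M shows which ef reaches the same recall on each dataset and what QPS it costs there.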

searchivarius commented 2 weeks ago

Hi @TyangJN, does "rapidly" mean that going from, say, 14M to 15M you see a sudden big (e.g., 2-3x) drop?

How much memory and L3 cache do you have, and how many CPU cores? I assume you use them all for querying. One test to run is to measure QPS with a smaller number of threads, even just one. Do you see a sharp decrease in QPS when you use a single thread?
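The two checks above can be sketched roughly as follows. `estimate_level0_bytes` and `qps_by_threads` are hypothetical helper names. The per-element layout in the estimate (2*M neighbor ids of 4 bytes each, a 4-byte link count, an 8-byte label, float32 vectors) is an assumption about how the bottom layer is stored, and it ignores the upper layers, so treat the result as a rough lower bound. `search` is a caller-supplied callable.

```python
import time


def estimate_level0_bytes(num_elements, dim, M, bytes_per_component=4):
    """Rough bottom-layer size in bytes under the assumed layout."""
    links = 2 * M * 4 + 4  # assumed: 2*M 4-byte neighbor ids plus a 4-byte count
    label = 8              # assumed: one 8-byte label per element
    return num_elements * (dim * bytes_per_component + links + label)


def qps_by_threads(search, queries, thread_counts, repeats=3):
    """Return {num_threads: best observed QPS over `repeats` runs}.

    search(queries, num_threads) is a hypothetical callable that runs the
    whole query batch with the given number of threads.
    """
    results = {}
    for n in thread_counts:
        best = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            search(queries, n)
            # Guard against a zero interval when the search is trivially fast.
            elapsed = max(time.perf_counter() - start, 1e-9)
            best = max(best, len(queries) / elapsed)
        results[n] = best
    return results
```

Under those assumptions, 128-dimensional float32 vectors with M=32 come to 780 bytes per element, so about 0.78 GB at 1M and about 11.7 GB at 15M. With the hnswlib Python bindings, `search` could be something like `lambda q, n: index.knn_query(q, k=10, num_threads=n)`. Running `qps_by_threads(search, queries, [1, 2, 4, 8])` on both datasets then shows whether the drop is already there with a single thread or only appears as threads are added.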