zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License

Comparability of Tests Using different Index Types #279

Open anrahman4 opened 4 months ago

anrahman4 commented 4 months ago

This isn't so much an issue as a question, better suited for a discussion section.

I have been running a lot of testing on Milvus using VectorDBBench, specifically between HNSW and DiskANN indices. The default values for them are the following:

DiskANN:

HNSW:

Can these default values be used to roughly compare index performance, where all other variables such as system hardware and Milvus configuration are the same?

With DiskANN, I imagine that if we use a Search Performance Test with the Cohere 1M dataset, it would essentially build a graph over all 1 million vectors in the vector space, optimize the graph by pruning edges, and ultimately return the top 100 most similar vectors based on cosine similarity.

With HNSW, since these parameters need to be more finely tuned, I'd imagine that setting ef, where the first initial layer starts with 100 and each additional layer down would add 360 nodes, is not really similar to DiskANN at all. The edge connectivity being hard-set with M is also difficult to control, since I won't know how much interconnectivity DiskANN will end up with. On top of that, working in a hierarchical manner changes how the search is performed in the search space.
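For reference, HNSW's build-time knobs (M, efConstruction) and search-time knob (ef) are passed to Milvus as separate parameter dicts. A minimal sketch in pymilvus-style dicts; the numeric values here are illustrative only, not VectorDBBench's actual defaults:

```python
# Illustrative HNSW parameters for Milvus (values are examples, not
# VectorDBBench defaults). M bounds the edge count per node (graph density);
# efConstruction and ef size the candidate queues at build and search time.
hnsw_index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 30,                # max edges per node in the graph
        "efConstruction": 360,  # candidate-queue size while building
    },
}

hnsw_search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 100},      # candidate-queue size per query; must be >= top_k
}
```

These dicts would be handed to `create_index` and `search` respectively; only `ef` can be varied per query without rebuilding the index.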

I ultimately want to compare the two indices in the most apples-to-apples way I can, and am wondering whether there are insights into this, or whether they are simply too different to compare easily in terms of QPS, latency, and recall.

alwayslove2013 commented 4 months ago

@anrahman4 Although DiskANN and HNSW are both graph-based algorithms, they differ in many ways and their parameters do not map one-to-one. The default parameters are simply a set we provide based on experience. It's hard to compare them directly.

> Can these default values be used to roughly compare index performance, where all other variables such as system hardware and Milvus configuration are the same?

Yes.

> With DiskANN, I imagine if we use a Search Performance Test with the Cohere 1M dataset, that it would essentially make a graph with all 1 million vectors in the vector space and then optimize the graph to prune for edges, and ultimately return the top 100 similar vectors based off of the cosine similarity.

DiskANN actually has a multitude of parameters, but only search_list can be adjusted in Milvus. search_list functions somewhat similarly to ef. Furthermore, DiskANN also has parameters akin to M that govern the density of the graph. Apologies for the limited explanation; it would be best to refer to the original papers for a more comprehensive understanding of both algorithms.
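To make the contrast concrete, here is the DiskANN side of the same sketch: in Milvus, build-time graph-density knobs (the R/L parameters from the DiskANN paper) are not exposed to the user, and only search_list is tunable at query time. Again, the values are illustrative, not defaults:

```python
# Illustrative DISKANN parameters for Milvus (values are examples).
# Build-time knobs are fixed server-side, so the index params dict is empty;
# search_list plays a role roughly analogous to HNSW's ef.
diskann_index_params = {
    "index_type": "DISKANN",
    "metric_type": "COSINE",
    "params": {},  # graph-density knobs are not user-tunable in Milvus
}

diskann_search_params = {
    "metric_type": "COSINE",
    "params": {"search_list": 100},  # candidate-list size; must be >= top_k
}
```

So the closest thing to an apples-to-apples sweep is varying ef and search_list over the same range at a fixed top_k, while accepting that the build-time graph structures remain different.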

> With HNSW, since these parameters need to be more finely tuned ...

In terms of performance, DiskANN may not be as impressive as HNSW. However, its true worth lies in its ability to minimize memory usage by utilizing hard disks.

> I ultimately want to compare the two indices in the most apples to apples comparison I can, and am wondering if there are insights to this, or if ultimately they are different and not the easiest to compare with regards to QPS, latency, and recall.