Open MrHwc opened 5 years ago
Index construction time really depends on how you set the parameters (and how strong your CPU is). I used all the default parameters, and it took 33893.343 s to build the index (with only 1 thread, on an Intel Core i5-8500). With more threads (e.g. 4 or more), this time would drop significantly. So I can't tell whether your index construction time is fast or slow (and I haven't tried NSG yet).
For query time, with the default "MaxCheck" value (8192), the average time for each 100-neighbor search was 0.003575 s with a recall of 0.91352. With "MaxCheck" set to 16384, the average time was 0.006024 s for a recall of 0.94465. I wonder if your time included other things (e.g. loading the index, which takes about 6 s on my PC).
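To make numbers like these comparable, here is a minimal sketch of how average per-query time and recall could be measured with index loading kept out of the timed region. The names `time_queries`, `recall_at_k` and `search_fn` are hypothetical and not part of the library; `search_fn` stands for whatever call returns the neighbor IDs for one query on an index that is already loaded.

```python
import time

def recall_at_k(found, truth, k):
    """Fraction of the true k nearest neighbors that appear in the returned lists."""
    hits = 0
    for f, t in zip(found, truth):
        hits += len(set(f[:k]) & set(t[:k]))
    return hits / (k * len(truth))

def time_queries(search_fn, queries):
    """Return (results, average seconds per query), timing only the search calls."""
    results = []
    start = time.perf_counter()
    for q in queries:
        results.append(search_fn(q))
    elapsed = time.perf_counter() - start
    return results, elapsed / len(queries)
```

Load the index before calling `time_queries`, so that the roughly 6 s load time does not end up in the per-query average.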
I haven't tried the add and delete functions yet, but I plan to do so soon. I would also suggest checking this website: http://ann-benchmarks.com/ (maybe you have already done so). All experiments were run in Docker containers on Amazon EC2 c5.4xlarge instances that are equipped with Intel Xeon Platinum 8124M CPU (16 cores available, 3.00 GHz, 25.0MB Cache) and 32GB of RAM.
My environment:
Python 2.7, Windows 10
I used the official default parameters. It takes about 6 s to load the index, and querying 1000 vectors takes about 28 s. Why is my time longer than yours? My machine: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, Ubuntu 18.04, Python 3.
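For reference, the two figures in this thread are in different units (a total for 1000 queries versus an average per query), so it helps to convert before comparing. A small sketch of the arithmetic, using only the numbers quoted here; `per_query_seconds` is a hypothetical helper name:

```python
def per_query_seconds(total_seconds, num_queries):
    """Convert a total search time into an average time per query."""
    return total_seconds / num_queries

mine = per_query_seconds(28.0, 1000)   # about 28 s for 1000 query vectors
reference = 0.003575                   # reported average per 100-neighbor search
print("per query: %.4f s, ratio: %.1fx" % (mine, mine / reference))
# prints "per query: 0.0280 s, ratio: 7.8x"
```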
For my GIST index, I would need to set an extreme parameter ("MaxCheck" of 500000, which would not make sense in a real-world situation since the size of the data set is only 1000000) to get a time similar to yours (with a recall of 0.95308). I also rebuilt the index with a different configuration (fixing a small bug in the previous index); the search time was longer (with a much higher recall), but still much lower than yours. I wonder if you were running other programs (that used up your CPU) during the search. Other than that, I may not currently be able to answer your question.
Using the same index to reach the same recall (0.9), the server query time is 23.14 s and my PC query time is 13.57 s. Why is the server slower than my PC? Server machine: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz, Ubuntu 18.04
My PC: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz, Ubuntu 18.04
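One possible factor (an assumption, not something confirmed in this thread) is that the search runs on a single thread, in which case the higher clock of the i5-7500 (3.40 GHz vs 2.10 GHz) could matter more than the number of cores on the Xeon. A rough sketch for checking this on both machines; `single_thread_score` is a hypothetical helper and only a crude proxy for search speed:

```python
import os
import time

def single_thread_score(iterations=2_000_000):
    """Time a fixed CPU-bound loop on one core; returns iterations per second."""
    start = time.perf_counter()
    acc = 0.0
    for i in range(iterations):
        acc += (i % 7) * 0.5
    elapsed = time.perf_counter() - start
    return iterations / elapsed

if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("single-thread score: %.0f it/s" % single_thread_score())
```

If the PC scores clearly higher than the server, the gap in query time is probably a hardware difference rather than a usage problem.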
I ran some other algorithms and performance tests using the ann-benchmark tool. (There was no time to run on the various datasets.) The y-axis uses a logarithmic scale, so the performance gap is actually greater than it looks. Given the index size, speed, and performance of the algorithm, I think the results are not reasonable. Please share comments or suggest additional tests for my results.
Dataset : SIFT-128 1M Experiment :
I set optimization options for the build script and reran the tests. The index size is a bit large, but I think the performance is reasonable.
Unrelated to this issue: Why did you choose RNG as your data structure?
I tested it with a GIST data set. The query set has size 1000 × 960, and I return 100 neighbors. Building the index took 20562.084 s, the query time is 35.082 s, the recall is 0.92669, and adding 100 vectors took 22.281 s. These times feel very long to me; am I using it correctly? Both index construction and query time are longer than with NSG.