Closed ShikharJ closed 3 years ago
@masajiro Thanks for the insight! I'm able to see good latency numbers now. Is there a way of printing the recall value for the whole query set directly from the search command?
NGT provides the eval command, which computes the total recall over all of the specified queries. First, make a ground truth (this takes a while):
ngt search -i s -n 10 -e 0.1 -o e onng-index query.tsv > gt
Then make a target result:
ngt search -n 10 -e 0.1 -o e onng-index query.tsv > result
Evaluate the result against the ground truth:
ngt eval gt result
# # of evaluated resultant objects per query=10
# Factor (Epsilon) # of Queries Precision Time(msec) # of computations # of visited nodes
0.1 500 0.9982 0.513087 0 0
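To make the Precision column concrete, here is a minimal Python sketch of how a recall figure like this can be computed from a ground-truth result set and a target result set. This is only an illustration of the metric, not NGT's actual implementation; the toy IDs are made up.

```python
# Illustrative sketch (not NGT's code): recall as the fraction of
# ground-truth neighbor IDs recovered per query, averaged over all queries.
def recall(ground_truth, result):
    """ground_truth and result are lists of per-query neighbor-ID lists."""
    total = 0.0
    for gt_ids, res_ids in zip(ground_truth, result):
        total += len(set(gt_ids) & set(res_ids)) / len(gt_ids)
    return total / len(ground_truth)

# Two toy queries, k=3: the second query misses one true neighbor.
gt = [[1, 2, 3], [10, 11, 12]]
res = [[1, 2, 3], [10, 11, 99]]
print(recall(gt, res))  # (3/3 + 2/3) / 2 = 0.8333...
```

With 10 results per query and an exact ground truth, this is what the table above reports as Precision.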
You can also specify a range of epsilons:
ngt search -n 10 -e 0:0.1:0.01 -o e onng-index query.tsv > result
ngt eval gt result
# # of evaluated resultant objects per query=10
# Factor (Epsilon) # of Queries Precision Time(msec) # of computations # of visited nodes
0 500 0.6994 0.0724443 0 0
0.01 500 0.7464 0.0431784 0 0
0.02 500 0.7964 0.0527388 0 0
0.03 500 0.8466 0.0605951 0 0
0.04 500 0.8938 0.0775902 0 0
0.05 500 0.9352 0.103419 0 0
0.06 500 0.9696 0.151118 0 0
0.07 500 0.9854 0.219897 0 0
0.08 500 0.9926 0.303011 0 0
0.09 500 0.9958 0.395865 0 0
0.1 500 0.9982 0.513087 0 0
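Judging from the table above, the `-e 0:0.1:0.01` argument appears to follow start:end:step semantics with the end value included (eleven rows from 0 to 0.1). A small sketch of that expansion, under that assumption:

```python
# Hedged sketch of start:end:step epsilon expansion, end-inclusive,
# matching the eleven table rows above (0, 0.01, ..., 0.1).
def epsilon_range(start, end, step):
    eps, values = start, []
    while eps <= end + 1e-12:  # small tolerance for float accumulation
        values.append(round(eps, 10))
        eps += step
    return values

print(epsilon_range(0.0, 0.1, 0.01))  # eleven epsilon values, one per row
```

Each epsilon value produces one evaluation row, which is what makes this form convenient for plotting a recall/latency trade-off curve.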
@masajiro Thanks for the help, I really appreciate it. I can see that the plots in the NGT repo depict ONNG as the SOTA across a variety of datasets (even in my experiments on SIFT1M, I can see ONNG performing a lot better than other algorithms like HNSW), while the ann-benchmarks plots seem to tell an entirely different story. In a number of plots, ONNG doesn't even seem to be benchmarked by ann-benchmarks. Any idea why that is the case?
Thank you for asking about a matter I am also concerned about. ONNG can build indexes that achieve almost the best performance using the settings in ann-benchmarks. On the other hand, building them takes a long time compared to other algorithms. As for the current ann-benchmarks results, I think the default timeout of 2 hours was used, so some NGT runs probably could not finish. In addition, since ann-benchmarks uses just one core even for building indexes, the build times are even longer.
@masajiro Thanks for the insight, once again. Please feel free to close this issue as you see fit.
@masajiro I have another question if you don't mind. Do the regular ONNG graphs make use of AVX-512, whenever applicable?
If you build NGT on a computer with AVX-512, NGT uses AVX-512 for some distance computations. However, in my experiments the search times with AVX-512 were almost the same as those with AVX2.
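For context on what those instructions accelerate: AVX2 and AVX-512 process 8 or 16 floats per instruction in the inner distance loop. A scalar Python sketch of the squared-L2 distance being vectorized (an illustration of the operation, not NGT's C++ code):

```python
# Scalar squared-L2 distance; AVX2/AVX-512 compute the same sum but
# consume 8 (AVX2) or 16 (AVX-512) floats per instruction.
def squared_l2(a, b):
    return sum((x - y) * (x - y) for x, y in zip(a, b))

print(squared_l2([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 9 + 16 + 0 = 25.0
```

Since graph search is dominated by memory access patterns as much as by arithmetic, wider SIMD alone does not always translate into faster queries, which is consistent with the observation above.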
Hey everyone,

I was trying to replicate some of the results from ann-benchmarks on my system, and gave NGT-ONNG a test run on the SIFT1M dataset using the following commands, which I deduced from here and here:

Everything works so far, so I doubt there is an installation issue on my system. However, the throughput is still pretty low compared to what I was expecting. Here's a sample output run:

This only results in a throughput of 10000/292.421 ≈ 34 queries per second, which is far from the 10^4 numbers being observed on SIFT1M. Can anyone help me speed up the search process, please?
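For reference, the throughput figure above is just the query count divided by the total elapsed time:

```python
# Quick check of the throughput arithmetic quoted above.
queries = 10000        # SIFT1M query set size
elapsed_sec = 292.421  # total search time from the sample run
print(queries / elapsed_sec)  # ≈ 34.2 queries per second
```

That is roughly 29 ms per query, versus the sub-millisecond latencies implied by the ~10^4 qps results.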