kakao / n2

TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets
Apache License 2.0

Incomplete benchmarks #11

Closed searchivarius closed 3 years ago

searchivarius commented 6 years ago

Hi guys,

I am trying to reproduce your results. I downloaded your data sets and the benchmark script: https://github.com/kakao/n2/blob/dev/benchmarks/benchmark_script.py

However, it doesn't have code to test NMSLIB.

Furthermore, on the first page you show query times for HNSW and your method. How did you get these (what is the test script)? There is another picture in your repo, which doesn't indicate there's any difference between N2 and NMSLIB: https://github.com/kakao/n2/blob/dev/docs/imgs/search_time/total.png

How does it correspond to the README picture which shows a 2x difference between two libraries?

Thanks!

corona10 commented 6 years ago

@searchivarius Thank you for your interest in our project. We are a big fan of your nmslib project.

  1. We have now updated the benchmark code so that our results can be reproduced. Please run youtube_reproduce.py on the dev branch.
  2. The accuracy of nmslib and n2 is almost identical, but the search speed differs between the two libraries. On the graph, points toward the top right indicate better performance.
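
The precision measurement behind these plots can be sketched as follows. This is a minimal numpy illustration, not the repository's actual benchmark script: the benchmark compares each library's answers against exact neighbors computed by brute force.

```python
import numpy as np

def precision_at_k(approx_ids, exact_ids):
    # Fraction of the true k nearest neighbors that the ANN search returned.
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * len(exact_ids[0]))

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 16)).astype(np.float32)
queries = rng.standard_normal((10, 16)).astype(np.float32)
k = 10

# Exact ground truth by brute-force distance computation.
d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
exact = np.argsort(d2, axis=1)[:, :k].tolist()

# An ideal index reproduces the ground truth exactly: precision 1.0.
print(precision_at_k(exact, exact))  # → 1.0
```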
corona10 commented 6 years ago

P.S.

$ pip list | grep nmslib
nmslib (1.6.3)
searchivarius commented 6 years ago

@corona10 awesome, thanks! I will run it shortly.

corona10 commented 6 years ago

Closing this issue via https://github.com/kakao/n2/pull/14

searchivarius commented 6 years ago

This doesn't seem to be working; I am getting zero precision (although the other benchmark script works). I will send you a pull request for benchmark_script.py. The parameters for NMSLIB don't seem to be right. Although your version is a re-implementation of NMSLIB's HNSW, your variant somehow needs much higher values of efSearch than NMSLIB to achieve the same precision. For NMSLIB, typically all reasonable values of efSearch are in the range 1-200.

corona10 commented 6 years ago

@searchivarius It was my mistake. I updated the code: the metric at https://github.com/kakao/n2/blob/dev/benchmarks/youtube_reproduce.py#L375 should be angular.
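
For context on why the wrong metric produced zero precision: euclidean and angular (cosine) distance can disagree completely on which point is nearest, so ground truth built with one metric makes results from the other look uniformly wrong. A small numpy illustration (not taken from the benchmark code):

```python
import numpy as np

# Euclidean distance cares about vector length; angular (cosine) distance does not.
base = np.array([[100.0, 0.0],   # same direction as the query, but far away
                 [5.0, 3.0],     # close in euclidean terms, different direction
                 [0.0, 1.0]])    # orthogonal to the query
query = np.array([5.0, 0.0])

euclid = np.linalg.norm(base - query, axis=1)
cosine = 1 - (base @ query) / (np.linalg.norm(base, axis=1) * np.linalg.norm(query))

print(int(np.argmin(euclid)))  # → 1 (nearest by euclidean distance)
print(int(np.argmin(cosine)))  # → 0 (nearest by angular distance)
```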

searchivarius commented 6 years ago

No problem, I have merged the benchmarks. Annoy still fails sometimes on Youtube, but I hope you can fix this.

Regarding the differences in time, see my pull request. I see no difference in query time, but indexing is 20-30% faster.

Thanks!

corona10 commented 6 years ago

@searchivarius Thank you for your interest in and big contribution to our project. Let's talk about the benchmark.

indexing is 20-30% faster.

Thank you,

your variant somehow needs muuuch higher values of efSearch than NMSLIB to achieve the same precision. For NMSLIB, typically all reasonable values of efSearch are in the range 1-200.
Correct, but let's discuss this in more depth.

First, N2 needs a higher efSearch value to achieve the same precision as NMSLIB. But this should be considered in light of the libraries' different designs: N2's HNSW implementation differs from nmslib's, so it does not make sense to expect the same efSearch value to give the same accuracy in both libraries.

We need to tune efSearch for each algorithm. This is why we designed the benchmark to measure search speed at a given precision, rather than precision at a given efSearch.
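
The "speed per precision" idea can be sketched like this: sweep the candidate-list size, record a (precision, queries-per-second) pair for each setting, and compare libraries along the resulting curve. Below is a toy numpy sketch that uses a random-subset search as a stand-in for a real ANN index (the real benchmark sweeps efSearch on the actual libraries):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((2000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)
k = 10

# Exact neighbors: the benchmark's ground truth.
d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
exact = np.argsort(d2, axis=1)[:, :k]

def approx_search(q, ef):
    # Stand-in for an ANN query: rank only a random subset of ef candidates.
    cand = rng.choice(len(base), size=ef, replace=False)
    d = ((base[cand] - q) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]

curve = []
for ef in (100, 500, 2000):   # a larger candidate list raises precision but costs time
    t0 = time.perf_counter()
    results = [approx_search(q, ef) for q in queries]
    qps = len(queries) / (time.perf_counter() - t0)
    hits = sum(len(set(r) & set(e)) for r, e in zip(results, exact.tolist()))
    curve.append((ef, hits / (len(queries) * k), qps))

for ef, precision, qps in curve:
    print(f"ef={ef:5d}  precision={precision:.2f}  qps={qps:.0f}")
```

Plotting precision against qps for each library gives exactly the kind of curve shown in the README, where top-right is better.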


Ultimately, most service tech companies pursue both high accuracy and high search speed for their services.

Low accuracy with fast search speed? Not worth considering. So we are focusing on high precision with fast search speed. Using the range you suggested,

typically all reasonable values of efSearch are in the range 1-200.

we got 0.62 precision. At efSearch=1000 we got 0.88 precision, but we were still not satisfied. When we set efSearch to 10000 and 100000, we got satisfactory figures.

You could ask: what if we increase M and decrease efSearch? As you know, increasing M consumes more memory, and lower memory usage leaves more room for the service itself.
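
To make the memory cost of M concrete, here is a rough back-of-envelope. The ~2*M four-byte link slots per element is a common rule of thumb for HNSW (e.g. from hnswlib's documentation); n2's exact layout may differ, and the 5M-vector, 40-dimensional dataset size below is purely illustrative:

```python
def hnsw_memory_gb(n, dim, m):
    # Rough per-element estimate: raw vector bytes plus ~2*m link slots
    # of 4 bytes each (rule of thumb; the exact layout is library-specific).
    per_element = dim * 4 + m * 2 * 4
    return n * per_element / 1024 ** 3

# Doubling M on an illustrative 5M x 40-dim dataset:
print(round(hnsw_memory_gb(5_000_000, 40, 12), 2))  # → 1.19
print(round(hnsw_memory_gb(5_000_000, 40, 24), 2))  # → 1.64
```

So doubling M here adds roughly half a gigabyte, which is why we prefer to pay in efSearch rather than in M.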

Although this result is only a benchmark metric, I think our benchmark at high precision shows that N2 is faster than NMSLIB. In some respects this difference is not important, but it is an important issue for us: a difference of a few milliseconds leads to customer complaints.

We are not saying N2 beats NMSLIB for all use cases. But N2 solves this use case: high precision and fast search time on a large dataset. I believe there may be use cases where NMSLIB works better; there is no silver bullet.

Again, thank you for your interest in our project, and thank you for your pull request. We will review it soon :)

searchivarius commented 6 years ago

Did you have a chance to see the plots? I reattach them below. sift-5000000-2000-3.pdf youtube-5000000-2000-3.pdf glove-5000000-2000-3.pdf

ummae commented 6 years ago

@searchivarius Hi, would you let me know how you got the above three experiment results? From the code in your pull request #20?

searchivarius commented 6 years ago

@ummae yes

gony-noreply commented 3 years ago

@searchivarius Hi, I'm sorry that this feedback comes so late.

Recently, we have made many changes to the N2 code, and we have also worked to increase the reliability of the benchmark. The benchmark results for the recently released versions can be found at the link below.

With the 0.1.7 release, you can reproduce the benchmark result using the following notebook file (it takes 18 to 24 hours to run all cells).

We know that nmslib uses different query parameters in its benchmarks. However, n2 uses fixed parameters for the convenience of development; for results measured with varying query parameters, please refer to the ann-benchmarks results (the recently updated ann-benchmarks used n2 version 0.1.6).

If you have any other questions or comments, please comment. If not, we will close this issue and PR #20 later :)