Closed nicolamontecchio closed 8 years ago
@nicolamontecchio really cool stuff! Please see comments ;) Did you do any perf (speed) testing?
Btw what are you using an Euclidean index for?
i'll take a look at these tomorrow during a very long (thanks amtrak) train ride to boston
See responses inline and code changes. I run IntelliJ's code formatter, should be properly formatted now
on master: Avg. time per query: 0.076ms
this branch: Avg. time per query: 0.079ms
The dataset tested is 1M x 8, query times are averaged over 1M queries
Aha, so it is slower! :)
I was mainly concerned about the amount of garbage generated during searching, but maybe it's NBD since it's all virtually stack-allocated garbage. So I guess that's fine.
@rohanag are you requesting a lot more than 10 neighbors because quality is better? you should really consider adding support for search_k
in the java version
@erikbern yup cos of quality. yeah will consider adding it
actually probably should be less of a speedup in Java – most of the speedup in the Python/C++ version is that you avoid creating big expensive Python objects. Some of the speedup is caused by using a partial sort as well – instead of sorting 10000 elements, turns out STL has this nice std::partial_sort function: https://github.com/spotify/annoy/blob/master/src/annoylib.h#L574
tested w/ 1k NN instead of 10; speed is again pretty much the same from master to this branch
@nicolamontecchio pushed to maven central, be aware that it takes a while propage though.
:100:
:neckbeard:
Changes:
Note that the results do not exactly match between c++ and java versions, though they do largely overlap (at least 5 results out of 10 in common). I'll keep digging to figure out exactly what's different. However in the Angular index case, the code in this PR gives exactly the same results as before (i.e. results between java and c++ were already slightly different).