yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0
1.24k stars 114 forks source link

Range Search #32

Closed tanliboy closed 4 years ago

tanliboy commented 5 years ago

I tried NGT in some data, and I am very impressed by its recall and speed. I noticed that most of the examples are for topK, and I am wondering this algorithm is fit for range_search – search for all vectors within a specific radius of a given vector.

In the python wrapper, the radius setting is not exposed https://github.com/yahoojapan/NGT/blob/master/python/src/ngtpy.cpp#L126. However, according to the C++ example https://github.com/yahoojapan/NGT/blob/master/bin/search/search.cpp#L54, we can set the radius along with size.

Is this algorithm fit for range_search (by make size=data_size and control radius)? If so, do we have data compared with other approaches (classic inverted files, vp-tree, or other approaches)? Thanks!

masajiro commented 5 years ago

NGT can basically deal with range search excluding the python bindings. When you set a radius, the radius is only set to the initial radius instead of infinity for the KNN search. Although I have not exactly compared NGT with other methods by using range search, I think inverted-file-based methods are advantageous for wide range search compared to graph-based methods.

masajiro commented 4 years ago

I have implemented set for ngtpy to set a radius for range search.