code issue - Githubissues

yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data

Apache License 2.0

1.26k stars 115 forks source link

code issue #84

Closed op-hunter closed 4 years ago

op-hunter commented 4 years ago

the 569th line in function Neighborhood::search of file Graph.cpp:

if (distance <= sc.radius) {
        results.push(result);
        if (results.size() >= sc.size) {
          if (results.top().distance >= distance) {
        if (results.size() > sc.size) {
          results.pop();
        }
        sc.radius = results.top().distance;
        explorationRadius = sc.explorationCoefficient * sc.radius;
          }
        }
      }

sc.radius is updated by the top().distance of heap results in each loop, so my question is: what's the considerations of so much ifs in the code snippet above? how about:

if (distance < results.top().distance) {
     results.push(result);
     if (results.size > sc.size)
        results.pop();
     sc.radius = results.top().distance;
     explorationRadius = sc.explorationCoefficient * sc.radius;
}

masajiro commented 4 years ago

Both are not the same. A lot of ifs can reduce the processing time.

op-hunter commented 4 years ago

In most case, when the program execute to the third if, there is sc.radius == results.top().distance, right? Since the first if pass, the third if is right obviously. Only at the begining of the search process, when results.size() < sc.size, the third if works. So why "A lot of ifs can reduce the processing time."?

masajiro commented 4 years ago

In this part, it is not assumed that sc.radius is always the same as results.top().distance. Since, in most cases, the results.size is always more than sc.size, the third if is meaningless. However, this part is implemented for less results.size than sc.size for experimental conditions. I would like to avoid to execute costly results.pop() by if.

op-hunter commented 4 years ago

You mean this search function is just experimental implementation for special case? So the right implementation is searchReadOnlyGraph? But I had traced from Command.cpp::search(Args &args) and finally step into the search function, so where is the problem?

op-hunter commented 4 years ago

So the problem is what's the difference between repository and searchRepository in class NeighborhoodGraph?

masajiro commented 4 years ago

Since objects are also searched to build graphs, NeighborhoodGraph::search is used for building a graph as well. On the other hand, NeighborhoodGraph::searchReadOnlyGraph to shorten the search time is not used for building a graph but just for searching for a graph with the read only option. NeighborhoodGraph::searchReadOnlyGraph explores the read only graph that is converted from the stored graph when the index is opened.