lmcinnes / pynndescent

A Python nearest neighbor descent for approximate nearest neighbors
BSD 2-Clause "Simplified" License

NNDescent.update_with_changed_data needs improvements #187

Closed: Petkomat closed this issue 2 years ago.

Petkomat commented 2 years ago

For example, when all data are updated, the code does not work.

Also, the code of the method could probably be improved, e.g., by removing `init_kwargs` and making it more efficient.

Originally posted by @Petkomat in https://github.com/lmcinnes/pynndescent/issues/185#issuecomment-1138536612

Petkomat commented 2 years ago

I tried to merge the previously existing update methods and wrote a single one instead.

The tests for this mostly pass (21 of 24); however, accuracy stays below the 95% threshold in data case 3 (for all three metrics), when

Basically, the code follows the previous `update(X)` method (as written here):

I would expect the problematic test case to be "easier" than some of the others (e.g., those in which fresh data also appear). Am I doing something wrong?

lmcinnes commented 2 years ago

I'm not sure; I'll have to take a look. It could just be that the process is a little stochastic and noisy, and we happen to be hitting some bad cases. In that case, the answer may be to lower the threshold. How far under 95% are we?
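For context on what "95% accuracy" might mean here, below is a minimal sketch, assuming accuracy is measured as the fraction of true k nearest neighbors that the approximate graph recovers (the exact definition in pynndescent's test suite may differ). The helpers `exact_knn` and `knn_recall` are hypothetical and use brute-force Euclidean distances as ground truth.

```python
import math


def exact_knn(points, k):
    """Hypothetical helper: brute-force k nearest neighbors (self excluded), Euclidean."""
    graph = []
    for i, p in enumerate(points):
        # Sort candidates by (distance, index) so ties break deterministically
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def knn_recall(approx, exact):
    """Hypothetical helper: fraction of true neighbors present in the approximate lists."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total


points = [(0, 0), (1, 0), (0, 2), (5, 6)]
exact = exact_knn(points, k=2)            # [[1, 2], [0, 2], [0, 1], [2, 1]]
approx = [[1, 2], [0, 3], [0, 1], [2, 0]]  # two of eight neighbors are wrong
print(knn_recall(approx, exact))          # 0.75
```

Because NN-descent is randomized, a score like this varies from run to run, which is why a test sitting near a fixed threshold can fail on unlucky seeds.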

Petkomat commented 2 years ago

It turns out I did something wrong: `ns, ds = self.neighbor_graph` did not change `self._neighbor_graph`. The tests now pass; I will open a PR.
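The pitfall can be reproduced without pynndescent. The sketch below assumes, as the comment above suggests, that `neighbor_graph` is a property over the private `_neighbor_graph` attribute that hands back copies rather than the stored arrays (treat that as an assumption about the library's internals). `GraphHolder` is a hypothetical stand-in class, and plain lists replace NumPy arrays to keep it self-contained.

```python
class GraphHolder:
    """Hypothetical stand-in for an index that stores its kNN graph privately."""

    def __init__(self, indices, distances):
        # Private storage, analogous in spirit to NNDescent._neighbor_graph
        self._neighbor_graph = (indices, distances)

    @property
    def neighbor_graph(self):
        # Return copies so callers cannot mutate the internal graph by accident
        indices, distances = self._neighbor_graph
        return [row[:] for row in indices], [row[:] for row in distances]


holder = GraphHolder([[1, 2], [0, 2]], [[0.5, 0.9], [0.5, 0.7]])

ns, ds = holder.neighbor_graph           # unpacks two fresh copies
ns[0][0] = 99                            # edits the copy only
ds[0][0] = 0.1
print(holder._neighbor_graph[0][0][0])   # 1: internal graph unchanged

# The fix: write the updated arrays back to the private attribute explicitly
holder._neighbor_graph = (ns, ds)
print(holder._neighbor_graph[0][0][0])   # 99
```

Even if a property returned the stored arrays themselves, rebinding `ns` or `ds` to new arrays would still leave the object untouched; only in-place mutation or an explicit assignment back to the attribute changes it.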

Petkomat commented 2 years ago

Resolved in commit 7884b09ed4e7e720b032c89916b6096c86f53927.