lmcinnes / pynndescent

A Python nearest neighbor descent for approximate nearest neighbors
BSD 2-Clause "Simplified" License
891 stars 105 forks source link

Improvement direction from ScaNN #108

Open hypnopump opened 3 years ago

hypnopump commented 3 years ago

Hi there! Cool library, very useful!

An improvement to Nearest Neighbour Search just came out adn was published at NeurIPS: https://arxiv.org/abs/1908.10396 based on vector quantization. I wonder if there are any plans to incorporate the techniques discussed there.

The basic idea seems to incorporate not only distance but angle information into the quantization. I think many python libraries could benefit from the improvements they claim, but i don't know if any straighforward implementation of such methods could be applied here.

hypnopump commented 3 years ago

Well, while checking the ann-benchmarks it seems their improvement is not quite constant across datasets, so feel free to close the issue

lmcinnes commented 3 years ago

Thanks for the suggestion. I looked through the ScaNN paper when the preprint first came out, and also at the software they released. It certainly provides very impressive performance (at least for angular distance benchmarks), but from my limited experiments it seems to be tightly tied to an impressive implementation. The ideas are interesting, but I could not generate any notable performance attempting to implement them in pynndescent -- in part due to managing to convince numba to do the right things. At the same time the technique seems largely fairly tightly tied to cosine / dot produce distance measures. While there are a lot of datasets (particularly NLP based ones) for which that is ideal (and hence the existence of ScaNN), I am endeavouring to keep pynndescent as general purpose as possible at this time, so those approaches are not necessarily a great fit.