Open gingerwizard opened 2 years ago
Useful: https://towardsdatascience.com/using-approximate-nearest-neighbor-search-in-real-world-applications-a75c351445d - interesting because it notes that post-filtering results, e.g. by date or another field, is not ideal.
https://arxiv.org/abs/1610.02455 - sections 2-6 cover most algorithm options
I think inevitably we will mix algorithms - see https://www.pinecone.io/learn/composite-indexes/
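To illustrate the post-filtering concern from the first link, here is a minimal sketch. It uses an exact brute-force search as a stand-in for an ANN index, and the toy corpus, the `year` field, and the helper names `top_k` and `is_recent` are all hypothetical. It only shows why filtering after the search can return fewer than k results, or none.

```python
import math

def top_k(query, items, k):
    """Exact k nearest items by Euclidean distance (stand-in for an ANN index)."""
    return sorted(items, key=lambda it: math.dist(query, it["vec"]))[:k]

# Hypothetical toy corpus: 1-D vectors, with the two farthest docs being the recent ones.
docs = [{"id": i, "vec": (float(i),), "year": 2020 if i < 8 else 2023} for i in range(10)]
query = (0.0,)
k = 3

def is_recent(doc):
    return doc["year"] == 2023

# Post-filtering: search first, then drop hits that fail the filter.
# All three nearest neighbours are from 2020, so nothing survives.
post = [d for d in top_k(query, docs, k) if is_recent(d)]

# Pre-filtering: restrict the candidate set first, then search within it.
pre = top_k(query, [d for d in docs if is_recent(d)], k)

print([d["id"] for d in post], [d["id"] for d in pre])
```

With a real ANN index the pre-filtering variant is not free, since the index is built over the whole corpus, which is presumably part of why mixing techniques looks inevitable.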
We can split ANN algorithms into three distinct categories: trees, hashes, and graphs.
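As a rough illustration of the "hashes" category only, the sketch below uses random-hyperplane locality-sensitive hashing. This is one well-known hashing scheme picked for illustration, not something this issue has chosen. The function names, the 3-dimensional toy vectors, and the choice of 4 bits are assumptions.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets

def candidates(query, planes, buckets):
    """Return indices sharing the query's bucket; these would then be re-ranked exactly."""
    return buckets.get(signature(query, planes), [])

# Hypothetical toy data.
vectors = [(1.0, 0.0, 0.5), (-1.0, 0.2, -0.5), (0.9, 0.1, 0.4)]
planes = make_hyperplanes(dim=3, n_bits=4)
buckets = build_index(vectors, planes)

# A query pointing in the same direction as vectors[0] lands in its bucket.
print(candidates((2.0, 0.0, 1.0), planes, buckets))
```

Vectors at a small angle to each other tend to share a signature, so only one bucket needs scanning. Practical implementations usually use several hash tables to recover neighbours that fall just across a hyperplane.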
The following are possible algorithmic approaches; each typically has variants. Note that, I believe, these focus on techniques that support low dimensionality (100ish max), and this requires encoding techniques for the text.
Sparse vector techniques (I don't know if we want to go down this route). It would be simpler to implement:
Diversified Proximity Graphs: Navigating Spreading-out Graph (NSG) - https://arxiv.org/abs/1707.00143