saadsharif / ttds-group

TTDS Group Project

Algorithm Analysis - Scalable Nearest Neighbors (ScaNN) #4

Open enzo-inc opened 2 years ago

enzo-inc commented 2 years ago

Summary

This issue analyzes Scalable Nearest Neighbors (ScaNN). The algorithm behind ScaNN, Anisotropic Vector Quantization, is a compression technique that enables fast approximate distance computations between vector embeddings. ScaNN achieves state-of-the-art performance on the glove-100-angular dataset on ann-benchmarks.com, outperforming other similarity search libraries by roughly a factor of two, as shown below:

[figure: ann-benchmarks.com glove-100-angular results, queries/second vs. recall, ScaNN vs. other libraries]

Resources

  * Paper: "Accelerating Large-Scale Inference with Anisotropic Vector Quantization" (Guo et al., ICML 2020): https://arxiv.org/abs/1908.10396
  * Google AI blog post, "Announcing ScaNN: Efficient Vector Similarity Search": https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html
  * Source code: https://github.com/google-research/google-research/tree/master/scann

Overview

As a first step, we must understand what a vector embedding is. It is essentially a lower-dimensional representation of an input such that inputs that are similar in the original space map to nearby points in the embedding space. To answer a query, the system first maps the query to the embedding space. Then it finds, among all database embeddings, the ones closest to the query - the nearest neighbor search problem. A common way to measure similarity between a query embedding and a database embedding is their inner product; nearest neighbor search under this measure is known as maximum inner-product search (MIPS). ScaNN proposes a new quantization approach for MIPS.
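To make the problem concrete, exhaustive MIPS can be stated in a few lines. A brute-force sketch with invented shapes and data (this is the exact baseline that approximate methods like ScaNN try to match at a fraction of the cost):

```python
import numpy as np

# Toy database of n embeddings, each d-dimensional (values are arbitrary).
rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128))

def mips_brute_force(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k database vectors with the largest inner
    product with the query (exhaustive search, O(n * d) per query)."""
    scores = database @ query        # inner product with every database vector
    return np.argsort(-scores)[:k]   # top-k indices by descending score

query = rng.standard_normal(128)
print(mips_brute_force(query, k=5))
```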

Key Idea

Previous vector quantization schemes quantize database elements by minimizing the distance between each vector and its quantized form. This is a reasonable objective, but it is not equivalent to optimizing for nearest neighbor accuracy. ScaNN's key observation is that an encoding further from the original vector can actually yield higher MIPS accuracy.
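For contrast, a minimal sketch of that classic reconstruction-minimizing assignment (hypothetical helper, not ScaNN code):

```python
import numpy as np

def quantize_l2(x: np.ndarray, centers: np.ndarray) -> int:
    """Classic quantization: assign x to the center that minimizes the
    reconstruction error ||x - c||. Note the objective says nothing about
    how the residual will interact with future inner-product queries."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

centers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(quantize_l2(np.array([0.9, 0.3]), centers))  # -> 0 (nearest center)
```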

Consider the figure below. In the left subplot, x1 is quantized to c2 and x2 to c1, because these are the nearest quantization centers. However, after taking inner products with the query q, note how the relative ranking is wrong: <q, c2> is greater than <q, c1>, even though <q, x1> is less than <q, x2>.

Now consider the assignment on the right, where x1 -> c1 and x2 -> c2. The relative ranking according to the inner product is preserved. This is because the direction of the quantization error matters as well as its magnitude.

c2 is closer to x1, but its offset is almost parallel to x1, whereas c1 is further away but its offset is almost orthogonal to x1 (the situation is flipped for x2). Error in the direction parallel to the datapoint is more harmful in MIPS because it disproportionately distorts the higher inner products, which are exactly what MIPS is trying to estimate accurately. We would therefore like to penalize parallel error more heavily than orthogonal error, as shown below:

[figure: quantization error decomposed into components parallel and orthogonal to the datapoint, with the parallel component penalized more]
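A toy calculation makes the asymmetry concrete (numbers invented for illustration, not taken from the paper): two quantized forms of the same vector with identical error norms produce very different inner-product errors depending on the error direction.

```python
import numpy as np

x = np.array([1.0, 0.0])  # database vector
q = np.array([1.0, 0.0])  # query pointing along x

# Two hypothetical quantized forms of x, each with error norm 0.2:
c_par  = x + 0.2 * np.array([1.0, 0.0])  # error parallel to x
c_orth = x + 0.2 * np.array([0.0, 1.0])  # error orthogonal to x

print(q @ x)       # true score: 1.0
print(q @ c_par)   # 1.2 -> parallel error shifts the estimated score
print(q @ c_orth)  # 1.0 -> orthogonal error leaves it unchanged
```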

This approach is called Anisotropic Vector Quantization because of the directional dependence of the loss function. The loss function itself is named the score-aware loss, and the relative weights of the two error components (parallel and orthogonal) are determined by a hyperparameter, w.
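A minimal sketch of such a loss, assuming a fixed scalar weight w (the paper derives per-datapoint weights from w and the score distribution; the fixed scalar here is a simplification):

```python
import numpy as np

def score_aware_loss(x: np.ndarray, c: np.ndarray, w: float = 4.0) -> float:
    """Anisotropic loss sketch: penalize the component of the quantization
    error parallel to x more heavily (by w > 1) than the orthogonal one."""
    r = x - c                           # quantization error (residual)
    x_unit = x / np.linalg.norm(x)
    r_par = (r @ x_unit) * x_unit       # component parallel to x
    r_orth = r - r_par                  # component orthogonal to x
    return w * float(r_par @ r_par) + float(r_orth @ r_orth)
```

With the toy vectors above and w = 4, the parallel encoding scores a loss of 4 * 0.04 = 0.16 while the orthogonal one scores 0.04, so an anisotropic quantizer prefers the encoding whose error is orthogonal to the datapoint.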

Assessment

Using the criteria here:

  1. Are the results approximate or exact? If approximate, how accurate are they compared to exhaustive search? Is it tunable? If we want to tune, do we need to re-index?

  2. Do we need to perform special operations on the vectors, e.g. compression, for the algorithm to work?

  3. Does the algorithm support high dimensionality and high document counts? Could we use sparse vectors or would we need to encode to lower dimensionality?

  4. Does the algorithm deliver low-latency queries and high throughput on large datasets? How does dimensionality impact this? Can it be tuned?

  5. How does the algorithm perform at indexing - both throughput and final index size? How does dimensionality impact this?

  6. Can datapoints be added, updated and removed? Or does the corpus need to be indexed in one go?

  7. Can queries be executed concurrently? Can querying proceed while update operations are in progress? Do we need to support this?

  8. Memory requirements - do the indexed vectors need to fit entirely in RAM, or are disk-based structures, e.g. with memory mapping, supported?

  9. Can indexes be easily serialized to disk to persist between restarts?

  10. How does the algorithm deal with varying vector dimensions, e.g. a query might have N dimensions while text documents vary - or do all vectors have to be the same size? If so, how do we handle it?

  11. Do the algorithms rely on special hardware, e.g. GPUs, or is commodity hardware fine (I think we prefer the latter)?

  12. Does the algorithm use a custom distance metric or just Euclidean distance?

  13. General suitability for text retrieval.

  14. Any good example implementations?
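On point 14, the reference implementation is Google's own scann Python package. A minimal usage sketch, adapted from the examples in the library's README; the dataset is random and the tree/scoring parameters are illustrative rather than tuned recommendations:

```python
import numpy as np
import scann

# Toy database; ScaNN expects a float32 array of shape (n, d).
dataset = np.random.rand(100_000, 128).astype(np.float32)

# Build a MIPS ("dot_product") index with asymmetric hashing (AH) scoring;
# anisotropic_quantization_threshold enables the score-aware (anisotropic) loss.
searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=50_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

queries = np.random.rand(5, 128).astype(np.float32)
neighbors, distances = searcher.search_batched(queries)
print(neighbors.shape)  # (5, 10): indices of the top-10 neighbors per query
```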