How to compute distance between PQ codes?

The efficient way is to create a symmetric-distance table first and look it up when you want to approximate the distance between two PQ-codes.

Let's say you have 100-dim vectors with M=2 and Ks=256. After training the product quantizer, you'll obtain 2 (50dim 256) codewords.

Here, you can create symmetric-distance tables among codewords, D1 and D2. D1 is a 256x256 matrix for M=1. D1[0][1] is the L2 squared distance between the first and second codewords for M=1. D2[4][10] shows the distance between fifth and eleventh codewords for M=2.

Suppose you want to know the distance between the two PQ-codes, c1=[12, 8] and c2=[99, 3]. You can approximate it by D1[12][99] + D2[8][3]. This operation is fast; just looking up an array and summing.

Currently, I don't plan to implement this functionality because computing a symmetric-distance-table is an additional cost. If you want to know more, please refer to the followings

the implementation in faiss https://github.com/facebookresearch/faiss/blob/master/faiss/impl/ProductQuantizer.h#L161
Sec 3.2 of PQk-means paper: https://dl.acm.org/ft_gateway.cfm?id=3123430

matsui528 / nanopq

How to compute distance between PQ codes? #13