matsui528 / nanopq

Pure python implementation of product quantization for nearest neighbor search
MIT License
323 stars 43 forks source link

How to compute distance between PQ codes? #13

Closed waytehsu closed 3 years ago

waytehsu commented 3 years ago

Not sure if this should be a feature request.

Supposed I just want to approximate distance between two PQ codes (under the same encoder of course). What is the most efficient way to perform such operation?

matsui528 commented 3 years ago

The efficient way is to create a symmetric-distance table first and look it up when you want to approximate the distance between two PQ-codes.

Let's say you have 100-dim vectors with M=2 and Ks=256. After training the product quantizer, you'll obtain 2 (50dim 256) codewords.

Here, you can create symmetric-distance tables among codewords, D1 and D2. D1 is a 256x256 matrix for M=1. D1[0][1] is the L2 squared distance between the first and second codewords for M=1. D2[4][10] shows the distance between fifth and eleventh codewords for M=2.

Suppose you want to know the distance between the two PQ-codes, c1=[12, 8] and c2=[99, 3]. You can approximate it by D1[12][99] + D2[8][3]. This operation is fast; just looking up an array and summing.

Currently, I don't plan to implement this functionality because computing a symmetric-distance-table is an additional cost. If you want to know more, please refer to the followings