Open hicder opened 1 week ago
Sample code: https://github.com/gaoj0017/RaBitQ
Here are the notations:
Here is the summary of the indexing and querying algorithms:
Let's break the Index Phase down:
1 Normalize the set of vectors (Section 3.1.1)
This means find the centroid of all data vectors ($c$), then normalize all data vectors using: $o := \frac{o_r - c} {ā„ o_r - cā„}$
2 Sample a random orthogonal matrix š to construct the codebook Cšššš (Section 3.1.2)
Just need to create a random orthogonal matrix P. Here is the code from the paper's repo:
def Orthogonal(D):
G = np.random.randn(D, D).astype('float32')
Q, _ = np.linalg.qr(G)
return Q
https://github.com/gaoj0017/RaBitQ/blob/main/data/rabitq.py#L17-L21
3 Compute the quantization codes $\bar{x}_b$ (Section 3.1.3)
$\bar{x}_b$ is a D-bit vector that can be obtained from:
get_sign(dot(P_inverse, o))
where get_sign
returns 1 if the element is positive and 0 otherwise.
4 Pre-compute the values of $ā„o_r ā cā„$ and $\langle \bar{o}, o \rangle$
Compute these values and store them for later use. Note that we already have all the info from the previous steps.
$\bar{o} = P\bar{x}$ and $\bar{x}$ can be retrieved from $\bar{x}_b$ using $\bar{x} = (2\bar{x}_b - 1_D) / \sqrt{D}$ where $1_D$ is the D-dimensional vector with all entries equal 1.
Implement this paper: https://arxiv.org/abs/2405.12497 as a new quantization type