tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source and Open Contributions.
https://turso.tech/libsql
MIT License

vector search: int8 support #1664

Closed sivukhin closed 1 month ago

sivukhin commented 1 month ago

Context

We want to support more compressed vector representations (not only 1-bit), so this PR introduces support for INT8 quantization.

The quantization we implemented differs from the quantization usually performed in neural networks. In our case the implementation is similar to what qdrant does in the simple case without quantiles (see https://github.com/qdrant/quantization/blob/0caf67d96f022a792bda2e41fa878ba1e113113f/quantization/src/encoded_vectors_u8.rs#L34 or https://qdrant.tech/articles/scalar-quantization/).

The main difference is that a quantized vector in the neural-network case has two parameters, A: f32 and Z: u8, and the forward conversion looks like this: $X = A (X_q - Z)$ where $X_q \in [0, 256) \cap \mathbb{Z}$.

Our (and qdrant's) approach is slightly different: we store two f32 parameters, A: f32 and S: f32, with the following forward conversion: $X = A X_q + S$.

The reason we chose this approach is that it is more generic: the first form always makes zero exactly representable ($X = 0$ at $X_q = Z$), so the covered range must include zero, while a free f32 shift can place the range anywhere. This represents skewed distributions that do not cover zero much better than the first approach.
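To make the scheme concrete, here is a minimal sketch of this alpha/shift quantization, not libSQL's actual code: `quantize_i8` and `dequantize_i8` are hypothetical names, and the min/max range mapping mirrors the simple qdrant case linked above.

```rust
// Minimal sketch of alpha/shift scalar quantization; illustrative only.

fn quantize_i8(v: &[f32]) -> (Vec<u8>, f32, f32) {
    // Map the observed value range [min, max] onto the 256 available codes.
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let alpha = if max > min { (max - min) / 255.0 } else { 1.0 }; // scale A
    let shift = min; // shift S; a free f32, so the range need not cover zero
    let codes = v
        .iter()
        .map(|&x| (((x - shift) / alpha).round() as i32).clamp(0, 255) as u8)
        .collect();
    (codes, alpha, shift)
}

fn dequantize_i8(codes: &[u8], alpha: f32, shift: f32) -> Vec<f32> {
    // Forward conversion from the text: X = A * X_q + S
    codes.iter().map(|&q| alpha * f32::from(q) + shift).collect()
}

fn main() {
    // A skewed distribution that does not cover zero.
    let v = [0.7_f32, 0.9, 1.1, 1.3];
    let (codes, alpha, shift) = quantize_i8(&v);
    println!("codes = {codes:?}, alpha = {alpha}, shift = {shift}");
    println!("restored = {:?}", dequantize_i8(&codes, alpha, shift));
}
```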

The layout for int8-quantized vectors is the following:

```
[data[0] as u8] [data[1] as u8] ... [data[dims - 1] as u8] [_ as u8; alignment_padding]*
[alpha as f32] [shift as f32] [padding as u8] [trailing_bytes as u8] [4 as u8]
```
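For concreteness, a rough sketch of assembling a blob with this layout. The assumptions here are mine, not confirmed by the PR: that the alignment padding pads the code bytes to a 4-byte boundary so alpha and shift are aligned, that trailing_bytes is zero for this vector type, and that the final constant 4 is a type tag; `serialize_i8_vector` is a hypothetical helper, not libSQL's API.

```rust
// Rough sketch of the blob layout above; field semantics are assumptions.

fn serialize_i8_vector(codes: &[u8], alpha: f32, shift: f32) -> Vec<u8> {
    let mut buf = codes.to_vec();
    // [_ as u8; alignment_padding]*: assumed to 4-byte-align the f32 fields.
    let padding = (4 - buf.len() % 4) % 4;
    buf.resize(buf.len() + padding, 0);
    buf.extend_from_slice(&alpha.to_le_bytes()); // [alpha as f32]
    buf.extend_from_slice(&shift.to_le_bytes()); // [shift as f32]
    buf.push(padding as u8); // [padding as u8]
    buf.push(0);             // [trailing_bytes as u8] (assumed 0 here)
    buf.push(4);             // [4 as u8]: assumed type tag for int8 vectors
    buf
}
```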

Changes