tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License
11.24k stars 286 forks source link

vector search: neighbors compression (1bit quantization) #1650

Closed sivukhin closed 3 months ago

sivukhin commented 3 months ago

Context

This PR introduces support for compressed edges representation in DiskANN implementation. For now, only 1bit quantization is supported with the corresponding index parameter: libsql_vector_idx(e, 'compress_neighbors=1bit').

In order to support compressed representation DiskANN search procedure were changed and now it supports 2 list of nodes:

  1. aCandidates - list of unvisited candidates ordered by distance calculated with edge vector type (it will be compressed if index uses compression of edges)
  2. aTopCandidates - list of top visited candidates ordered by the exact distance calculated with node vector type

This was done in order to always compare distances produced by the same vector representation: aCandidates ordered by distance between vector of "edge type" (potentially compressed, but not always) while aTopCandidates ordered by distance between vector of "node type" (exact)

Changes

penberg commented 3 months ago

Nice! ๐Ÿ‘๐Ÿ‘๐Ÿ‘