tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

Integrate DiskANN into the SQLite code #1571

Closed: sivukhin closed this PR 2 months ago

sivukhin commented 2 months ago

Context

This is the fourth and final branch in the initial series implementing DiskANN. It glues everything together and fully integrates DiskANN into libSQL 🎉🎉🎉

This PR hooks into SQLite to make DiskANN indices work:

  1. Extend the idxType enumeration with a new value, SQLITE_IDXTYPE_VECTOR = 4. (Note that if the mask 0b11 = 3 is applied to idxType, the vector index type "becomes" SQLITE_IDXTYPE_APPDEF, because 4 & 3 = 0.)
  2. Add the OP_OpenVectorIdx VM op-code. We could probably change the behaviour of the OP_OpenWrite op-code instead, but a new op-code is safer and gives us more freedom. For example, the P5 operand serves its own purpose in the new op-code: if OPFLAG_FORDELETE is set, the index is truncated before a new cursor is allocated (this is needed for the REINDEX command).
  3. Adjust OP_IdxInsert and OP_IdxDelete to handle the special vector index type.
  4. Disable xferOptimization for vector indices. (xferOptimization is an SQLite optimization that copies internal table and index structures without explicit reindexing for queries like INSERT INTO t SELECT * FROM q.)
  5. Add a no-op libsql_vector_idx function, which is used as a marker in the vector index creation syntax: CREATE INDEX t_idx ON t (libsql_vector_idx(e))
  6. Implement the vector_top_k(idx, vector, k) virtual table.
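The masking note in item 1 comes down to bit arithmetic. SQLite's built-in index types fit in two bits (SQLITE_IDXTYPE_APPDEF = 0, UNIQUE = 1, PRIMARYKEY = 2, IPK = 3), and the value 4 has none of those bits set. A minimal C sketch, assuming these constants mirror sqliteInt.h; masked_idx_type and is_vector_idx_type are hypothetical helpers, not libSQL functions:

```c
#include <assert.h>

/* Local mirrors of SQLite's index-type constants (assumed to match sqliteInt.h). */
#define SQLITE_IDXTYPE_APPDEF     0  /* created with CREATE INDEX */
#define SQLITE_IDXTYPE_UNIQUE     1  /* implements a UNIQUE constraint */
#define SQLITE_IDXTYPE_PRIMARYKEY 2  /* the PRIMARY KEY index */
#define SQLITE_IDXTYPE_IPK        3  /* INTEGER PRIMARY KEY placeholder */
#define SQLITE_IDXTYPE_VECTOR     4  /* new value from this PR */

#define IDXTYPE_TWO_BIT_MASK 0x3     /* 0b11 */

/* Hypothetical: what code that still assumes a 2-bit idxType would see. */
static unsigned masked_idx_type(unsigned idxType) {
  assert(idxType <= SQLITE_IDXTYPE_VECTOR); /* only known types expected */
  return idxType & IDXTYPE_TWO_BIT_MASK;
}

/* Hypothetical: vector-aware code must compare against the full value. */
static int is_vector_idx_type(unsigned idxType) {
  return idxType == SQLITE_IDXTYPE_VECTOR;
}
```

Any code path that still treats idxType as a 2-bit value would therefore see a vector index as an ordinary CREATE INDEX index, which is why vector checks need the unmasked value.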
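The REINDEX behaviour described in item 2 can be modelled with a toy index: when the FORDELETE bit is present in P5, the open step empties the index before handing out a cursor. This is only a sketch; ToyVectorIndex, ToyVectorCursor, open_vector_idx and SKETCH_OPFLAG_FORDELETE are hypothetical, and the 0x08 value is assumed to mirror SQLite's OPFLAG_FORDELETE rather than taken from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror SQLite's OPFLAG_FORDELETE bit; in this toy it only
   selects "truncate before opening", as the PR describes for REINDEX. */
#define SKETCH_OPFLAG_FORDELETE 0x08

/* Hypothetical stand-in for an on-disk vector index. */
typedef struct ToyVectorIndex {
  size_t nRows;      /* number of indexed vectors */
  int nOpenCursors;  /* cursors handed out so far */
} ToyVectorIndex;

/* Hypothetical cursor handle returned by the open step. */
typedef struct ToyVectorCursor {
  ToyVectorIndex *pIdx;
} ToyVectorCursor;

/* Toy model of OP_OpenVectorIdx: P5 carries flags owned by the new op-code. */
static ToyVectorCursor open_vector_idx(ToyVectorIndex *pIdx, unsigned p5) {
  ToyVectorCursor cur;
  assert(pIdx != NULL);
  if (p5 & SKETCH_OPFLAG_FORDELETE) {
    /* REINDEX path: drop existing contents so the index can be rebuilt. */
    pIdx->nRows = 0;
  }
  pIdx->nOpenCursors++;  /* allocate the new cursor only after truncation */
  cur.pIdx = pIdx;
  return cur;
}
```

Keeping this flag handling inside a dedicated op-code means OP_OpenWrite's existing P5 semantics for ordinary B-tree indices stay untouched.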
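Item 3 amounts to a dispatch on the index type inside the two op-codes. The toy model below shows the shape of that branch; ToyIndex, idx_insert and idx_delete are hypothetical names, and the counters merely stand in for the real B-tree and DiskANN storage.

```c
#include <assert.h>
#include <stddef.h>

#define SQLITE_IDXTYPE_APPDEF 0
#define SQLITE_IDXTYPE_VECTOR 4  /* value introduced by this PR */

/* Hypothetical index with counters standing in for the two storage paths. */
typedef struct ToyIndex {
  unsigned idxType;
  size_t nBtreeEntries;   /* rows stored via the ordinary B-tree path */
  size_t nVectorEntries;  /* rows stored via the DiskANN path */
} ToyIndex;

/* Toy model of OP_IdxInsert: vector indices bypass the B-tree code. */
static void idx_insert(ToyIndex *p) {
  if (p->idxType == SQLITE_IDXTYPE_VECTOR) {
    p->nVectorEntries++;  /* stand-in for a DiskANN insert */
  } else {
    p->nBtreeEntries++;   /* stand-in for a B-tree insert */
  }
}

/* Toy model of OP_IdxDelete with the same dispatch. */
static void idx_delete(ToyIndex *p) {
  if (p->idxType == SQLITE_IDXTYPE_VECTOR) {
    assert(p->nVectorEntries > 0);
    p->nVectorEntries--;  /* stand-in for a DiskANN delete */
  } else {
    assert(p->nBtreeEntries > 0);
    p->nBtreeEntries--;   /* stand-in for a B-tree delete */
  }
}
```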
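The guard from item 4 can be sketched as a scan over the destination table's indices. The assumption here is that copying index records verbatim presumably cannot populate a DiskANN index, so a single vector index is enough to fall back to the normal row-by-row INSERT path. xfer_allowed is a hypothetical helper, not SQLite's actual xferOptimization code.

```c
#include <assert.h>
#include <stddef.h>

#define SQLITE_IDXTYPE_APPDEF 0
#define SQLITE_IDXTYPE_UNIQUE 1
#define SQLITE_IDXTYPE_VECTOR 4

/* Hypothetical: returns 1 if the raw-copy transfer may be used, 0 if any
   index on the destination table is a vector index. */
static int xfer_allowed(const unsigned *aIdxType, size_t nIdx) {
  assert(aIdxType != NULL || nIdx == 0);
  for (size_t i = 0; i < nIdx; i++) {
    if (aIdxType[i] == SQLITE_IDXTYPE_VECTOR) return 0;
  }
  return 1;
}
```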
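Because libsql_vector_idx (item 5) does no work of its own, its job is to be recognisable by name when CREATE INDEX is compiled. A minimal sketch of that recognition step, assuming the marker is matched case-insensitively like other SQL function names in SQLite; idx_type_for_expr and name_eq_nocase are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SQLITE_IDXTYPE_APPDEF 0
#define SQLITE_IDXTYPE_VECTOR 4

/* ASCII case-insensitive equality; SQL function names ignore case. */
static int name_eq_nocase(const char *a, const char *b) {
  assert(a != NULL && b != NULL);
  while (*a && *b) {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    a++;
    b++;
  }
  return *a == *b;  /* equal only if both strings ended together */
}

/* Hypothetical: choose the index type from the name of the function wrapping
   the indexed column, e.g. CREATE INDEX t_idx ON t (libsql_vector_idx(e)). */
static unsigned idx_type_for_expr(const char *zFuncName) {
  if (zFuncName != NULL && name_eq_nocase(zFuncName, "libsql_vector_idx")) {
    return SQLITE_IDXTYPE_VECTOR;
  }
  return SQLITE_IDXTYPE_APPDEF;  /* any other expression: ordinary index */
}
```

Reusing function-call syntax as a marker lets vector indices go through the existing CREATE INDEX grammar instead of requiring new parser keywords.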
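For orientation on item 6: vector_top_k(idx, vector, k) returns the k entries of index idx closest to vector. DiskANN answers this approximately by searching a graph; the exact linear scan below only illustrates the expected result shape (rowids ordered nearest first). It assumes squared Euclidean distance to stay dependency-free, since this PR text does not state which metric libSQL uses, and brute_force_top_k is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance; assumed metric, for illustration only. */
static double l2_squared(const float *a, const float *b, size_t dims) {
  double s = 0.0;
  for (size_t i = 0; i < dims; i++) {
    double diff = (double)a[i] - (double)b[i];
    s += diff * diff;
  }
  return s;
}

/* Exact top-k by linear scan. aRows holds nRows vectors of `dims` floats laid
   out contiguously, aRowid their rowids. Writes up to k rowids (nearest first)
   into aOut with matching distances in aDist; returns how many were written. */
static size_t brute_force_top_k(const float *aRows, const long long *aRowid,
                                size_t nRows, size_t dims, const float *query,
                                size_t k, long long *aOut, double *aDist) {
  size_t n = 0;  /* results kept so far, always sorted by distance */
  assert(k == 0 || (aOut != NULL && aDist != NULL));
  for (size_t r = 0; r < nRows; r++) {
    double d = l2_squared(&aRows[r * dims], query, dims);
    /* Skip rows that cannot beat the current worst kept result. */
    if (n == k && (k == 0 || d >= aDist[k - 1])) continue;
    /* Append if there is room, otherwise overwrite the worst slot. */
    size_t pos = (n < k) ? n++ : k - 1;
    /* Insertion-sort step: shift farther results right to keep order. */
    while (pos > 0 && aDist[pos - 1] > d) {
      aDist[pos] = aDist[pos - 1];
      aOut[pos] = aOut[pos - 1];
      pos--;
    }
    aDist[pos] = d;
    aOut[pos] = aRowid[r];
  }
  return n;
}
```

On small inputs a scan like this is a convenient oracle when testing an approximate index, bearing in mind that DiskANN may legitimately return slightly different neighbours.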

Testing