tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

diskann vector binary format #1551

Closed sivukhin closed 2 months ago

sivukhin commented 2 months ago

Context

First branch in the series for the DiskANN implementation. This PR introduces a few utility classes and accessors for the DiskANN node block in binary format.

The DiskANN node block v1 format has the following structure:

[u64       nodeId] 
[u16       nEdges] 
[vector      node] 
[vector      edge]               * nEdges 
[vector   padding]               * (nMaxEdges - nEdges) 
[[u64 legacyField] [u64 edgeId]] * nEdges

Here, the nMaxEdges value is determined from the node block size and the node and edge vector sizes. Note that the node vector size can differ from the edge vector size (for example, due to edge vector compression).

legacyField was added by Pekka, but new implementations of DiskANN don't use it. I preserved it for compatibility reasons.
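
To make the layout concrete, here is a minimal sketch of how nMaxEdges and the field offsets could be derived from the block size and the two vector sizes. It is not the code from this PR: the `block_layout` struct and all function names are hypothetical, and it assumes the edge metadata region starts right after the nMaxEdges edge/padding vector slots, with no alignment padding between fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size header: u64 nodeId followed by u16 nEdges. */
#define NODE_HEADER_SIZE (sizeof(uint64_t) + sizeof(uint16_t))
/* Per-edge metadata: u64 legacyField followed by u64 edgeId. */
#define EDGE_META_SIZE (2 * sizeof(uint64_t))

/* Hypothetical description of a node block; all sizes are in bytes. */
typedef struct {
  size_t block_size;    /* total size of one node block */
  size_t node_vec_size; /* serialized size of the node vector */
  size_t edge_vec_size; /* serialized size of one edge vector (may be compressed) */
} block_layout;

/* Largest edge count whose vectors and metadata still fit in the block. */
static inline size_t layout_max_edges(const block_layout *l) {
  size_t fixed = NODE_HEADER_SIZE + l->node_vec_size;
  if (l->block_size < fixed) return 0;
  return (l->block_size - fixed) / (l->edge_vec_size + EDGE_META_SIZE);
}

/* Byte offset of the i-th edge vector inside the block. */
static inline size_t layout_edge_vec_offset(const block_layout *l, size_t i) {
  assert(i < layout_max_edges(l));
  return NODE_HEADER_SIZE + l->node_vec_size + i * l->edge_vec_size;
}

/* Byte offset of the metadata region (assumed to follow all nMaxEdges slots). */
static inline size_t layout_edge_meta_offset(const block_layout *l) {
  return NODE_HEADER_SIZE + l->node_vec_size
       + layout_max_edges(l) * l->edge_vec_size;
}

/* Byte offset of the i-th edgeId, skipping the preceding legacyField. */
static inline size_t layout_edge_id_offset(const block_layout *l, size_t i) {
  assert(i < layout_max_edges(l));
  return layout_edge_meta_offset(l) + i * EDGE_META_SIZE + sizeof(uint64_t);
}
```

For example, with a 4096-byte block, a 256-byte node vector and 64-byte edge vectors, each edge costs 64 + 16 = 80 bytes, so (4096 - 10 - 256) / 80 gives nMaxEdges = 47 under these assumptions.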

Changes

Testing