The current implementation can be suboptimal in some places because it uses very simple data structures (lists/arrays). This is intentional: we will improve performance in subsequent PRs and keep this one simpler.
Nevertheless, this PR addresses the heaviest operation in the algorithm, reading blobs from disk, and aims to make as few reads as possible (using `sqlite3_blob_reopen` where possible).
From a performance perspective, the rough cost hierarchy of operations (from most to least expensive) looks like this:
1. reading/writing a DiskANN node block
2. distance calculation between two vectors (for example, OpenAI vectors have 1536 dimensions, so this is clearly a very slow operation)
3. operations on the non-compressed vector payload
4. all other operations
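Item 2 is expensive because a single distance evaluation touches every dimension of both vectors. The sketch below illustrates this. It uses squared L2 distance purely as an example metric, which is an assumption, because this PR does not say which metrics the index uses. The function name `l2_distance_sq` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: squared Euclidean (L2) distance, used here only to
 * illustrate cost. Each dimension costs one subtraction and one multiply-add,
 * so a 1536-dimensional comparison reads 2 * 1536 floats (12 KiB) per call. */
static float l2_distance_sq(const float *a, const float *b, size_t dims) {
    float sum = 0.0f;
    for (size_t i = 0; i < dims; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Because this cost grows linearly with the dimension and is paid for every candidate neighbor, it ranks just below node block I/O in the hierarchy above.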
Changes
- Implement the DiskANN algorithm
- Adjust the `BlobSpot` code for cases when the previous call to open/reopen failed (the blob handle becomes useless, as all subsequent operations on it return an `SQLITE_ABORT` error; see the `sqlite3_blob_*` docstrings for more details)
- Add a basic test in `test_libsql_diskann.c` (it's a bit hacky for now, but that's fine: we will soon merge full integration with SQLite and will no longer need these tests)
Context
This is the third branch in the series implementing DiskANN. This PR introduces the DiskANN algorithm itself :tada:
The core of the algorithm is based on the paper LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing.
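The following is a minimal in-memory sketch of the greedy search used by DiskANN-style indexes. It shows why node block reads dominate the cost. It is not the libsql implementation, and several parts are assumptions:

- `Node`, `read_node`, `BEAM` and every other name are hypothetical.
- The "disk" is a plain array.
- Each node record is assumed to embed copies of its neighbors' vectors, so one read is enough to score all neighbors.

In the spirit of this PR, the candidate list is a simple sorted array.

```c
#include <stdbool.h>

/* Hypothetical sketch: names, sizes and record layout are illustrative only. */
#define MAX_NODES 16
#define MAX_DEGREE 4
#define DIMS 2
#define BEAM 4 /* size of the candidate list */

typedef struct {
    float vec[DIMS];
    int n_neighbors;
    int neighbors[MAX_DEGREE];
    /* Assumed layout: neighbor vectors are copied into the node record,
     * so scoring all neighbors needs no extra reads. */
    float neighbor_vecs[MAX_DEGREE][DIMS];
} Node;

typedef struct {
    int id;
    float dist;
} Candidate;

static int node_reads = 0; /* counts simulated node block reads */

static float l2sq(const float *a, const float *b) {
    float s = 0.0f;
    for (int i = 0; i < DIMS; i++) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

/* Stand-in for reading one node block from disk (one blob read). */
static const Node *read_node(const Node *graph, int id) {
    node_reads++;
    return &graph[id];
}

/* Adds a directed edge and copies the target's vector into the source record.
 * Vectors must be set before edges are added. */
static void add_edge(Node *graph, int from, int to) {
    Node *src = &graph[from];
    if (src->n_neighbors >= MAX_DEGREE) return;
    int k = src->n_neighbors++;
    src->neighbors[k] = to;
    for (int i = 0; i < DIMS; i++) src->neighbor_vecs[k][i] = graph[to].vec[i];
}

/* Keeps cands sorted by ascending dist with at most BEAM entries. */
static void insert_candidate(Candidate *cands, int *n, int id, float dist) {
    for (int i = 0; i < *n; i++)
        if (cands[i].id == id) return;                      /* already queued */
    if (*n == BEAM && dist >= cands[BEAM - 1].dist) return; /* too far */
    int pos = (*n < BEAM) ? (*n)++ : BEAM - 1;              /* full: drop worst */
    while (pos > 0 && cands[pos - 1].dist > dist) {
        cands[pos] = cands[pos - 1];
        pos--;
    }
    cands[pos] = (Candidate){ id, dist };
}

/* Scores every unvisited neighbor using the vectors stored in the node record. */
static void score_neighbors(const Node *node, const float *query,
                            const bool *visited, Candidate *cands, int *n) {
    for (int j = 0; j < node->n_neighbors; j++) {
        int nb = node->neighbors[j];
        if (!visited[nb])
            insert_candidate(cands, n, nb, l2sq(node->neighbor_vecs[j], query));
    }
}

/* Greedy search: repeatedly expand the closest unvisited candidate. */
static int greedy_search(const Node *graph, int entry, const float *query) {
    Candidate cands[BEAM];
    int n = 0;
    bool visited[MAX_NODES] = { false };
    const Node *node = read_node(graph, entry);
    visited[entry] = true;
    insert_candidate(cands, &n, entry, l2sq(node->vec, query));
    score_neighbors(node, query, visited, cands, &n);
    for (;;) {
        int next = -1;
        for (int i = 0; i < n && next < 0; i++)
            if (!visited[cands[i].id]) next = cands[i].id;
        if (next < 0) break;                /* every candidate expanded */
        visited[next] = true;
        node = read_node(graph, next);      /* one block read per expansion */
        score_neighbors(node, query, visited, cands, &n);
    }
    return cands[0].id;                     /* closest node found */
}
```

Each expanded candidate costs exactly one node read, plus one distance computation per neighbor. This mirrors the two most expensive items in the cost hierarchy listed earlier.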