tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

vector search: fix idx type bug #1659

Closed sivukhin closed 1 month ago

sivukhin commented 1 month ago

Context

Recently, we introduced a new idxType = 4 in the native SQLite code, which corresponds to the vector index.

Internally, SQLite uses this field in several places, and we tried to make SQLite believe that idxType = 4 is equivalent to idxType = 0 (APPDEF, a user-defined index). We missed one spot, which results in an out-of-bounds read and a SEGFAULT in some cases (for some reason, the SEGFAULT reproduces reliably on Mac, but the code works fine on Linux with this bug :thinking: ).

This PR fixes the SEGFAULT for PRAGMA index_list='...' statements and also changes our approach: instead of introducing a new idxType, we add a new field, idxIsVector, which we use to distinguish ordinary indices from vector ones, while the SQLite code sees our vector indices as APPDEF indices. This change is more robust: it should keep working across future SQLite changes and require fewer patches on our side.

Failing piece of code in the PRAGMA implementation: https://github.com/tursodatabase/libsql/blob/8262d238f5fad9b6ecbc1e27a82b781d602e63b8/libsql-sqlite3/src/pragma.c#L1390-L1398
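
For readers who do not follow the link, here is a minimal sketch of the failure mode and of the new approach. It assumes the linked lines index a three-element origin table by idxType (the "isisi" row format in the stack trace below is consistent with that, but the exact code is not reproduced here). DemoIndex, origin_label and is_vector_index are hypothetical names used only for illustration; only the SQLITE_IDXTYPE_* values and the idxType / idxIsVector field names come from SQLite and this PR.

```c
#include <stddef.h>

/* Index type codes as defined upstream in SQLite's sqliteInt.h. */
#define SQLITE_IDXTYPE_APPDEF      0  /* created with CREATE INDEX */
#define SQLITE_IDXTYPE_UNIQUE      1  /* implements a UNIQUE constraint */
#define SQLITE_IDXTYPE_PRIMARYKEY  2  /* implements a PRIMARY KEY constraint */

/* Hypothetical, stripped-down stand-in for SQLite's Index struct. */
typedef struct DemoIndex {
  unsigned idxType;      /* one of the SQLITE_IDXTYPE_* values */
  unsigned idxIsVector;  /* new flag: 1 if this is a vector index */
} DemoIndex;

/* Assumed shape of the lookup table used by PRAGMA index_list. */
static const char *const azOrigin[] = { "c", "u", "pk" };

/*
** Return the "origin" label for an index. An unchecked
** azOrigin[p->idxType] with idxType == 4 would read past the end of the
** three-element table; this sketch returns NULL instead to make the
** out-of-bounds case visible without invoking undefined behavior.
*/
const char *origin_label(const DemoIndex *p){
  size_t n = sizeof(azOrigin) / sizeof(azOrigin[0]);
  if( p->idxType >= n ) return NULL;
  return azOrigin[p->idxType];
}

/* With the new approach, vector indices are identified by the flag only. */
int is_vector_index(const DemoIndex *p){
  return p->idxIsVector != 0;
}
```

With the old scheme, a vector index would be represented as `{4, 0}` and fall outside the table. With the scheme from this PR it is `{SQLITE_IDXTYPE_APPDEF, 1}`, so any lookup keyed on idxType stays in range and the index is reported like an ordinary CREATE INDEX index.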

Originally, this issue was found while testing the vector feature with the cr-sqlite extension, which uses PRAGMA index_list='...' under the hood to analyze table/index structure. Example stack trace for a SELECT crsql_as_crr('...') call resulting in the SEGFAULT:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000018de08904 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001000974d8 sqlite3`vdbeChangeP4Full [inlined] sqlite3Strlen30(z="") at sqlite3.c:35307:28 [opt]
    frame #2: 0x00000001000974cc sqlite3`vdbeChangeP4Full(p=<unavailable>, pOp=0x00000001500116a0, zP4=<unavailable>, n=<unavailable>) at sqlite3.c:87124:20 [opt]
    frame #3: 0x0000000100097910 sqlite3`sqlite3VdbeAddOp4 [inlined] sqlite3VdbeChangeP4(p=0x0000000150013b80, addr=<unavailable>, zP4="", n=0) at sqlite3.c:87147:5 [opt]
    frame #4: 0x00000001000978b0 sqlite3`sqlite3VdbeAddOp4(p=0x0000000150013b80, op=<unavailable>, p1=<unavailable>, p2=<unavailable>, p3=<unavailable>, zP4="", p4type=0) at sqlite3.c:86007:3 [opt]
    frame #5: 0x00000001000fdb64 sqlite3`sqlite3VdbeMultiLoad(p=0x0000000150013b80, iDest=1, zTypes="isisi") at sqlite3.c:85982:7 [opt]
    frame #6: 0x00000001000b19f4 sqlite3`sqlite3Pragma(pParse=0x000000016fdfd900, pId1=<unavailable>, pId2=0x000000016fdfcf48, pValue=<unavailable>, minusFlag=<unavailable>) at sqlite3.c:141339:9 [opt]
    frame #7: 0x000000010009be8c sqlite3`yy_reduce(yypParser=0x000000016fdfcee8, yyruleno=259, yyLookahead=<unavailable>, yyLookaheadToken=<unavailable>, pParse=0x000000016fdfd900) at sqlite3.c:0 [opt]
    frame #8: 0x00000001000442bc sqlite3`sqlite3RunParser at sqlite3.c:178594:15 [opt]
    frame #9: 0x000000010004429c sqlite3`sqlite3RunParser(pParse=<unavailable>, zSql="") at sqlite3.c:179901:5 [opt]
    frame #10: 0x000000010009387c sqlite3`sqlite3Prepare(db=<unavailable>, zSql=<unavailable>, nBytes=<unavailable>, prepFlags=<unavailable>, pReprepare=<unavailable>, ppStmt=0x0000600000594128, pzTail=<unavailable>) at sqlite3.c:143681:5 [opt]
    frame #11: 0x0000000100042ebc sqlite3`sqlite3LockAndPrepare(db=0x000000014c604740, zSql="PRAGMA index_list='t'", nBytes=-1, prepFlags=128, pOld=0x0000000000000000, ppStmt=0x0000600000594128, pzTail=0x0000000000000000) at sqlite3.c:143756:10 [opt]
    frame #12: 0x00000001000984a0 sqlite3`pragmaVtabFilter [inlined] sqlite3_prepare_v2(db=<unavailable>, zSql="PRAGMA index_list='t'", nBytes=-1, ppStmt=<unavailable>, pzTail=0x0000000000000000) at sqlite3.c:143843:8 [opt]
    frame #13: 0x0000000100098488 sqlite3`pragmaVtabFilter(pVtabCursor=0x0000600000594120, idxNum=<unavailable>, idxStr=<unavailable>, argc=<unavailable>, argv=<unavailable>) at sqlite3.c:142802:8 [opt]
    frame #14: 0x000000010007ccb8 sqlite3`sqlite3VdbeExec(p=<unavailable>) at sqlite3.c:102355:8 [opt]
    frame #15: 0x000000010003697c sqlite3`sqlite3_step [inlined] sqlite3Step(p=0x00000001500128c0) at sqlite3.c:91953:10 [opt]
    frame #16: 0x0000000100036824 sqlite3`sqlite3_step(pStmt=0x00000001500128c0) at sqlite3.c:92014:16 [opt]
    frame #17: 0x0000000100367b44 crsqlite.dylib`sqlite_nostd::nostd::ManagedStmt::step::h7b99361de3d2721e + 36
    frame #18: 0x00000001003498a8 crsqlite.dylib`crsql_core::tableinfo::is_table_compatible::hfceff73e55dc8cbe + 160
    frame #19: 0x00000001003623d0 crsqlite.dylib`crsql_core::create_crr::create_crr::hd43ba6eab404f6c1 + 72
    frame #20: 0x000000010035bf60 crsqlite.dylib`crsql_core::x_crsql_as_crr::h691734393cd74737 + 388
    frame #21: 0x000000010007ba70 sqlite3`sqlite3VdbeExec(p=<unavailable>) at sqlite3.c:102707:3 [opt]
    frame #22: 0x000000010003697c sqlite3`sqlite3_step [inlined] sqlite3Step(p=0x00000001500136d0) at sqlite3.c:91953:10 [opt]
    frame #23: 0x0000000100036824 sqlite3`sqlite3_step(pStmt=0x00000001500136d0) at sqlite3.c:92014:16 [opt]
    frame #24: 0x000000010002c19c sqlite3`exec_prepared_stmt(pArg=0x000000016fdfeb58, pStmt=0x00000001500136d0) at shell.c:20845:8 [opt]
    frame #25: 0x000000010001423c sqlite3`shell_exec(pArg=0x000000016fdfeb58, zSql="SELECT crsql_as_crr('t');", pzErrMsg=<unavailable>) at shell.c:21154:7 [opt]
    frame #26: 0x000000010000ad40 sqlite3`main(argc=5, argv=0x000000016fdff470) at shell.c:29380:14 [opt]
    frame #27: 0x000000018da520e0 dyld`start + 2360