orbitinghail / sqlsync

SQLSync is a collaborative offline-first wrapper around SQLite. It is designed to synchronize web application state between users, devices, and the edge.
https://sqlsync.dev
Apache License 2.0
2.19k stars 28 forks source link

query subscriptions #23

Closed carlsverre closed 9 months ago

carlsverre commented 10 months ago

Need to plumb query subscriptions into SQLSync core rather than hacking it at the top level using events.

carlsverre commented 10 months ago

Detecting table changes in SQLite from a page perspective

First, SQLSync can trivially track which pages are changed by each mutation by coordinating with storage and the underlying storage journal. This gives us a "changed page set".

For now, we just want to know when a table has changed - not the precise pages that changed in it. So, we need to map the "changed page set" to a "changed root page set".

This can be achieved (in a sub-optimal way) using the dbstat table:

sqlite> select * from dbstat;
name           path   pageno  pagetype  ncell  payload  unused  mx_payload  pgoffset  pgsize
-------------  -----  ------  --------  -----  -------  ------  ----------  --------  ------
sqlite_schema  /      1       leaf      4      192      3780    55          0         4096  
foo            /      2       internal  52     0        3644    0           4096      4096  
foo            /000/  4       leaf      602    1204     1       2           12288     4096  
foo            /001/  5       leaf      584    1168     0       2           16384     4096  
foo            /002/  6       leaf      584    1168     0       2           20480     4096  
foo            /003/  7       leaf      584    1168     0       2           24576     4096  
foo            /004/  8       leaf      584    1168     0       2           28672     4096  
foo            /005/  9       leaf      584    1168     0       2           32768     4096  
foo            /006/  10      leaf      584    1168     0       2           36864     4096  
foo            /007/  11      leaf      584    1168     0       2           40960     4096  
foo            /008/  12      leaf      584    1168     0       2           45056     4096
...

But a better solution is to enable incremental vacuum which will cause SQLite to maintain ptrmap pages. Storage only needs to read these pages to fully map every changed page back to its root. This is much more optimal than dbstat which scans every page (although it does support name level filter pushdown).

Now that we know which table roots have changed we simply need to track which roots a query depends on. We can use explain to do this:

sqlite> explain select * from foo,foo2,bar where foo.id > 1;
addr  opcode       p1  p2  p3  p4      p5  comment              
----  -----------  --  --  --  ------  --  ---------------------
0     Init         0   16  0           0   Start at 16          
1     OpenRead     4   57  0   k(2,,)  0   root=57 iDb=0; foo_id
2     OpenRead     3   2   0   1       0   root=2 iDb=0; foo    
3     OpenRead     2   3   0   1       0   root=3 iDb=0; bar    
4     Integer      1   1   0           0   r[1]=1               
5     SeekGT       4   15  1   1       0   key=r[1]             
6     Rewind       3   15  0           0                        
7     Rewind       2   15  0           0                        
8     Column       4   0   2           0   r[2]=foo.id          
9     Column       3   0   3           0   r[3]=foo.id          
10    Column       2   0   4           0   r[4]=bar.id          
11    ResultRow    2   3   0           0   output=r[2..4]       
12    Next         2   8   0           1                        
13    Next         3   7   0           1                        
14    Next         4   6   0           0                        
15    Halt         0   0   0           0                        
16    Transaction  0   0   4   0       1   usesStmtJournal=0    
17    Goto         0   1   0           0                        

Notice that the three OpenRead operations give us all the information we need to know that this query depends on changes to the btrees rooted at page 57, 2, and 3. (we can ignore table names).