kikko-land / kikko

Powerful SQLite adapter for web, mobile and desktop. Build reactive UI on top of it
https://kikko-doc.netlify.app/
MIT License
134 stars 8 forks source link

wa-sqlite performance pains #162

Open alii opened 1 year ago

alii commented 1 year ago

We are using Kikko at work and recently switched to wa-sqlite after finding out absurd-sql was considered deprecated for Kikko, and finding out that wa-sqlite would properly work with multi-tab queries. However, the performance of our application degraded signficantly. I switched back to absurd just to test the performance and it's really noticable.

As an example, a text search across a few pretty large tables (in terms of row count) could take maybe ~2s on Absurd, and in some cases took a few minutes with wa-sqlite.

I suppose for our use case, the two solutions would be to either make multi-tab work properly, or improve the performance of wa-sqlite. I am not sure which one would be more appropriate.

I'd be thrilled to work/contribute on a fix for this if you know of one immediately, or if more discussion is needed then I'd love to help out there, too.

Thank you so much for this wonderful library! 🫶

rhashimoto commented 1 year ago

I can't think of any reason for that much disparity in timings, all things being equal. A good place to start is to check if all things are in fact equal. Can you provide the results of these SQL queries in both environments?

SELECT sqlite_version();
PRAGMA cache_size;
PRAGMA compile_options;
PRAGMA journal_mode;
PRAGMA page_size;
PRAGMA page_count;
PRAGMA synchronous;
PRAGMA temp_store;
EXPLAIN QUERY PLAN <your problem query here>;
alii commented 1 year ago

Ah thank you, I'll have a look and compare when I get back home tonight. I suppose I did assume things should all be the same (including performance wise) which is why I didn't think twice about switching to wa-sqlite. Will let you know :)

alii commented 1 year ago

Hi there, apologies for the wait. Below are results from our application. This comment was also edited to include a query plan for both backends.

absurd-sql:

Option Value
sqlite_version() 3.35.0
cache_size -5000
compile_options COMPILER=clang-15.0.0, DISABLE_LFS, ENABLE_FTS5, ENABLE_JSON1, ENABLE_NORMALIZE, OMIT_LOAD_EXTENSION, THREADSAFE=0
journal_mode memory
page_size 32768
page_count 1769
synchronous 2
temp_store 0
Query Plan
[
  {
    "id": 2,
    "parent": 0,
    "notused": 0,
    "detail": "CO-ROUTINE 8"
  },
  {
    "id": 5,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE 1"
  },
  {
    "id": 12,
    "parent": 5,
    "notused": 0,
    "detail": "SCAN TABLE highlights"
  },
  {
    "id": 14,
    "parent": 5,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 53,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE 2"
  },
  {
    "id": 60,
    "parent": 53,
    "notused": 0,
    "detail": "SCAN TABLE notes"
  },
  {
    "id": 62,
    "parent": 53,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 101,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE 3"
  },
  {
    "id": 108,
    "parent": 101,
    "notused": 0,
    "detail": "SCAN TABLE tagMemberships USING COVERING INDEX sqlite_autoindex_tagMemberships_2"
  },
  {
    "id": 110,
    "parent": 101,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 152,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE 4"
  },
  {
    "id": 159,
    "parent": 152,
    "notused": 0,
    "detail": "SCAN TABLE reactions USING COVERING INDEX sqlite_autoindex_reactions_2"
  },
  {
    "id": 161,
    "parent": 152,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 200,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE 6"
  },
  {
    "id": 202,
    "parent": 200,
    "notused": 0,
    "detail": "CO-ROUTINE 5"
  },
  {
    "id": 211,
    "parent": 202,
    "notused": 0,
    "detail": "SCAN TABLE posts USING INDEX sqlite_autoindex_posts_1"
  },
  {
    "id": 223,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH TABLE notes USING AUTOMATIC COVERING INDEX (postLocalId=?)"
  },
  {
    "id": 239,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH TABLE highlights USING AUTOMATIC COVERING INDEX (postLocalId=?)"
  },
  {
    "id": 246,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH TABLE users USING INDEX sqlite_autoindex_users_1 (id=?)"
  },
  {
    "id": 311,
    "parent": 202,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR DISTINCT"
  },
  {
    "id": 312,
    "parent": 202,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR RIGHT PART OF ORDER BY"
  },
  {
    "id": 334,
    "parent": 200,
    "notused": 0,
    "detail": "SCAN SUBQUERY 5 AS contributorResults"
  },
  {
    "id": 337,
    "parent": 200,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 409,
    "parent": 2,
    "notused": 0,
    "detail": "SCAN TABLE posts"
  },
  {
    "id": 420,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH SUBQUERY 1 AS highlightAgg USING AUTOMATIC COVERING INDEX (postLocalId=?)"
  },
  {
    "id": 436,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH SUBQUERY 2 AS notesAgg USING AUTOMATIC COVERING INDEX (postLocalId=?)"
  },
  {
    "id": 452,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH SUBQUERY 3 AS tagsAgg USING AUTOMATIC COVERING INDEX (postId=?)"
  },
  {
    "id": 468,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH SUBQUERY 4 AS reactionAgg USING AUTOMATIC COVERING INDEX (postId=?)"
  },
  {
    "id": 475,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE redacted USING INDEX sqlite_autoindex_redacted_1 (id=?)"
  },
  {
    "id": 483,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE users AS users USING INDEX sqlite_autoindex_users_1 (id=?)"
  },
  {
    "id": 499,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH SUBQUERY 6 AS contributorAgg USING AUTOMATIC COVERING INDEX (postCompositeId=?)"
  },
  {
    "id": 506,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE contents USING INDEX sqlite_autoindex_contents_1 (id=?)"
  },
  {
    "id": 513,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE contents AS ownerContents USING INDEX sqlite_autoindex_contents_1 (id=?)"
  },
  {
    "id": 530,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE readingLogs USING AUTOMATIC COVERING INDEX (contentId=?)"
  },
  {
    "id": 545,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH TABLE contentImportanceMarkers USING AUTOMATIC COVERING INDEX (contentId=?)"
  },
  {
    "id": 551,
    "parent": 2,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 759,
    "parent": 0,
    "notused": 0,
    "detail": "SCAN SUBQUERY 8 AS aggregate"
  }
]

wa-sqlite:

Option Value
sqlite_version() 3.40.1
cache_size 5000
compile_options ATOMIC_INTRINSICS=1, COMPILER=clang-16.0.0, DEFAULT_AUTOVACUUM, DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4, DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MEMSTATUS=0, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096, DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS, DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2, DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=1, DEFAULT_WORKER_THREADS=0, DQS=0, ENABLE_BATCH_ATOMIC_WRITE, LIKE_DOESNT_MATCH_BLOBS, MALLOC_SOFT_LIMIT=1024, MAX_ATTACHED=10, MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500, MAX_DEFAULT_PAGE_SIZE=8192, MAX_EXPR_DEPTH=0, MAX_FUNCTION_ARG=127, MAX_LENGTH=1000000000, MAX_LIKE_PATTERN_LENGTH=50000, MAX_MMAP_SIZE=0, MAX_PAGE_COUNT=1073741823, MAX_PAGE_SIZE=65536, MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000, MAX_VARIABLE_NUMBER=32766, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=0, MUTEX_OMIT, OMIT_AUTOINIT, OMIT_DECLTYPE, OMIT_DEPRECATED, OMIT_LOAD_EXTENSION, OMIT_PROGRESS_CALLBACK, OMIT_SHARED_CACHE, SYSTEM_MALLOC, TEMP_STORE=1, THREADSAFE=0, USE_ALLOCA
journal_mode delete
page_size 32768
page_count 53
synchronous 2
temp_store 0
Query Plan
[
  {
    "id": 2,
    "parent": 0,
    "notused": 0,
    "detail": "CO-ROUTINE aggregate"
  },
  {
    "id": 5,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE highlightAgg"
  },
  {
    "id": 12,
    "parent": 5,
    "notused": 0,
    "detail": "SCAN highlights"
  },
  {
    "id": 14,
    "parent": 5,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 53,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE notesAgg"
  },
  {
    "id": 60,
    "parent": 53,
    "notused": 0,
    "detail": "SCAN notes"
  },
  {
    "id": 62,
    "parent": 53,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 101,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE tagsAgg"
  },
  {
    "id": 108,
    "parent": 101,
    "notused": 0,
    "detail": "SCAN tagMemberships USING COVERING INDEX sqlite_autoindex_tagMemberships_2"
  },
  {
    "id": 110,
    "parent": 101,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 152,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE reactionAgg"
  },
  {
    "id": 159,
    "parent": 152,
    "notused": 0,
    "detail": "SCAN reactions USING COVERING INDEX sqlite_autoindex_reactions_2"
  },
  {
    "id": 161,
    "parent": 152,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 200,
    "parent": 2,
    "notused": 0,
    "detail": "MATERIALIZE contributorAgg"
  },
  {
    "id": 202,
    "parent": 200,
    "notused": 0,
    "detail": "CO-ROUTINE contributorResults"
  },
  {
    "id": 211,
    "parent": 202,
    "notused": 0,
    "detail": "SCAN posts USING INDEX sqlite_autoindex_posts_1"
  },
  {
    "id": 225,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH notes USING AUTOMATIC COVERING INDEX (postLocalId=?) LEFT-JOIN"
  },
  {
    "id": 244,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH highlights USING AUTOMATIC COVERING INDEX (postLocalId=?) LEFT-JOIN"
  },
  {
    "id": 252,
    "parent": 202,
    "notused": 0,
    "detail": "SEARCH users USING INDEX sqlite_autoindex_users_1 (id=?) LEFT-JOIN"
  },
  {
    "id": 319,
    "parent": 202,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR DISTINCT"
  },
  {
    "id": 320,
    "parent": 202,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR RIGHT PART OF ORDER BY"
  },
  {
    "id": 342,
    "parent": 200,
    "notused": 0,
    "detail": "SCAN contributorResults"
  },
  {
    "id": 345,
    "parent": 200,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 417,
    "parent": 2,
    "notused": 0,
    "detail": "SCAN posts"
  },
  {
    "id": 432,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH highlightAgg USING AUTOMATIC COVERING INDEX (postLocalId=?) LEFT-JOIN"
  },
  {
    "id": 453,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH notesAgg USING AUTOMATIC COVERING INDEX (postLocalId=?) LEFT-JOIN"
  },
  {
    "id": 474,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH tagsAgg USING AUTOMATIC COVERING INDEX (postId=?) LEFT-JOIN"
  },
  {
    "id": 495,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH reactionAgg USING AUTOMATIC COVERING INDEX (postId=?) LEFT-JOIN"
  },
  {
    "id": 503,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH redacted USING INDEX sqlite_autoindex_redacted_1 (id=?) LEFT-JOIN"
  },
  {
    "id": 511,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH users USING INDEX sqlite_autoindex_users_1 (id=?) LEFT-JOIN"
  },
  {
    "id": 531,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH contributorAgg USING AUTOMATIC COVERING INDEX (postCompositeId=?) LEFT-JOIN"
  },
  {
    "id": 539,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH contents USING INDEX sqlite_autoindex_contents_1 (id=?) LEFT-JOIN"
  },
  {
    "id": 546,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH ownerContents USING INDEX sqlite_autoindex_contents_1 (id=?) LEFT-JOIN"
  },
  {
    "id": 565,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH readingLogs USING AUTOMATIC COVERING INDEX (contentId=?) LEFT-JOIN"
  },
  {
    "id": 583,
    "parent": 2,
    "notused": 0,
    "detail": "SEARCH contentImportanceMarkers USING AUTOMATIC COVERING INDEX (contentId=?) LEFT-JOIN"
  },
  {
    "id": 590,
    "parent": 2,
    "notused": 0,
    "detail": "USE TEMP B-TREE FOR GROUP BY"
  },
  {
    "id": 810,
    "parent": 0,
    "notused": 0,
    "detail": "SCAN aggregate"
  }
]
rhashimoto commented 1 year ago

The query plans look pretty close - there are some differences but that could be just reporting changes from the SQLite versions.

The first thing I would try is explicitly setting PRAGMA temp_store=2 to keep any materialized tables or indexes in memory. I think both libraries have the same default temp_store compiled in, but it would be good to be sure where those temporary files are.

Your cache_size for wa-sqlite is much higher than absurd-sql (5000 pages vs. 5000 KB), so that won't explain why wa-sqlite is slower. Does this query only read from the database and not change it? If so, then the database fits into the cache so we can probably rule out VFS performance as the cause as few VFS calls would be made once the cache is populated.

I think the two remaining differences of note would be (1) Asyncify and (2) SQLite version. kikko uses an Asyncify build of wa-sqlite (required because it uses the asynchronous VFS IDBBatchAtomicVFS). The Asyncify transformation makes some code paths bigger and slower, though in practice most database applications are I/O bound and the increased CPU load is a minor fraction of overall performance. The worst case I would expect is 5x slower, but your application may be an exception, especially if you often execute computationally expensive queries completely from cache.

The other possibility is that SQLite may have had a performance regression that affects your application between versions 3.35 and 3.40 (the current version is now 3.42). It isn't that uncommon to see performance regressions reported (and usually promptly fixed) in the SQLite forum, though I'm not sure what fraction wouldn't show up in the query plan. If you're able to export your database out of the browser and run your query on the corresponding SQLite versions on a desktop computer, that would be a way to check this explanation.

alii commented 1 year ago

Thank you so much for this super detailed & helpful response :)

Firstly, all queries (read and/or writes) seem to be slower however the query plan I provided is for a read only query.

Secondly, we do definitely have expensive queries however I'd be unsure if they come from SQLite's cache or not. Unfortunately I'm not able to share our queries publicly (especially our biggest, most computationally expensive) but if it would be useful I could possibly provide them to you on Discord/email/Twitter etc that would be really helpful 🫶

I will have a go at exporting our databases and testing performance in the differing versions. Will post again here once I have some results.

Thank you once again!

alii commented 9 months ago

We've unfortunately had to de-prioritise this at work for a few months to focus on other tasks, but I've come back to it and have some new findings.

I have rebuilt the latest version of wa-sqlite and very slightly modified the wa-sqlite kikko backend but it seems that the performance issues actually are stemming from VFS performance (as far as I can tell). I think this because I enabled all of the debug log calls and I repeatedly see many many many calls to xRead and logs for get blocks. We are using IDBBatchAtomicVFS and I considered switching to another VFS, but given that wa-sqlite names it as the the "featured" impl for benchmarks, I thought it would probably make sense to keep it (https://github.com/rhashimoto/wa-sqlite/tree/master/src/examples#idbbatchatomicvfs)

Setting PRAGMA temp_store to 2 actually does seriously improve (and possibly completely solve) our troubles, but I still think the VFS performance can be improved. The information below is with temp_store set to 0

Initially this looked like an infinite loop but the query actually does resolve after ~130 seconds (which is similar to what I had described in my original post).

I also recorded a short video to demonstrate how fast these calls are happening - https://www.youtube.com/watch?v=y4THd0DcohU

Option Value
sqlite_version() 3.44.0
cache_size 5000
compile_options ATOMIC_INTRINSICS=1, COMPILER=clang-18.0.0, DEFAULT_AUTOVACUUM, DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4, DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MEMSTATUS=0, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096, DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS, DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2, DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=1, DEFAULT_WORKER_THREADS=0, DQS=0, ENABLE_BATCH_ATOMIC_WRITE, LIKE_DOESNT_MATCH_BLOBS, MALLOC_SOFT_LIMIT=1024, MAX_ATTACHED=10, MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500, MAX_DEFAULT_PAGE_SIZE=8192, MAX_EXPR_DEPTH=0, MAX_FUNCTION_ARG=127, MAX_LENGTH=1000000000, MAX_LIKE_PATTERN_LENGTH=50000, MAX_MMAP_SIZE=0, MAX_PAGE_COUNT=1073741823, MAX_PAGE_SIZE=65536, MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000, MAX_VARIABLE_NUMBER=32766, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=0, MUTEX_OMIT, OMIT_AUTOINIT, OMIT_DECLTYPE, OMIT_DEPRECATED, OMIT_LOAD_EXTENSION, OMIT_SHARED_CACHE, SYSTEM_MALLOC, TEMP_STORE=1, THREADSAFE=0, USE_ALLOCA
journal_mode delete
page_size 32768
page_count 2147
synchronous 2
temp_store 0

I'm not really sure what the next steps are for debugging this, but totally willing to try anything. It also might make sense to move this issue to the wa-sqlite repo, given that it's not really kikko related.

Thank you so much once again

rhashimoto commented 9 months ago

The VFS logs you provided show reads made from temporary files, not the main database (the filename is random, and the reads are 4K blocks). The video shows that reads are skipping around, i.e. the file isn't simply read sequentially in a single pass. I can't tell if it's making multiple passes through the whole file or if it's jumping around to different offsets more randomly. If it's the latter than you might be thrashing.

Generally speaking, you want to keep indexes and temporary files in memory as much as possible. The cache for temporary files is set at compile time to 2 MB - perhaps tuning that would help if you can't avoid writing them to the VFS altogether.

You haven't really shown me anything yet that makes me suspect a wa-sqlite bug, so I'm only able to suggest general SQLite optimization strategies...and that's not something I'm particularly expert in, especially for the complicated queries you have.