sqlite / sqlite-wasm

SQLite Wasm conveniently wrapped as an ES Module.

Poor Performance with INSERT statements #61

Closed thorsten-wolf-neptune closed 4 months ago

thorsten-wolf-neptune commented 8 months ago

Hey all. First, a shout out to the maintainers of the sqlite-wasm solution. This is a lifesaver for our current WebSQL implementation. ❤️ Because we require persisted data, we implemented it successfully by following the best-practice option (using a wrapped worker) together with OPFS, so we have the sqlite client @magieno/sqlite-client together with this library in our environment.
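For context, this is roughly what that kind of setup looks like (a sketch using the wrapped-worker/promiser API of this package together with the OPFS VFS; the file and table names are just illustrative):

      import { sqlite3Worker1Promiser } from '@sqlite.org/sqlite-wasm';

      // Wrapped worker: the library spawns its own worker and we talk to it
      // through a promise-based wrapper, so every call is a postMessage round trip.
      const promiser = await new Promise((resolve) => {
        const _promiser = sqlite3Worker1Promiser({
          onready: () => resolve(_promiser),
        });
      });

      // Open a persistent database backed by OPFS.
      const { dbId } = await promiser('open', {
        filename: 'file:app.sqlite3?vfs=opfs',
      });

      await promiser('exec', {
        dbId,
        sql: 'CREATE TABLE IF NOT EXISTS demo(a INTEGER, b TEXT)',
      });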

Everything works pretty smoothly, however the performance when inserting data is very poor compared to WebSQL. I forked this repo and created a real-world example of our usage of SQLite that inserts ~60 MB of data into the database (427952 rows into a table with 31 columns), which you can find here: https://github.com/sqlite/sqlite-wasm/pull/60 This PR is not supposed to actually be merged into this main repo but to give an easy overview of my insert demo that relates to this issue. Anyone who is interested or wants to replicate my demo can pull my branch, run npm run start, and then open http://127.0.0.1:8080/demo-insert-performance/ in their browser.

In WebSQL this example can be inserted in ~2 seconds. When OPFS comes into play, inserting takes extremely long for the same statements.

Some interesting findings I already have:

So my question to the community is: Do you have the same experience with the SQLite WASM OPFS approach when it comes to inserting a lot of data? If so, does anyone have a solution to improve the insert performance?

The PC I run the tests on is quite beefy:

OS: Windows 11 Home 23H2
CPU: AMD Ryzen 9 7950X 16-Core Processor, 4.50 GHz
RAM: 32.0 GB
Google Chrome: 121.0.6167.140 (Official Build) (64-bit)
Microsoft Edge: 121.0.2277.98 (Official Build) (64-bit)
SSD: Samsung SSD 980 PRO 2TB

This is a real problem for us and our customers. Any help is very very very much appreciated.

[screenshot]

koramstad commented 8 months ago

I'm not sure how transactions are handled in sqlite-wasm, but will this help?

db.exec('BEGIN TRANSACTION;'); ... do the inserts ... db.exec('COMMIT;');

See also: Stack Overflow: Improve INSERT-per-second performance of SQLite
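Expanded into a minimal sketch (assuming the oo1 `db` object is available in the same thread, a hypothetical table `t(a, b)`, and `rows` as an array of value pairs):

      // Wrap all inserts in one transaction so SQLite commits (and touches
      // the filesystem) only once instead of once per statement.
      db.exec('BEGIN TRANSACTION;');
      try {
        for (const [a, b] of rows) {
          // Parameter binding instead of string concatenation.
          db.exec({ sql: 'INSERT INTO t(a, b) VALUES (?, ?)', bind: [a, b] });
        }
        db.exec('COMMIT;');
      } catch (e) {
        db.exec('ROLLBACK;');
        throw e;
      }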

thorsten-wolf-neptune commented 8 months ago

Good point! I will give that a try.

thorsten-wolf-neptune commented 8 months ago

I'm not sure how transactions are handled in sqlite-wasm, but will this help?

db.exec('BEGIN TRANSACTION;'); ... do the inserts ... db.exec('COMMIT;');

See also: Stack Overflow: Improve INSERT-per-second performance of SQLite

That unfortunately doesn't really improve it. I did a new push (526a355be91a8f51d5fd07f93275b240ef629741) to this branch with the updated version.

I now also tried a db.prepare statement instead of a raw SQL command, binding all the values with statement.bind. This approach probably won't work that easily with the wrapped-worker example, because the prepared statement instance would need to be kept across multiple calls from the main thread to the worker.
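For reference, this is roughly what the prepared-statement variant looks like when sqlite runs in the same thread as the calling code (a sketch; the table and `rows` are placeholders):

      // Prepare once, then bind/step/reset per row, all inside one transaction.
      const stmt = db.prepare('INSERT OR REPLACE INTO t(a, b) VALUES (?, ?)');
      db.exec('BEGIN TRANSACTION;');
      try {
        for (const [a, b] of rows) {
          stmt.bind([a, b]);
          stmt.step();   // run the insert
          stmt.reset();  // reuse the statement for the next row
        }
        db.exec('COMMIT;');
      } catch (e) {
        db.exec('ROLLBACK;');
        throw e;
      } finally {
        stmt.finalize(); // release the statement handle
      }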

However, I created an example with the worker and the main thread. You can test that here: http://127.0.0.1:8080/demo-insert-prepare-performance/

I got some new numbers with that approach: In most of our scenarios we cannot delete all data upfront but instead need to do an UPSERT (INSERT OR REPLACE); in that case the runtime goes down to ~170 seconds. That's a little better but still very far away from being performant.

[screenshot]

In scenarios where I do a full data insert (deleting the table contents upfront and only doing INSERT statements) I get ~40 seconds. But that won't be suitable for many scenarios, as we mostly have delta updates of the records.

[screenshot]

Maybe journal_mode can really be set to MEMORY or OFF? I don't know anything about the internals of SQLite, but if the INSERT OR REPLACE command really did fail, would it actually corrupt the database? Maybe it is pretty safe to set it to MEMORY?
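(For reference, that experiment would just be a pragma issued right after opening the database; whether it is actually safe is discussed further down in this thread:)

      // MEMORY keeps the rollback journal in RAM, OFF disables it entirely.
      // An interrupted write transaction can then corrupt the database file.
      db.exec('PRAGMA journal_mode = MEMORY;');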

I always thought that WebSQL was also a special version of SQLite behind the scenes, so I don't understand this big difference in performance in general. As I wrote in the beginning, with the old WebSQL technology we were able to UPSERT with lightning speed without any performance problems.

sgbeal commented 8 months ago

Passing large amounts of data through postMessage will always be slow - it's unavoidable. The worker1 API is most definitely not an ideal way of working with the API, for more reasons than that one. Loading sqlite into your own code's thread (main or worker) is, by far, the most performant and flexible approach. WebSQL was (A) native code (not wasm) and (B) ran in the main thread, so completely lacked postMessage overhead. OPFS's use of async APIs makes it especially burdensome to integrate with sqlite (which is 100% synchronous) and that integration has its own overhead. See https://sqlite.org/wasm/doc/trunk/persistence.md for options, including a second OPFS VFS which trades concurrency for performance.
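In code, that direct approach (sqlite loaded inside your own dedicated module worker) looks roughly like this - a sketch, with an arbitrary file name:

      // worker.js (created with { type: 'module' }) - no worker1/postMessage
      // layer between the application code and the database.
      import sqlite3InitModule from '@sqlite.org/sqlite-wasm';

      const sqlite3 = await sqlite3InitModule();

      // OpfsDb uses the default OPFS VFS (requires COOP/COEP response headers).
      const db = new sqlite3.oo1.OpfsDb('/app.sqlite3');
      db.exec('CREATE TABLE IF NOT EXISTS t(a INTEGER, b TEXT)');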

(pardon brevity - mobile device)

thorsten-wolf-neptune commented 8 months ago

Thanks for chiming in @sgbeal, and thanks for your fast response :-)

Yes, it makes sense that the whole worker bridging brings additional overhead, and during my investigation I thought at first that this was the main bottleneck. However, the poor performance also happens in a completely isolated worker example without any cross-thread communication (see the demo with worker.js).

So maybe it is then really related to the WASM compilation.

But what's then pretty strange to me is the fact that in a browser's incognito mode the performance is pretty good.

Can it be related to how the database file is accessed at the very core of the OPFS and WASM SQLite approach?

Could one think of a solution where there is a "sync to main thread" step where the .sqlite3 file state is synced to the main thread, everything is then performed in memory, and after that a "sync back to OPFS" step writes the complete .sqlite3 file back?

Of course the application would need to handle a locking mechanism on its own to ensure no one else accesses the SQLite database during that process.
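A very rough sketch of the "write the whole file back to OPFS" half of that idea (assuming a module worker, the sqlite3_js_db_export() helper, an arbitrary file name, and that no other connection holds the OPFS file open at that moment):

      // Do the heavy inserts against an in-memory database ...
      const memDb = new sqlite3.oo1.DB(':memory:');
      // ... inserts/upserts against memDb happen here ...

      // ... then serialize it and overwrite the persisted copy in one go.
      const bytes = sqlite3.capi.sqlite3_js_db_export(memDb.pointer);
      const root = await navigator.storage.getDirectory();
      const fileHandle = await root.getFileHandle('app.sqlite3', { create: true });
      const sah = await fileHandle.createSyncAccessHandle(); // worker-only API
      sah.truncate(0);
      sah.write(bytes, { at: 0 });
      sah.flush();
      sah.close();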

Sorry if my thoughts are too naive, but I am just thinking out loud 😅

sgbeal commented 8 months ago

OPFS is a black box with at least 3 separate implementations (Chrome, Firefox, Safari). Some environments are faster, some are slower, and often the reason for it is known only to the deepest bits in OPFS's code. Anyone claiming that OPFS will consistently have near-native performance across all environments is being overly-optimistic. It is fast, generally speaking, but that's very dependent on the environment and browser version, as well as the workload. There is unfortunately neither rhyme nor reason, from the perspective of client-level code (like sqlite), to explain why it's fast in some environments and not others. Even across different Chrome versions on the same OS, performance can vary significantly (most notably in VACUUM operations).

Worker: the OPFS VFS requires its own dedicated worker in order to work around the OPFS API having async members (which cannot be mixed with 100% synchronous code), and that is where much, possibly most, of the performance hit happens. The second OPFS VFS works around that but has to sacrifice all concurrency to do so (but tends to be about 2-3x faster).

Synching of a db across threads is far out of scope for the library-level code and is not a rabbit hole i've explored. It's certainly possible to do.

Locking: OPFS currently exclusively locks files when they are opened (and working around that, to provide some level of concurrency, is part of the performance hit for the first OPFS VFS). Chrome (only) has experimental support for more fine-grained locking which we will eventually be able to make use of, but it's currently experimental and only in Chrome, so we have not yet explored it.

You are not being naive at all.

(pardon brevity - still on a mobile device).

thorsten-wolf-neptune commented 8 months ago

OK, thanks a lot for the very good explanation. I guess then it is fair to say that there are some hidden things happening behind the scenes that we cannot really influence. Do you have any recommendations we can investigate to boost performance further (other than disabling the journal), or do we need to accept that with the current tech stack there is simply a hard limit when it comes to insert performance and it is what it is?

Thanks again for your insights :-)

sgbeal commented 8 months ago

I guess then it is fair to say that there are some hidden things happening behind the scenes that we cannot really influence.

Precisely. Unfortunately, folks often tend to assume (A) that OPFS access will be as fast as storage is in native applications and (B) that there are not at least half a dozen abstractions between the JS code and the storage to slow it down.

To be clear, though: OPFS's API, when used directly, is actually surprisingly fast, but (A) our requirement of "hiding" it behind a synchronous interface introduces a significant performance hit and (B) certain usage patterns (with or without our sync/async proxy) may slow it down for black-box reasons which are implementation- and version-specific. In our case, this is best demonstrated with a VACUUM operation (and that particular case performs particularly poorly on our first OPFS VFS because its high rate of bi-directional I/O is throttled by the concurrency-related mechanisms).

Do you have any recommendations we can investigate to boost performance further (other than disabling the journal), or do we need to accept that with the current tech stack there is simply a hard limit when it comes to insert performance and it is what it is?

The hard limit will always largely be imposed by the device. Mobile device storage is slow in comparison to desktop storage and every mobile device i've ever owned gets ultra laggy if any I/O is going on. If performance is your only goal, and you do not need to access your db from multiple tabs concurrently, the 2nd OPFS VFS will serve you better:

https://sqlite.org/wasm/doc/trunk/persistence.md#vfs-opfs-sahpool

It's much faster than the first one but has notable tradeoffs, most significantly a complete lack of concurrency support. You can compare them side-by-side using the sqlite project's standard I/O-heavy benchmarking tool here:

https://wasm-testing.sqlite.org/

Try the "speedtest1" links, each of which runs a different VFS.

If, however, you need some degree of concurrent access (noting that we cannot currently support more than a few concurrent accesses at a time), the first OPFS VFS is your only choice (in this project) and it will be a performance bottleneck for apps with large amounts of data. You might also want to explore the wa-sqlite project, a third-party effort which takes a much different approach to VFSes than we do.
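For completeness, opening a database through that second (SAH pool) VFS looks roughly like this (a sketch; the pool name and file name are illustrative):

      import sqlite3InitModule from '@sqlite.org/sqlite-wasm';

      const sqlite3 = await sqlite3InitModule();

      // Install the OPFS SyncAccessHandle pool VFS. Faster than the default
      // OPFS VFS, but only one connection may use the pool at a time.
      const poolUtil = await sqlite3.installOpfsSAHPoolVfs({ name: 'app-pool' });

      const db = new poolUtil.OpfsSAHPoolDb('/app.sqlite3');
      db.exec('CREATE TABLE IF NOT EXISTS t(a INTEGER, b TEXT)');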

rhashimoto commented 8 months ago

Maybe journal_mode can really be set to MEMORY or OFF? I don't know anything about the internals of SQLite, but if the INSERT OR REPLACE command really did fail, would it actually corrupt the database? Maybe it is pretty safe to set it to MEMORY?

No, it is generally not safe. Except on some esoteric filesystems, if a failure occurs partway through a write transaction without a persistent journal or log, data corruption is likely. This might be tolerable for a particular application that is designed to restore from other sources or reset to a new empty database, but otherwise it is extremely risky.

But what's then pretty strange to me is the fact that in a browser's incognito mode the performance is pretty good.

Can it be related to how the database file is accessed at the very core of the OPFS and WASM SQLite approach?

In private browsing mode, aka Incognito, the browser doesn't retain storage once the context exits, and storage limits are typically small. The browser implementation could put OPFS files in memory, or on the device filesystem but without bothering to flush writes, because nothing needs to be preserved in case of a crash. That's my guess as to why Incognito would be faster.

thorsten-wolf-neptune commented 7 months ago

Hi all, just to let anyone interested know: I created a PR for the sqlite-client library that we currently use, to allow different storage variations (OPFS, OPFS_SAH, MEMORY). I described the change here: https://github.com/magieno/sqlite-client/issues/2 In my first tests, the OPFS SAH approach indeed seems to be a lot faster than plain OPFS.

zirill commented 6 months ago

      // Variant 1: one multi-row INSERT with ~1,000,000 value tuples, built as
      // a single SQL string and executed in one call.
      let ssql = "";
      ssql += "BEGIN TRANSACTION; insert into t(a,b) values ";
      let bb = false;
      for (let i = 0; i <= 1000000; ++i) {  // 'let' added; i was an implicit global
        if (bb) { ssql += ","; } else { bb = true; }
        ssql += "(" + i + "," + (i * 2) + ")";
      }
      ssql += ";COMMIT;";
      console.log('LEN: ', ssql.length);

      let a = performance.now();
      db.exec2({  // exec2: presumably a local wrapper around db.exec
        sql: ssql
      });
      let b = performance.now();
      console.log('PERF: ', (b - a).toFixed(2), 'ms');

LEN: 16333405 PERF: 4249.02 ms (inserting 1,000,000 rows via one multi-row INSERT)

      // Variant 2: 10,000 individual INSERT statements concatenated into one
      // SQL string, still wrapped in a single transaction.
      let ssql2 = "";
      ssql2 += "BEGIN TRANSACTION; ";
      for (let i = 0; i <= 10000; ++i) {
        ssql2 += "insert into t(a,b) values (" + i + "," + (i * 2) + ");";
      }
      ssql2 += "COMMIT;";
      console.log('LEN: ', ssql2.length);

      let a2 = performance.now();
      db.exec2({
        sql: ssql2
      });
      let b2 = performance.now();
      console.log('PERF: ', (b2 - a2).toFixed(2), 'ms');

LEN: 393402 PERF: 5314.36 ms (inserting 10,000 rows via individual INSERT statements)

sgbeal commented 6 months ago

Are your timings for just the string creation? If so, you may be able to speed those up by instead using an array to push() each string part to, and then join() the string at the end. i don't know whether this is still the case, but JS engines used to be much faster at creating large strings when using an intermediary array. String concatenation is, for platforms with immutable strings, rather expensive. Using an intermediary array, holding all of the parts of the string, gives the engine a way to optimize the concatenation in a single step, possibly even with a single alloc, rather than re-allocating and re-copying the string pieces for every single concatenation.

That is:

['a','b','c'].join('')

is, for large sets of string pieces, generally more performant than:

'a' + 'b' + 'c'

because the latter breaks down to:

'a'
==> 'a'
+ 'b'
==> 'ab'
+ 'c'
==> 'abc'

which requires a great deal more memory copying when using a non-trivial number of concatenations.
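Applied to the first example above, that looks roughly like this:

      // Collect the pieces in an array and join once at the end instead of
      // growing one big string with += inside the loop.
      const parts = ['BEGIN TRANSACTION; insert into t(a,b) values '];
      for (let i = 0; i <= 1000000; ++i) {
        parts.push((i ? ',' : '') + '(' + i + ',' + (i * 2) + ')');
      }
      parts.push(';COMMIT;');
      const ssql = parts.join('');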

zirill commented 6 months ago

Are your timings for just the string creation?

Slow insertion is what this issue is about; I gave an example of the difference between the two query shapes and their performance.

PS: updated the code above.

tomayac commented 4 months ago

I'm closing this, since there's nothing actionable for the wrapper library (but maybe for the core SQLite library). The discussion can continue in the closed Issue. Same for the accompanying draft PR.

chvp commented 1 month ago

Turns out you can still support multiple tabs using the same database with the OpfsSAH VFS by using either a shared worker or designating one of the workers as the leader (using e.g. https://github.com/pubkey/broadcast-channel to help with electing this leader). This obviously doesn't give true concurrent access (since all queries are performed in a single worker), but at least there is no limit on the number of tabs that users can open.
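A rough sketch of that leader-election setup using the broadcast-channel package (the channel name and the startDatabaseWorker() call are placeholders):

      import { BroadcastChannel, createLeaderElection } from 'broadcast-channel';

      // Every tab joins the same channel; exactly one of them becomes leader.
      const channel = new BroadcastChannel('sqlite-leader');
      const elector = createLeaderElection(channel);

      elector.awaitLeadership().then(() => {
        // Only the leader spawns the worker that owns the OpfsSAH database;
        // other tabs forward their queries to the leader.
        startDatabaseWorker();
      });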

steida commented 1 month ago

@chvp You don't need a third-party lib, here is a very minimal implementation: https://github.com/evoluhq/evolu/blob/main/packages/evolu-common-web/src/SqliteFactoryLive.ts