sqlite / sqlite-wasm

SQLite Wasm conveniently wrapped as an ES Module.
520 stars 46 forks source link

JS and WASM payload size #55

Closed tantaman closed 9 months ago

tantaman commented 9 months ago

The JS payload is surprisingly large for the official SQLite build.

E.g., https://github.com/sqlite/sqlite-wasm/blob/main/sqlite-wasm/jswasm/sqlite3-bundler-friendly.mjs is 469KB where as a debug build of https://github.com/rhashimoto/wa-sqlite is 229KB unminified.

The WASM payload is also rather large, 897KB (https://github.com/sqlite/sqlite-wasm/blob/main/sqlite-wasm/jswasm/sqlite3.wasm) vs 566KB (https://github.com/rhashimoto/wa-sqlite/blob/master/dist/wa-sqlite.wasm)

On the JS front, I believe it is because the build includes a nodejs target as well as browser target:

https://github.com/sqlite/sqlite-wasm/blob/cb1ab39435f664ac0308ab6df5fbf1d820e9c34f/sqlite-wasm/jswasm/sqlite3-bundler-friendly.mjs#L99-L102.

These should be separate targets so we don't have to send and parse so much JS in the browser.

package.json would then decide the correct target to load:

https://github.com/sqlite/sqlite-wasm/blob/cb1ab39435f664ac0308ab6df5fbf1d820e9c34f/package.json#L25-L30

On the WASM front, I'm not sure. Maybe a matter of Oz vs O3? I guess at this point we're getting far afield and into needing to allow the user to specify build parameters.

sgbeal commented 9 months ago

On the JS front, I believe it is because the build includes a nodejs target as well as browser target:

The amount of stuff which Emscripten injects is, in the overall scheme of things, relatively small. The overwhelming majority of the JS files are upstream content and very nearly half of those JS bytes are documentation, which we do not strip from the output files:

$ l jswasm/sqlite3-bundler-friendly.mjs
-rw-rw-r-- 1 stephan stephan 721648 Dec 14 23:00 jswasm/sqlite3-bundler-friendly.mjs
$  stripcomments < jswasm/sqlite3-bundler-friendly.mjs | wc 
  13074   33921  407516

Clients are free to strip and compress those as they like. That comment-stripped output, when gzipped, is only 89kb.

On the WASM front, I'm not sure. Maybe a matter of Oz vs O3?

We create the canonical release builds with Oz. Why Roy's wasm binaries are notably smaller, i've no clue.

I guess at this point we're getting far afield and into needing to allow the user to specify build parameters.

The canonical upstream tree supports a good deal of customization, none of which is available via the npm build.

rhashimoto commented 9 months ago

We create the canonical release builds with Oz. Why Roy's wasm binaries are notably smaller, i've no clue.

I think the SQLite features enabled are different in the two builds. For example, I don't enable full text search and OSW does.

tantaman commented 9 months ago

@sgbeal -

that comment-stripped output, when gzipped, is only 89kb.

wa-sqlite (and cr-sqlite which is just a small fork of it) is just 18K gzipped.

Screenshot 2023-12-21 at 9 50 57 AM

If the bloat was WASM bytes it wouldn't really matter since WASM size != JS size.

JS, unfortunately, has quite a cost for each KB in terms of parsing and compiling especially on low end devices.

For reference: https://medium.com/@addyosmani/the-cost-of-javascript-in-2018-7d8950fbb5d4#b71f

I haven't profiled the startup time of wa-sqlite vs official sqlite on mobile devices but I'd expect the official build to be quite a bit slower to start.

sgbeal commented 9 months ago

wa-sqlite (and cr-sqlite which is just a small fork of it) is just 18K gzipped.

They also provide a single API, whereas we provide four different layers of APIs: raw WASM exports, WASM exports with type conversion, object-oriented, and "worker1". In addition, there's framework-level additions like our re-implementation of most of the Emscripten-provided glue (29k w/o comments) to keep us independent of Emscripten where at all possible (it only provides the initial loading, a step we can't replace because it provides the WASM imports).

Given the project's strong backwards compatibility constraints, we're not going to be removing any significant amount of code (we also don't foresee the addition of huge amounts of new code, though we have two feature sets planned for addition as time and energy allow).

We very specifically do not intend our JS/WASM build to be "the" solution, just one of any number of solutions, and we encourage folks dissatisfied with it to find alternative solutions or build their own which fit their needs better. We can't be all things to all clients.

That said: our build process is set up to make it relatively easy to strip out unnecessary parts, or inject new ones, for folks who want to customize without creating a whole new library. A quick glance shows that 108kb (43k w/o docs) could be stripped by removing the OO1 and Worker1 parts.

tantaman commented 9 months ago

Thanks @sgbeal -- I appreciate the help.

tantaman commented 9 months ago

although @sgbeal on:

Given the project's strong backwards compatibility constraints, we're not going to be removing any significant amount of code

I get that for a stable product but SQLite WASM is not stable and is still exploratory. I think it is a mistake to tie yourself to backwards compatibility while you're still iterating on the API design and differnt implementation routes (JSPI, SharedArrayBuffer, SAHPool, Promiser, OO1,...).

I don't know the history of SQLite but I really doubt backwards compatibility was maintained while Richard was still trying to get things right. That's why we're at SQLite3 now and not SQLite1 still, right?

sgbeal commented 9 months ago

I think it is a mistake to tie yourself to backwards compatibility while you're still iterating on the API design and differnt implementation routes

It's not my decision ;). Within the project it was initially agreed that we would have 1, possibly 2, release cycles for the JS API before labeling it as stable, with all that that entails. i freely admit that the SAHPool implementation was somewhat rushed, but (A) we had more confidence about that particular bit because it was based entirely on Roy's already-proven work and (B) another side-project (the upcoming JNI bindings) intersected with the tail end of that (bad timing) and took priority for project-internal reasons. (Sidebar: despite the rush, SAHPool has worked out nicely, IMO, for which i'm immensely grateful.)

That said: the "oo1" and "worker1" APIs very specifically got the "1" suffix to imply the potential addition of incompatible/unrelated oo2..ooN APIs in future iterations, with no compatibility constraints vis-a-vis the "1"-suffix APIs. The C-style APIs are mostly thin wrappers around the raw C bindings, and have the same compatibility constraints as those, but JS's flexible argument handling gives us lots of room for extending the arg-handling semantics of those as needed without breaking backwards compatibility. As time and energy permit, my plan is to extend argument handling in some pieces, like sqlite3_create_function(), to enable more JS-esque usages, applying usability learnings from the past year and the JNI bindings.

Of course, adding the hypothetical oo2..ooN would further increase the JS blob size, but at that point we could (and most definitely should) offer separate builds which provide one or the other (or neither, for that matter). Reworking the build process to simplify that sort of customization is on my TODO list (prompted by this ticket, FWIW), moving much of the current build implementation (in GNU Make) into easier-to-read/maintain shell or TCL scripts.

That's why we're at SQLite3 now and not SQLite1 still, right?

That's not wrong, certainly, but the initial attempt to go to sqlite4 (in 2012?) ended up being aborted and there's currently no end in sight for v3 ;).

randName commented 9 months ago

pardon the reply on a closed thread, but I've been working on a tree-shakable version over at https://github.com/musakui/sqlite-wasm/tree/jenga if you are interested

progress will be slower than the last few days since I'm back to work, but I'm also beginning to hit the point where my demo doesn't cover the code I'm working on, so I will need to look for a test suite to make sure my changes don't break anything

cc @steida since you opened a related issue