sql-js / sql.js

A javascript library to run SQLite on the web.
http://sql.js.org
Other
12.52k stars 1.05k forks source link

sql.js is almost 2 OOM slower than WebSQL on the same query (identical set of tables, rows, views, indices, hw, browser) #225

Closed monfera closed 6 years ago

monfera commented 6 years ago

Adding some input relating to this closed item; it can be immediately closed as there aren't many things to try, but maybe someone has an idea of the cause, or helps spot some perf issue, or at least I could learn how others use sql.js when it appears to be really slow when there are a few dozen related tables/views, even with only 150 input rows.

I ended up implementing some interactive visualization in pure SQL (sans a single input and DOM element update) where there are about 5 input tables (one with 150 rows, the other 4 are with 1..4 rows), and a few dozen views that build on these four tables and on each other, using all kinds of basic things (groupBy, union, join etc. but nothing fancy computationally). There's one final query whose result is used to output the result - redraws 150 circles on the screen (rendering is negligible time cost).

The calculations generate an X and Y scale for this prototype scatterplot, such that the rectangular area (whose size is the input, a.k.a. responsive design) is optimally filled with the circles, but no circle gets truncated (it can be seen on the image that the circles at the edges touch the image border):

scatter

Initial result: one render taking around one minute.

Subsequent tweaks:

These changes resulted in an update time of 1.5 seconds. explain query plan wasn't a big help for two reasons:

I then thought, why not try it with WebSQL, which is about the same SQLite but runs native. I got somewhat varying frame rates, typically between 15..20FPS, sometimes 30FPS.

==========

In sum, the same exact set of views updated in 1500ms with sql.js and in 50ms with WebSQL, though I could add an analyze to sql.js but not to WebSQL.

So this is 1-2 orders of magnitude slower speed, quite unexpected, especially because the sql.js version benefits from not having to go through a browser API boundary (though it's a minor point as the single final query just retrieves 150 rows with 5 numeric columns).

Note: even the 50ms update is quite slow, as a woefully non-optimized plain JS version, which just materializes all views (corresponding to pretty much the same queries with a winged multiset relational algebra DAG), and rebuilds fully if invalidated, runs in a fraction of that time. Also, that 50ms was achieved at the expense of 1) quite heavy tweaking, esp. selecting into tables rather than views where sensible; 2) reducing the generality, because now only one of the 5 tables can accept insert/update changes; if I retain full DAG recalc flexibility ie. all but the 5 tables are views, then the numbers are even farther:

simple, fully recalculating DAG in JS: ~10ms < WebSQL: ~150ms < sql.js: 9400ms

Now I wonder about real use cases where there may be more than 150 rows and mildly nontrivial views (or compound selects) where people find sql.js sufficiently fast. It's a weighty dependency to consider (2.6MB) so I'm interested in if, and how folks are using it.

lovasoa commented 6 years ago

It's hard to say anything without actually seeing the code. But indeed, sql.js is slower than anything native, and handwritten code is usually faster than using even a native SQL implementation.

monfera commented 6 years ago

Thanks @lovasoa I'll try to brew a minimal set of queries that shows the difference without less relevant domain-specific content, although as you suggest, probably not much can be done about it in sql.js, and unfortunately WebSQL is not a cross-browser thing.

monfera commented 6 years ago

... also, with upcoming Spectre / Meltdown bug fixes, there may be negative impact on asm.js / WASM

lovasoa commented 6 years ago

By the way, I forgot to ask, but do you use prepared statements ? They improve performance when you have to run the same queries several times. Do you have primary keys where needed ? Do you use WITHOUT ROWID tables ?

monfera commented 6 years ago

The render loop was as simple as

window.addEventListener('mousemove', e => {
    db.exec(`
    update desiredWindowSize
      set width = ${e.x - 8},
          height = ${e.y - 8}
    `)

    const rows = db.exec('select domElement, cx, cy, r, fill, opacity from screenSpaced')[0].values
    rows.forEach(render) // 150 rows
}

I can't prepare the update (bind, get are select-specific) but on your suggestion I tried it on the select which has no parameters:

const   query = db.prepare(`select domElement, cx, cy, r, fill, opacity from screenSpaced`)
query.bind({})

window.addEventListener('mousemove', e => {
    db.exec(`update desiredWindowSize set width = ${e.x - 8}, height = ${e.y - 8};`)
    while(query.step()) {
      draw(query.getAsObject())
    }
}

This change resulted in a very nice 2-3x speedup for the version which has all views, making it catch up with the version that materializes into tables (create table xxx(...) as select ...). It's still around 3500ms so I won't get a frame rate out of it, but it was still an interesting experiment. Whether or not I used WITHOUT ROWID doesn't seem to matter. Thanks for the suggestions!

monfera commented 6 years ago

P.S. I used your suggestion and added back the ANALYZE; so the materialized version now also benefits from the speedup (in fact, 3-4x) so now it's "only" 1000ms, still some ways from the native 50ms but the PREPARE, properly run shaved off close to one OOM out of the ~2 OOM initial speed difference. I just thought one caught up with the other as the materialized version didn't currently have the ANALYZE; which was needed to realize the extra speed.

audioscavenger commented 6 years ago

WebSQL is deprecated. Why do you even bother?

monfera commented 6 years ago

I know. Some of my interests were:

while

etc., so in short, learning.

Taytay commented 5 years ago

@monfera : I'm looking for a benchmark, so I am curious if you were able to distill down these queries to something simple to compare? I ask because we are converting this to WASM in #255, which is faster.

monfera commented 5 years ago

@Taytay not really - I hooked up a few tables and a bunch of views atop of the tables and other views, updated some input table on every requestAnimationFrame and in turn, queried the top level view. It wasn't a lot of data but quite a few views in a relatively deep graph, joining small views/tables with one another, selection, aggregation etc. Lmk if I should I send the test page to your mailbox as seen on gh (or elsewhere) or dm me on twitter

Taytay commented 5 years ago

@monfera : This issue is closed, but now that #255 is merged, SQL.js is compiled into WASM, which is faster, but not an Order of magnitude faster. :) Still, I'd be curious what you found in your tests with the new version.

In my testing, the more you can keep in the WASM code, the better. Calling across the boundary into WASM via multiple calls to prepare, bind, and execute are going to be slow. I was recently experimenting with simply batching inserts to see how fast I could get things, and even when I was using prepare and bind, it was slow if I inserted an individual row at once. It was at least 1 OOM faster if I inserted multiple rows with a single SQL statement. So, preventing calls across that boundary is essential.

I don't know if the current version of SQL.js is compiled to know that everything should be "in memory", which might be a faster way to run SQLite. I've got an experimental branch open that removes some features and made things at least twice as fast, but it's a bit out of date now : https://github.com/Taytay/sql.js/pull/1 Still, it shows that there are certainly more optimizations to come.

Also, you said that you can't use prepare because it is "select specific", but I think if you prepared your statement :

    update desiredWindowSize
      set width = ?,
          height = ?

and then ran bind to give it the exact parameters, that would work, right? I wasn't sure I followed.