lana-k / sqliteviz

Instant offline SQL-powered data visualisation in your browser
https://sqliteviz.com
Apache License 2.0
2.11k stars 114 forks source link

[RFE] Custom sql.js build with newer SQLite and a few extensions #45

Closed saaj closed 3 years ago

saaj commented 3 years ago

Use cases

Math functions

As a educational user of Sqliteviz, In order to use SQLite to generate mathematical function series and graphs them, I want to be able to call functions like sin, cos etc.

~SQLite has math functions, but:~

~The math functions shown below are part of the SQLite amalgamation source file but are only active if the amalgamation is compiled using the -DSQLITE_ENABLE_MATH_FUNCTIONS compile-time option.~

Turns out there's another extension mechanism used in sql-js' Makefile to enable the mathematical function (which apparently was used before 3.35.0 2021-03-12 introduced SQLITE_ENABLE_MATH_FUNCTIONS):

EXTENSION_FUNCTIONS = extension-functions.c
EXTENSION_FUNCTIONS_URL = https://www.sqlite.org/contrib/download/extension-functions.c?get=25

The C file contains this description (and these could be useful in the auto-complete.):

This library will provide common mathematical and string functions in
SQL queries using the operating system libraries or provided
definitions.  It includes the following functions:

Math: acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference,
degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp,
log, log10, power, sign, sqrt, square, ceil, floor, pi.

String: replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim,
replace, reverse, proper, padl, padr, padc, strfilter.

Aggregate: stdev, variance, mode, median, lower_quartile,
upper_quartile.

The string functions ltrim, rtrim, trim, replace are included in
recent versions of SQLite and so by default do not build.

A series can be generated by a recursive CTE without creating a table.

WITH RECURSIVE series(x) AS (
    SELECT 0
  UNION ALL
    SELECT x + 0.01 
    FROM series
    WHERE x + 0.01 <= 2 * pi()
)
SELECT x, sin(x) y
FROM series

There's simpler way to do it, with generate_series table-valued function, but:

The generate_series(START,END,STEP) table-valued function is a loadable extension included in the SQLite source tree [...]

It can be used like like:

SELECT value FROM generate_series(5,100,5)

Or

SELECT random() FROM generate_series LIMIT 20

Generating matrix for Plotly surface plot

As a user of Sqliteviz, In order to visualise 3D surface, I want to be able to select the pivot/matrix table that Plotly 3D surface plot expects.

There's SQLite pivot_vtab virtual table extension (see building and running example in this SO answer).

.load ./pivot_vtab
.header on
.mode column

CREATE TABLE point(x REAL, y REAL, z REAL);
INSERT INTO point VALUES
  (5,3,3.2),
  (5,6,4.3),
  (5,9,5.4),
  (10,3,4),
  (10,6,3.8),
  (10,9,3.6),
  (15,3,4.8),
  (15,6,4),
  (15,9,3.5);

CREATE VIRTUAL TABLE temp.pivot USING pivot_vtab(
  (SELECT y FROM point GROUP BY y),
  (SELECT x, x FROM point GROUP BY x),   
  (SELECT z FROM point WHERE y = ?1 AND x = ?2)
);

-- A SQL hack to "attach" unique X values as a column
SELECT xt.x, p.*
FROM (
  SELECT row_number() OVER () rownum, *
  FROM temp.pivot 
) p
JOIN (
  SELECT row_number() OVER () rownum, x
  FROM point 
  GROUP BY x
) xt USING(rownum);

It produces:

x           rownum      y           5.0         10.0        15.0      
----------  ----------  ----------  ----------  ----------  ----------
5.0         1           3.0         3.2         4.0         4.8       
10.0        2           6.0         4.3         3.8         4.0       
15.0        3           9.0         5.4         3.6         3.5       

WASM build of SQLite

According to this issue https://github.com/sql-js/sql.js/issues/342, loadable extension can also be pre-built. And here's SQLite's Statically Linking A Run-Time Loadable Extension.

twoxfh commented 3 years ago

Not sure what your asking for in this issue, sql.js Math functions are in the 1.5 release that Sqliteviz uses.

Simple test: Select cos(45) as 'cos'

lana-k commented 3 years ago

@twoxfh the thing that it's not enough to have cos() to create y=cos(x) chart. You have to generate a series of points. That is simple with generate_series extension which is not included in sqliteviz version of SQLite. pivot_vtab extension also could be useful in building surface plot in case of x,y,z data format.

twoxfh commented 3 years ago

@saaj @lana-k I committed an updated sql-wasm.wasm in my repo. Feel free to test.

saaj commented 3 years ago

I tried it your branch. It didn't work.

Firefox.

RuntimeError: abort(LinkError: import object field 'a' is not a Function). Build with -s ASSERTIONS=1 for more info.

Chromium.

Uncaught (in promise) Error: LinkError: WebAssembly.instantiate(): Import #0 module="a" function="a" error: function import requires a callable

I plan to take a look at it the custom build when I have time. I have roughly these in mind.

  1. Remove dependency on sql.js.

    https://github.com/lana-k/sqliteviz/blob/2abd42c9c37eaac5a566926c343abaa70bc3eaec/package.json#L22

  2. Point to sql.js, pivot_vtab and other extensions with a URL say to GitHub tag or commit in a config file (TBD)

  3. Integrate in the "build dependency" in package.json with one of the "npm hooks" (install, prepare, preinstall, etc). There are many suggestions in this SO question

  4. Tag build job should not need any changes

twoxfh commented 3 years ago

Interesting, my local build worked, admittedly I did not properly remove the dependency and rebuild sqliteviz. @lana-k might have thoughts on static asset contributions.

twoxfh commented 3 years ago

I updated my build directory, I noticed my buld of SQL-WASM doesnt appear to show the added extensions. I tried the queries and they worked though. Githack link

lana-k commented 3 years ago

@twoxfh Yes, that works. I don't mind static assets but it must be a script that does all the work in case when sql.js is updated (download sql.js, download extensions, change makefile, run npm run rebuild etc.). Also, the dependency "sql.js": "^1.5.0", in package.json should be replaced with "sql.js": "<path_to_local_custom_sql.js_build>", Are you interested in creating a pull request?

twoxfh commented 3 years ago

Unfortunately building sql-js is painful and not really scriptable for future updates due to manually editing js files. I could upload the node_modules directory which would be easy. Also https://github.com/rhashimoto/wa-sqlite potentially might give better flexibility to manage customizations without building the wasm, possibly abort long queries due to it being async, and allows you to program vtab extensions in js.

saaj commented 3 years ago

I briefly looked at wa-sqlite. I see these issues:

  1. Community size/existence (comparing to sql.js)
  2. There was an attempt to use Asyncify in sql.js and it led to significant performance degradation
  3. Not all major browsers support Atomics

That practically means that we should stay with sql.js for the time being.

saaj commented 3 years ago

I made some progress on the issue today. Here's the draft PR #62. Not too bad, and conceptually it's what I expected. But a lot of details are left to do.

twoxfh commented 3 years ago

I haven't tested it yet, but looks nice not to maintain the API.js in sql-js and extensions file.

saaj commented 3 years ago

The PR is merged, and I'm closing this issue.

A note on wa-sqlite. Some features there like JS API for SQLite VFS indeed look interesting. We may want to look at it again (e.g. to see how it can be implemented on top of sql.js or integrated directly in the custom WASM builds), if there's a use case in sqliteviz that would need such a feature.

rhashimoto commented 3 years ago

I briefly looked at wa-sqlite. I see these issues:

  1. Community size/existence (comparing to sql.js)
  2. There was an attempt to use Asyncify in sql.js and it led to significant performance degradation
  3. Not all major browsers support Atomics

(1) is indeed a valid concern. If I didn't write it I'd be reluctant to use it based solely on that.

(2) is not so cut and dry. First, wa-sqlite provides both synchronous and asynchronous builds so if you don't need Asyncify then you don't have to include it. Second, if you do need to use an asynchronous resource (e.g. IndexedDB), the Asyncify penalties can be a small fraction of the cost of accessing the resource. This is the case for IndexedDB, for example - the time spent waiting for IDB events is typically far greater than the Asyncify overhead. I think your decision will usually be driven by the performance of your resource and not wa-sqlite itself.

(3) I really question. wa-sqlite does not require the use of Atomics. You can use Atomics to avoid the Asyncify build if you're willing to jump through a lot of extra hoops (and give up browser compatibility), but it wouldn't surprise me if that turned out to be slower in the end.

I would add some other potential issues for you to weigh:

  1. Migration costs. wa-sqlite is not a fork of sql.js and has a different API.

  2. Not a turn-key solution. The main reason to use wa-sqlite is its extensibility and you have to write the extensions you want yourself. wa-sqlite makes this easier but easier is not always the same thing as easy. For example, writing an IndexedDB VFS that works is pretty simple but writing one optimized for performance is another story and you can end up going further down the rabbit hole with access patterns and journaling than you really intended.

  3. The default license is currently GPLv3 (alternative licensing by explicit arrangement).

saaj commented 3 years ago

First of all, thanks for detailed explanation.

On Asyncify and/or Atomics. To be honest, I've only cursory understanding of the two, so I may be missing something. But still, currently there's only one new feature that might benefit from asynchronous interface to SQLite (as mentioned here https://github.com/lana-k/sqliteviz/issues/45#issuecomment-864490799). It's cancelling of running queries -- https://github.com/lana-k/sqliteviz/issues/33. It's not that important of a feature (the database is in memory in the end and the memory is limited to what the browser allows a tab to allocate), but a nice to have one. Options there are:

  1. just terminate the web worker and let the user reload the database. Price: IxD.
  2. spawn a web worker with database per query, and then terminate. Price: x2 memory overhead.
  3. spawn a web worker per query, but use the same SharedArrayBuffer. Price: TBD (may not be feasible)
  4. SQLite build with Asyncify. Price: worse performance, bigger WASM build.
  5. SQLite build with Atomics (which doesn't exist). Price: TBD but at least browser compatibility.

My take is that (1) is good enough, and (3) is worth exploring. And as I mentioned in https://github.com/lana-k/sqliteviz/issues/45#issuecomment-873448608, future will show if there are more use cases that might benefit from it.

rhashimoto commented 3 years ago

For just aborting, I think there's another possibility. Since you're already compiling your own SQL.js artifacts, you can patch the SQLite source to call out to Javascript within query processing and conditionally return an error code. I mentioned this at the bottom of a reply to @twoxfh.

If you want to avoid an Asyncify build, the Javascript call would have to be synchronous, which really restricts how you can signal the abort. But LocalStorage access is synchronous and state is shared by all Window contexts in the same origin so that would let you set a flag in one context to abort a running query in another context. So the only remaining hiccup with that is you would have to run SQLite in a Window context, like an invisible iframe, instead of a Worker since workers can't access LocalStorage.

Advantages:

Disadvantages: