sql-js / sql.js

A javascript library to run SQLite on the web.
http://sql.js.org
Other
12.78k stars 1.06k forks source link

async operations #525

Open jimmywarting opened 2 years ago

jimmywarting commented 2 years ago

I know you can create a sql instance of an uint8array and make operations on that. but i wonder: is it possible to just load a simple File instead of a Uint8Array?

i would like to load a read-only database and being able to do:

// load file from <input type="file">
const file = input.files[0]

// Create a database from a file without reading everything into memory
const db = new SQL.Database(file)

// Prepare an sql statement
const stmt = db.prepare("SELECT * FROM sqlite_master")

// Bind values to the parameters and fetch the results of the query
const result = await stmt.getAsObject()
// note: using await

The idea would just be for sql.js to slice the part of the file that is needed without loading the hole thing into memory and kind of doing: await file.slice(start, end).arrayBuffer() operations instead

jimmywarting commented 2 years ago

it would technically also be possible to have some kind of write operations too without loading everything into memory also if you did something like:

const pre = this._file.slice(start1, end1)
const middle = new Blob([ data ])
const end = this._file.slice(start2, end2)

this._file = new Blob([ pre, mid, end ])

creating new blobs like this with .slice() dose not create any more memory, it only creates a new handle \w a offset and a size property

image taken from: https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit

just b/c you concatenate two of the same blob chunks new Blob([a, a]) dose not mean you will create 2x as much memory, it will only mean: read the same chunk twice

twoxfh commented 2 years ago

Anything is possible but I think your looking for a file api vs a sqlite api such as https://github.com/mailru/FileAPI. Using NDJSON or CSV format may also help.

Edit: not sure why this was thumbs down, maybe need to clarify the usecase? The architectural concern might be wasm.. you copy data to the wasm fs and then perform operations and the copy out. There are various projects that use indexdb such as absurd-sql, which may offload the memory once loaded, wa-sqlite let's you write your own vfs, but still requires data and wasm tango. If this is all server side a wasi based sqlite might be better, but your usecase needs to be clearer and likely beyond the scope of this project especially since it's not async. As mentioned in other issues, until pthreads are supported, adopted, and used with wasm, async really isn't async

I have loaded around 300 mb files using sql.js over the emscripten filesystem and then loaded them into a db in sub 10 seconds but that required multiple copies in mem and was fairly straight forward data.