randyl / extension-crash-sqlite

0 stars 0 forks source link

Discussion: aw, snap when importing 2M records #1

Open sgbeal opened 1 year ago

sgbeal commented 1 year ago

Re-posted from an email thread, per request by Thomas Steiner, intended for use as a public discussion ground for the problem...

Good evening, folks,

Downloaded and running on Chrome v117 on Linux...

Doing such a large retrieve of a data set that way (via the worker/promiser API) is always going to be painful, as it has to collect the whole data set before responding and return the whole thing via a single massive postMessage().

My first recommendation would be to change:

    let response = await promiser('exec', {
        sql: 'SELECT * FROM t',
        resultRows: [],
        rowMode: 'object'
    });

to

    let response = await promiser('exec', {
        sql: 'SELECT * FROM t',
        callback: function(...){...}
        rowMode: 'object'
    });

where callback is a function which receives an event for each row as it's fetched. An example callback just tested with this extension:

    const resultRows = [];
    let response = await promiser('exec', {
        sql: 'SELECT * FROM t',
        callback: function(ev){
          if(undefined!==ev.row){
            if( 0 === ev.rowNumber % 1000 ) { console.log(ev.rowNumber); }
            resultRows.push(ev.row);
          }
        },
        rowMode: 'object'
    });
    console.log(`retrieve ${retrieveCount}: got ${resultRows.length} rows`);

The event object has 3 relevant properties:

That's not going to help much, if any, with the run time of a 2M-row result set, but it will give the developer some insight into where the run-time is going. Just eyeballing the result row counts in the console, it appears to be fetching 30k-40k rows/second on this modest computer.

The above variant crashed quickly during the 2nd half of the 2nd retrieve in 4 consecutive test runs. Unfortunately, though, there's literally zero information to go on beyond "aw, snap!" :(. The error reported on the Snap page, as in the screenshot of the source repo, is SIGILL (Illegal Instruction), which is not something we can directly trigger from JS or WASM. The implication of that seems to be that it's crashing in native browser-side code. There's no trace of direct JS or WASM involvement in the crash, though certainly at least one of them is playing a role here.

Ah, here we go: on the 5th test run, in the 2nd half of the 2nd retrieve, my Dev Tools unexpectedly paused with:

image

triggered by an array.splice() call, which suggests a native-side OOM as opposed to a WASM-side one. Tapping "continue" in the dev tools after poking around it for several minutes, it continued on its way (presumably some GC cleaned up while it was paused) and made it to the end of the 2M rows. On the next attempt it predictably died horribly in the 2nd half of the 2M result rows. (It doesn't always die at the same point: sometimes 1.1M, sometimes 1.7M.)

Beyond those meager details, there's unfortunately not much to go on regarding what the exact crash is and where it lies.

tomayac commented 1 year ago

FYI: @morsssss @patrickkettner @drhsqlite

sgbeal commented 1 year ago

@randyl something you may want to try which might work around the issue...

Since you're running in an extension and have no concurrency concerns, you can hypothetically make use of our new VFS:

https://sqlite.org/wasm/doc/trunk/persistence.md#vfs-opfs-sahpool

That one is blazing fast compared to the original (3x faster or more), but it has absolutely zero concurrency support. It's in the current sqlite trunk and will be part of the pending 3.43 release. It cannot be "remote-controlled" via the Worker1/Promiser interfaces, however - it has to be loaded in the client's worker thread and operated from there.

The crashes we've seen so far are happening in native code, with the OPFS VFS being the trigger. Perhaps (but only perhaps) the newer OPFS VFS will not trigger it.

randyl commented 1 year ago

Thanks, @sgbeal for taking a look at this.

FWIW, I found this StackOverflow post (which I know is 7 years old), but it describes getting out of memory errors while trying to postMessage() a large amount of data from a worker to the main thread. That seems to support what you're saying, that it's a browser-side issue, not a SQLite/WASM issue.

The scenario that started this whole topic is that I want to get data out of SQLite and download it as a text file so users can have a human-readable backup of their data. That's why I'm retrieving so many rows to begin with.

Inspired by a suggestion in the above SO link, I tried restructuring the code in this extension to create a separate worker, using the SQLite OO1 API and the exec() method with a callback function, converting the results to a Blob, and postMessage()-ing a Blob URL to the main thread. In the end, I got the same "aw, snap" crash on the 2nd retrieve.

Some options that I see are:

If I decide to stick with the current OPFS VFS, is it possible to upgrade to the opfs-sahpool in the future?

sgbeal commented 1 year ago

If I decide to stick with the current OPFS VFS, is it possible to upgrade to the opfs-sahpool in the future?

It's not an upgrade but a side-grade. They are two completely different, and incompatible, implementations which have their own tradeoffs. They can both be used in a single app but cannot operate on the same databases. The new VFS might be better for browser extensions because they're singletons, so have no need for concurrency. The primary tradeoff in the original VFS is some degree of concurrency in exchange for speed.

tomayac commented 1 year ago

Since the repo contains a nice reproduction case, no harm done filling a new.crbug.com, especially now that it looks like it could be a browser issue. If you file one, please put the bug URL here. Thanks!

randyl commented 1 year ago

https://bugs.chromium.org/p/chromium/issues/detail?id=1474186

tomayac commented 1 year ago

Thanks for filing the bug. I could reproduce this and have added a crash ID for macOS. If you could add another for Linux, this would be great.