oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh
Other
74.44k stars 2.78k forks source link

SQLite schema cache is not properly invalidated #1332

Open losfair opened 2 years ago

losfair commented 2 years ago

Version

0.2.0

Platform

Linux devbox-aws 5.11.0-1022-aws #23~20.04.1-Ubuntu SMP Mon Nov 15 14:03:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

What steps will reproduce the bug?

Run the script:

import { Database } from "bun:sqlite";

const db = new Database("mydb.sqlite");
db.run("pragma journal_mode = wal"); // otherwise concurrent writes fail with locked database
db.run(
  "CREATE TABLE IF NOT EXISTS foo (id INTEGER PRIMARY KEY AUTOINCREMENT, greeting TEXT)"
);
db.run("INSERT INTO foo (greeting) VALUES (?)", "Welcome to bun!");

while(true) {
  const output = db.query("SELECT * FROM foo").get();
  console.log(output);
  await new Promise((resolve) => setTimeout(resolve, 1000));
}

In another terminal:

$ sqlite3 ./mydb.sqlite
sqlite> alter table foo rename column greeting to greeting2;

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

After the concurrent alter table statement, the output from Bun becomes:

{ id: 1, greeting2: "Welcome to bun!" }

What do you see instead?

Bun's output is still:

{ id: 1, greeting: "Welcome to bun!" } 

Additional information

The previous issue on this (https://github.com/oven-sh/bun/issues/921) was closed without a correct fix. https://github.com/oven-sh/bun/pull/1056 only does schema cache invalidation for changes within the same process, but SQLite is a database sharable across processes. Especially considering all those recent distributed and replicated SQLite additions (Litestream, LiteFS, mvSQLite) this is a correctness issue with pretty high impact.

losfair commented 2 years ago

If you really need to cache column names for whatever reason (performance?), maybe something like this can be done:

BEGIN;
PRAGMA schema_version; -- compare the returned value with the cached version
SELECT some_user_query();
COMMIT;

But at this point I'm not sure whether it would be better than not caching anything...

Jarred-Sumner commented 2 years ago

Is there a way to create a temporary trigger that runs a callback when the schema changes? I think that would fix this

SQLite column caching is a significant perf boost and I'd like to find a way to fix this without a performance regression

losfair commented 2 years ago

Triggers do not cross the process boundary.

SQLite internally has knowledge of the schema version, but that doesn't seem to be available to outside like column names are. A new SQLite API like this should work:

sqlite3_uint64 sqlite3_schema_version(sqlite3_stmt*);

This sounds like a feature request for SQLite!

Jarred-Sumner commented 2 years ago

I think it'd be reasonable to add a flag to the Database constructor that checks the schema version before each statement runs, intended for usecases where you change the schema in another process despite the application already running. This flag would be disabled by default. If you're doing distributed/replicated SQLite, you're probably using a library or at least going to read some instructions about how to do that.