tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

ERROR libsql::replication: replicator sync | database disk image is malformed #1607

Open Aft1n opened 1 month ago

Aft1n commented 1 month ago

I have an app deployed in production that keeps a replica db in the app folder. Today I checked the logs and saw this error:

ERROR libsql::replication: replicator sync error: replication error: Injector error: SQLite error: database disk image is malformed

Everything had been working perfectly fine for a couple of weeks and no changes were made to the db schema. It simply started showing this error in the logs.

Aft1n commented 1 month ago

Somehow my replica db got corrupted. After resyncing the replica db locally and pushing it back to the server, the error went away. What could have caused the problem? Multiple user writes, or something else?

haaawk commented 1 month ago

Sometimes embedded replicas get corrupted when their db file is opened with a regular sqlite3 driver. Maybe you did something like that?

@LucioFranco Would you be able to help please?

Aft1n commented 1 month ago

This file was deployed in production, so I didn't touch it. I did have some active traffic to the website before noticing this error, though it was mostly reads. At first I thought it was some kind of attack, like SQL injection (I'm not very proficient in those issues).

But could multiple simultaneous writes and a potential error corrupt it?

haaawk commented 1 month ago

It is not possible to write directly to embedded replicas. Writes are forwarded to a primary in the cloud and then fetched back with sync, so multiple writers should be fine, I think.

Aft1n commented 1 month ago

Yeah, I know that. But if there was a race condition or some error during writes, maybe that could corrupt the replica dbs on sync. The interesting thing is that I have a full-stack app with a replica, and a separate API with its own replica. Even though the activity was happening in the full-stack app, the DB inside my API application was corrupted as well.

So basically something was going on in the Turso parent db, and this error was synced to both replicas. But everything worked totally fine in the Turso dashboard, and after deleting the replicas and syncing again it got fixed.
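The "delete the replica and sync again" workaround described above can be sketched as a small cleanup step that runs while the app is stopped. This is only an illustration: `removeReplicaFiles` is a hypothetical helper, not part of libSQL, and the list of sidecar suffixes is an assumption, so check which files your libSQL version actually writes next to the database before relying on it.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// ASSUMPTION: the replica consists of the main file plus these sidecar files.
// Adjust the list to match what you see on disk.
const ASSUMED_SUFFIXES = ["", "-wal", "-shm", "-info"];

// Hypothetical helper: deletes the local replica files that exist and
// returns the paths that were removed. The next sync then has to rebuild
// the replica from the primary.
function removeReplicaFiles(
  dbPath: string,
  suffixes: string[] = ASSUMED_SUFFIXES,
): string[] {
  const removed: string[] = [];
  for (const suffix of suffixes) {
    const file = dbPath + suffix;
    if (existsSync(file)) {
      rmSync(file);
      removed.push(file);
    }
  }
  return removed;
}

// Demo against a throwaway directory with placeholder files (not real databases).
const demoDir = mkdtempSync(join(tmpdir(), "replica-demo-"));
const demoDb = join(demoDir, "local.db");
writeFileSync(demoDb, "placeholder");
writeFileSync(demoDb + "-wal", "");
const removed = removeReplicaFiles(demoDb);
console.log("removed files:", removed);
```

Deleting the files forces a full resync, which counts against sync usage, so this is a recovery step rather than something to run on every start.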

haaawk commented 1 month ago

@LucioFranco I think you should investigate that as part of your embedded replicas work

LucioFranco commented 1 month ago

@Aft1n could you share the version of libsql that you were using?

Aft1n commented 1 month ago

I was using the latest: 7.0

LucioFranco commented 1 month ago

I think your theory sounds correct. If you start to see this issue again, ping me on Discord and I can take a look at what is going on. This is slightly hard to debug since the malformed error is quite cryptic.

Aft1n commented 1 month ago

So this error happened again. Here's an excerpt from the logs:

22:56:11 1|main | error: replication error: Injector error: SQLite error: database disk image is malformed
22:56:11 1|main | at new Database (/app/node_modules/libsql/index.js:75:17)
22:56:11 1|main | at _createClient (/app/node_modules/@libsql/client/lib-esm/sqlite3.js:39:16)
22:56:11 1|main | at /app/src/db/index.ts:7:28
22:56:11 1|main | Bun v1.1.9 (Linux x64)

22:54:47 1|main | 2024-08-02T22:54:47.369381Z ERROR libsql::replication: replicator sync error: replication error: Injector error: SQLite error: database disk image is malformed
22:54:47 0|main | 2024-08-02T22:54:47.376994Z ERROR libsql::replication: replicator sync error: replication error: Injector error: SQLite error: database disk image is malformed

I have also noticed that my Turso sync usage skyrocketed, possibly because sync kept hitting the error and couldn't go through. It's currently at 5gb/2gb.
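If the repeated failures come from sync calls that your own code triggers, one way to limit the usage spike is to back off between attempts and give up after a few tries. The sketch below is generic and makes assumptions: `syncWithBackoff`, `backoffDelay`, and the option names are hypothetical, and the `sync` argument stands in for whatever call performs the sync in your app. It would not help if the process crashes at startup and is restarted by a process manager.

```typescript
type SyncFn = () => Promise<unknown>;

interface BackoffOptions {
  maxAttempts: number; // total tries before giving up
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number;  // upper bound for any single delay
}

// Delay before the retry that follows failed attempt `attempt` (0-based):
// baseDelayMs * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, attempt), opts.maxDelayMs);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries `sync` with exponential backoff and
// rethrows the last error once maxAttempts is exhausted.
// Returns the number of attempts that were needed.
async function syncWithBackoff(
  sync: SyncFn,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<number> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      await sync();
      return attempt + 1;
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With `@libsql/client`, the `sync` argument could be something like `() => client.sync()`. A persistent "malformed" error will still fail every attempt, so the cap on attempts matters more than the delays in that case.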

Aft1n commented 1 month ago

I also found this one in my full-stack app, which is basically the source of all changes sent to Turso. I assume it is a cron task that failed, or something happened that closed the connection. Maybe this could have affected it?

error logs