superfly / litefs

FUSE-based file system for replicating SQLite databases across a cluster of machines
Apache License 2.0
3.96k stars 95 forks source link

On new deploy: ERROR: cannot open store: open databases: open database("sqlite.db"): init from database header: invalid sqlite database file: 00000000000000000000000000000000 <> 53514c69746520666f726d6174203300 #172

Closed kentcdodds closed 1 year ago

kentcdodds commented 1 year ago

Here are the logs starting at the deploy step of an existing app (kcd-staging):

2022-11-11T23:33:35Z runner[74f65ad5] den [info]Shutting down virtual machine
2022-11-11T23:33:35Z app[74f65ad5] den [info]Sending signal SIGINT to main child process w/ PID 530
2022-11-11T23:33:35Z app[74f65ad5] den [info]sending signal to exec process
2022-11-11T23:33:35Z app[74f65ad5] den [info]waiting for exec process to close
2022-11-11T23:33:35Z app[74f65ad5] den [info]signal received, litefs shutting down
2022-11-11T23:33:35Z app[74f65ad5] den [info]stream disconnected
2022-11-11T23:33:36Z app[0b51abd0] maa [info]HEAD / 200 89542 - 30.411 ms
2022-11-11T23:33:36Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 40.536 ms
2022-11-11T23:33:37Z app[0b51abd0] maa [info]existing primary found (74f65ad5), connecting as replica
2022-11-11T23:33:37Z app[74f65ad5] den [info]http: error: context canceled
2022-11-11T23:33:38Z app[0b51abd0] maa [info]replica disconnected, retrying: connect to primary: invalid response: code=503 ('http://74f65ad5.vm.kcd-staging.internal:20202')
2022-11-11T23:33:39Z app[0b51abd0] maa [info]existing primary found (74f65ad5), connecting as replica
2022-11-11T23:33:39Z app[74f65ad5] den [info]http: error: context canceled
2022-11-11T23:33:39Z app[0b51abd0] maa [info]replica disconnected, retrying: connect to primary: invalid response: code=503 ('http://74f65ad5.vm.kcd-staging.internal:20202')
2022-11-11T23:33:40Z app[0b51abd0] maa [info]existing primary found (74f65ad5), connecting as replica
2022-11-11T23:33:45Z runner[52b4d8ff] den [info]Starting instance
2022-11-11T23:33:45Z runner[52b4d8ff] den [info]Configuring virtual machine
2022-11-11T23:33:45Z runner[52b4d8ff] den [info]Pulling container image
2022-11-11T23:33:46Z app[0b51abd0] maa [info]HEAD / 200 90416 - 26.666 ms
2022-11-11T23:33:46Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 41.007 ms
2022-11-11T23:33:47Z runner[52b4d8ff] den [info]Unpacking image
2022-11-11T23:33:50Z runner[52b4d8ff] den [info]Preparing kernel init
2022-11-11T23:33:50Z runner[52b4d8ff] den [info]Setting up volume 'data'
2022-11-11T23:33:50Z runner[52b4d8ff] den [info]Opening encrypted volume
2022-11-11T23:33:50Z runner[52b4d8ff] den [info]Configuring firecracker
2022-11-11T23:33:50Z runner[52b4d8ff] den [info]Starting virtual machine
2022-11-11T23:33:50Z app[52b4d8ff] den [info]Starting init (commit: 81d5330)...
2022-11-11T23:33:50Z app[52b4d8ff] den [info]Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-11T23:33:50Z app[52b4d8ff] den [info]Preparing to run: `docker-entrypoint.sh litefs -- node ./other/start.js` as root
2022-11-11T23:33:51Z app[52b4d8ff] den [info]2022/11/11 23:33:51 listening on [fdaa:0:23df:a7b:d826:2:daf6:2]:22 (DNS: [fdaa::3]:53)
2022-11-11T23:33:51Z app[52b4d8ff] den [info]config file read from /etc/litefs.yml
2022-11-11T23:33:51Z app[52b4d8ff] den [info]LiteFS v0.3.0-beta3, commit=fabf62dacbc590ed94d79493a3691deb19b153bd
2022-11-11T23:33:51Z app[52b4d8ff] den [info]Using Consul to determine primary
2022-11-11T23:33:53Z app[52b4d8ff] den [info]initializing consul: key= url=https://:63f0961e-9759-5562-35b1-6861c043f37d@consul-na.fly-shared.net/kcd-staging-yexkqw5x266qm38d/ hostname=52b4d8ff advertise-url=http://52b4d8ff.vm.kcd-staging.internal:20202
2022-11-11T23:33:53Z app[52b4d8ff] den [info]ERROR: cannot open store: open databases: open database("sqlite.db"): verify database file: cannot read database header: invalid sqlite database file: 00000000000000000000000000000000 <> 53514c69746520666f726d6174203300
2022-11-11T23:33:54Z app[52b4d8ff] den [info]Starting clean up.
2022-11-11T23:33:54Z app[52b4d8ff] den [info]Umounting /dev/vdc from /data
2022-11-11T23:33:56Z app[0b51abd0] maa [info]HEAD / 200 89067 - 39.427 ms
2022-11-11T23:33:56Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 45.601 ms
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Starting instance
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Configuring virtual machine
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Pulling container image
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Unpacking image
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Preparing kernel init
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Setting up volume 'data'
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Opening encrypted volume
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Configuring firecracker
2022-11-11T23:34:00Z runner[52b4d8ff] den [info]Starting virtual machine
2022-11-11T23:34:01Z app[52b4d8ff] den [info]Starting init (commit: 81d5330)...
2022-11-11T23:34:01Z app[52b4d8ff] den [info]Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-11T23:34:01Z app[52b4d8ff] den [info]Preparing to run: `docker-entrypoint.sh litefs -- node ./other/start.js` as root
2022-11-11T23:34:01Z app[52b4d8ff] den [info]2022/11/11 23:34:01 listening on [fdaa:0:23df:a7b:d826:2:daf6:2]:22 (DNS: [fdaa::3]:53)
2022-11-11T23:34:01Z app[52b4d8ff] den [info]config file read from /etc/litefs.yml
2022-11-11T23:34:01Z app[52b4d8ff] den [info]LiteFS v0.3.0-beta3, commit=fabf62dacbc590ed94d79493a3691deb19b153bd
2022-11-11T23:34:01Z app[52b4d8ff] den [info]Using Consul to determine primary
2022-11-11T23:34:02Z app[52b4d8ff] den [info]initializing consul: key= url=https://:63f0961e-9759-5562-35b1-6861c043f37d@consul-na.fly-shared.net/kcd-staging-yexkqw5x266qm38d/ hostname=52b4d8ff advertise-url=http://52b4d8ff.vm.kcd-staging.internal:20202
2022-11-11T23:34:02Z app[52b4d8ff] den [info]ERROR: cannot open store: open databases: open database("sqlite.db"): init from database header: invalid sqlite database file: 00000000000000000000000000000000 <> 53514c69746520666f726d6174203300
2022-11-11T23:34:03Z app[52b4d8ff] den [info]Starting clean up.
2022-11-11T23:34:03Z app[52b4d8ff] den [info]Umounting /dev/vdc from /data
2022-11-11T23:34:06Z app[0b51abd0] maa [info]HEAD / 200 88884 - 30.399 ms
2022-11-11T23:34:06Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 39.363 ms
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Starting instance
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Configuring virtual machine
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Pulling container image
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Unpacking image
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Preparing kernel init
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Setting up volume 'data'
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Opening encrypted volume
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Configuring firecracker
2022-11-11T23:34:10Z runner[52b4d8ff] den [info]Starting virtual machine
2022-11-11T23:34:11Z app[52b4d8ff] den [info]Starting init (commit: 81d5330)...
2022-11-11T23:34:11Z app[52b4d8ff] den [info]Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-11T23:34:11Z app[52b4d8ff] den [info]Preparing to run: `docker-entrypoint.sh litefs -- node ./other/start.js` as root
2022-11-11T23:34:11Z app[52b4d8ff] den [info]2022/11/11 23:34:11 listening on [fdaa:0:23df:a7b:d826:2:daf6:2]:22 (DNS: [fdaa::3]:53)
2022-11-11T23:34:11Z app[52b4d8ff] den [info]config file read from /etc/litefs.yml
2022-11-11T23:34:11Z app[52b4d8ff] den [info]LiteFS v0.3.0-beta3, commit=fabf62dacbc590ed94d79493a3691deb19b153bd
2022-11-11T23:34:11Z app[52b4d8ff] den [info]Using Consul to determine primary
2022-11-11T23:34:12Z app[52b4d8ff] den [info]initializing consul: key= url=https://:63f0961e-9759-5562-35b1-6861c043f37d@consul-na.fly-shared.net/kcd-staging-yexkqw5x266qm38d/ hostname=52b4d8ff advertise-url=http://52b4d8ff.vm.kcd-staging.internal:20202
2022-11-11T23:34:12Z app[52b4d8ff] den [info]ERROR: cannot open store: open databases: open database("sqlite.db"): init from database header: invalid sqlite database file: 00000000000000000000000000000000 <> 53514c69746520666f726d6174203300
2022-11-11T23:34:13Z app[52b4d8ff] den [info]Umouting clean up.
2022-11-11T23:34:13Z app[52b4d8ff] den [info]Umounting /dev/vdc from /data
2022-11-11T23:34:16Z app[0b51abd0] maa [info]HEAD / 200 91113 - 34.210 ms
2022-11-11T23:34:16Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 39.487 ms
2022-11-11T23:34:26Z app[0b51abd0] maa [info]HEAD / 200 90249 - 30.134 ms
2022-11-11T23:34:26Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 38.106 ms
2022-11-11T23:34:36Z app[0b51abd0] maa [info]HEAD / 200 88381 - 35.107 ms
2022-11-11T23:34:36Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 44.929 ms
2022-11-11T23:34:46Z app[0b51abd0] maa [info]HEAD / 200 90021 - 148.030 ms
2022-11-11T23:34:46Z app[0b51abd0] maa [info]GET /healthcheck 200 - - 178.958 ms

I have no idea what this error means :(

Here's the broken build: https://github.com/kentcdodds/kentcdodds.com/actions/runs/3448608277/jobs/5755798070

Happened at around 4:50ish of this live stream if that's helpful: https://www.youtube.com/watch?v=AA-1sweDAtA

benbjohnson commented 1 year ago

Thanks for the bug report. The error is reporting that it's trying to read the database header magic but that it's all set to zeros. All SQLite databases start with the characters "SQLite format 3\x00". My guess is that a replica started copying from the primary but didn't finish and has a partial database file and gets stuck on startup. Sorta similar to https://github.com/superfly/litefs/issues/173. I'll add a check on startup to ignore and re-snapshot from the primary.