signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0

SQLITE_CORRUPT: database disk image is malformed #3180

Open agret opened 5 years ago

agret commented 5 years ago

Bug Description

Unhandled Promise Rejection Error: SQLITE_CORRUPT: database disk image is malformed

People had the same issue in #2972 but it was prematurely closed before resolution a few days ago so I can't post my log file in there. Closed with reason "I'm going to close this, since that new dialog prevents you from being stuck in that 'Optimizing...' stage." -- not sure what this is referring to as the dialog just states the database is corrupt then the app closes, there is no resolution offered.

Steps to Reproduce

I'm on macOS and Signal was working fine. It asked me to restart the app so the new version could install, and then it gave me a corruption error.

Platform Info

Signal Version:

1.22.0

Operating System:

MacOS Mojave 10.14.2

Link to Debug Log

{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":30,"msg":"app ready","time":"2019-02-27T00:00:19.774Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":30,"msg":"updateSchema: Current schema version: 9; Most recent schema version: 11; SQLite version: 3.20.1; SQLCipher version: 3.4.2;","time":"2019-02-27T00:00:19.789Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":30,"msg":"updateToSchemaVersion10: starting...","time":"2019-02-27T00:00:19.790Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":50,"msg":"Unhandled Promise Rejection: Error: SQLITE_CORRUPT: database disk image is malformed","time":"2019-02-27T00:00:19.798Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10266,"level":30,"msg":"app ready","time":"2019-02-27T00:00:52.195Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10266,"level":30,"msg":"updateSchema: Current schema version: 9; Most recent schema version: 11; SQLite version: 3.20.1; SQLCipher version: 3.4.2;","time":"2019-02-27T00:00:52.213Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10266,"level":30,"msg":"updateToSchemaVersion10: starting...","time":"2019-02-27T00:00:52.214Z","v":0}
{"name":"log","hostname":"MacBook-Pro.local","pid":10266,"level":50,"msg":"Unhandled Promise Rejection: Error: SQLITE_CORRUPT: database disk image is malformed","time":"2019-02-27T00:00:52.221Z","v":0}
scottnonnenberg-signal commented 5 years ago

Ah, so in your case you just get the dialog with 'Copy Error and Quit' and 'Quit'? Yep, we could improve the database-specific error handler, so you get the option to delete everything and restart in this case as well.

There should be more logs available from the time when you ran into these errors - can you look for those? We're still looking for clues as to why database corruption happens.

scottnonnenberg-signal commented 5 years ago

Also, if you'd like this bug to track the ultimate resolution of SQLite corruption errors in Signal Desktop, fine by me.

To all future posters: we need as much information as we can get to help us track down what is causing the corruption. Did you restore from backup? Did you need to Force Close Signal Desktop, or turn off your computer without shutting it down normally? What exactly was your computer doing at the last time Signal Desktop was working properly?

Beyond that, we really, really need debug logs from right before corruption appeared. You can go into your Signal config directory (locations for each OS are listed here: https://github.com/signalapp/Signal-Desktop/blob/development/CONTRIBUTING.md#the-staging-environment) and zip up the logs directory and post it here or send it to support@signal.org.
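For anyone skimming their own logs before posting them, here is a rough sketch of pulling out the error entries. It assumes the log files are JSON lines shaped like the excerpt at the top of this issue, and that `level` follows the bunyan convention (30 = info, 50 = error). `findErrors` is a hypothetical helper, not something shipped with Signal Desktop.

```javascript
// Sketch: collect error-level entries from JSON-lines debug log text.
// Assumption: each line is one JSON object with "level", "time", "pid", "msg".
function findErrors(logText, minLevel = 50) {
  const errors = [];
  for (const line of logText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch (err) {
      continue; // skip anything that is not valid JSON
    }
    if (entry && typeof entry.level === 'number' && entry.level >= minLevel) {
      errors.push({ time: entry.time, pid: entry.pid, msg: entry.msg });
    }
  }
  return errors;
}

// Two lines taken from the log excerpt in the original report.
const sampleLog = [
  '{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":30,"msg":"updateToSchemaVersion10: starting...","time":"2019-02-27T00:00:19.790Z","v":0}',
  '{"name":"log","hostname":"MacBook-Pro.local","pid":10259,"level":50,"msg":"Unhandled Promise Rejection: Error: SQLITE_CORRUPT: database disk image is malformed","time":"2019-02-27T00:00:19.798Z","v":0}',
].join('\n');

console.log(findErrors(sampleLog));
```

The entries just before the first error are usually the interesting part, so it is still worth sending the whole logs directory rather than only the filtered output.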

agret commented 5 years ago

I'll have a look for the logs when I'm back at work with my Mac. The app was working normally when I opened it, showed all my contacts and conversations. It said there is an update available so I let it download the update and after the update completed I launched the app again only to get the database is corrupt message.

To answer your question yes that is the dialogue I saw when opening it before deleting my local database.

wolever commented 5 years ago

I've also experienced this issue. Using mv ~/"Library/Application Support/Signal"{,.old} to move the old config directory out of the way (the ~ has to stay outside the quotes so the shell expands it), then restarting Signal, does correct the issue.

I've emailed my logs to support@signal.org, and I'm happy to help if there's anything else I can do.

wolever commented 5 years ago

Assuming the database is at $config/sql/db.sqlite, what format is the database stored in? It doesn't seem like sqlite can read it, even with a correctly working Signal:

$ file …/Signal/sql/db.sqlite
…/Signal/sql/db.sqlite: data
$ sqlite3 …/Signal/sql/db.sqlite .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
/**** ERROR: (26) file is not a database *****/
ROLLBACK; -- due to errors
sunflowerspeed commented 5 years ago

Hi, I am also experiencing this issue. Interestingly I have no folder at ~/Library/Application Support/Signal. It doesn't show up after installing Signal, or after telling the desktop app to clear all data, or after uninstalling and reinstalling. Furthermore, I tried creating a folder called Signal within Application Support and then doing what Wolever suggested, but the Terminal insisted there was "no such file or directory." As such, I don't seem to even have any debug logs that I can share.

I also tried deleting a random message from a conversation I had with someone on my phone, and then I re-linked Signal Desktop (after clearing all data from the Desktop app) to my phone. Interestingly, the deleted message still showed up in the thread on the Desktop app. This suggests to me that the messages are being stored somewhere on my computer that I can't find.

I also tried installing version 1.20.0 on my Mac again (which as I recall worked fine), but I ran into the same issues. Seems like something deeper in my system has gone wrong and I'm not sure where to look. If there is anything I can do to help, I would be happy to do so.

scottnonnenberg-signal commented 5 years ago

@wolever It's a SQLCipher database, so you can't use plain sqlite to open it. Thanks for sending your logs. Anything else you can say about the state of your computer around the time of discovering the corruption (or the last usage, when corruption might have happened), would be really useful. Did you have to do a hard restart of your computer? Did you restore from backup? Was there a Signal Desktop update?

@Arcuna First, let's talk about how you initially installed Signal Desktop. Was it from https://signal.org, or did you install the beta? Now, when you install a new version, what exactly do you see when you start it up? Just a dialog? Or does the window pop up?

funnypaperboy commented 5 years ago

Hi, like Arcuna, a folder for Signal also does not appear in my Application Support. From what I can remember the errors started happening when I updated Signal to 1.22.0. At first, it would randomly crash with some pop-up update error appearing (can't remember the exact name of the error) and I would have to reinstall. After reinstalling it would work for a while, then I started receiving error handling messages to one of my contacts and it gradually spread to all of them.

scottnonnenberg-signal commented 5 years ago

@funnypaperboy What version of macOS are you on? Can you show me how you get to the Application Support directory? Also, please open a new bug with full details (including a Debug Log) for your 'error handling message' issue.

funnypaperboy commented 5 years ago

I'm on macOS Sierra 10.12.2. I went to Macintosh HD -> Library -> Application Support. How do I open a new bug? Didn't I open one with all the details before and you closed it?

sunflowerspeed commented 5 years ago

@scottnonnenberg I downloaded and installed Signal Desktop via signal.org. To my knowledge I have never installed a beta version. I just now re-downloaded and reinstalled it, and upon opening it I was greeted with a blue screen and the text "Loading messages. X so far..." This number X increased for a while until the app opened. Interestingly some recent messages still came through to the Desktop client, but they were all either images or very short messages (a message with the string :) made it through, and messages containing only emoji also seem to get through, but nothing with more text got through). Any messages containing longer strings that I've sent or received after February 12 don't appear to have reached the Desktop client. I've attached a few screenshots; I hope that helps. The messages indicating that I marked the contact as verified show up every time I clear all data on Desktop and then re-link it to my phone. I have indeed verified the contact in question but only once, and long ago.

To navigate to the folder I use Finder (i.e. the GUI rather than via command line). I am running macOS 10.14 and Signal Desktop 1.22.0.

Screenshots attached: Signal1, Signal2, Signal3, Signal4

sunflowerspeed commented 5 years ago

Sorry to double-post, but I just found something else interesting: Desktop still displays a copy of all messages I send and receive via my phone, so long as they contain no alphanumeric text. That means special characters and emoji work, but as soon as there's a letter in it, Desktop doesn't show it. HOWEVER, if I send or receive a message via my phone, that most recent message still shows up in the preview on the left-hand side of the screen on Desktop, regardless of the content. As soon as I click on that person's tab, the preview disappears and is replaced by the most recent non-alphanumeric message. (The preview also appears for a couple seconds even if I am already viewing the conversation in question, but then it vanishes shortly thereafter.) Finally, Desktop seems to think that any messages I send from my phone do not get received by the recipient; that is, it shows only one checkmark by the sent message, even though I can confirm (and my phone shows) that the recipient read the message on their phone. Attempting to send an emoji message from Desktop immediately gives me a Send Failed error. I have attached more screenshots. (In case this is relevant, the Activity Monitor consistently shows about 107-109% CPU usage for Signal, as well as 15-25% CPU usage for Signal Helper, so long as Signal is running. Opening and closing the app works normally; I do not have to Force Quit.)

I am sorry for my quick writing style. I hope this helps!

Screenshots attached: Desktop, Phone1, Phone2

scottnonnenberg-signal commented 5 years ago

@Arcuna Please reach out to support@signal.org. Your installation is displaying quite a few unusual behaviors, and we definitely need to find your data directory.

sunflowerspeed commented 5 years ago

Thanks, I got in touch with support. In the meantime, I feel like an idiot (@funnypaperboy may be interested to know this) and I successfully found the hidden Library folder located at Users/[My Name]/Library. (Navigating to my user folder and pressing Cmd + Shift + . revealed the hidden folders, for the luddites out there like me who may not know.) I can now confirm that Wolever's method works; removing the Signal folder solves the problem (albeit at the cost of losing all your messages on Desktop). I kept a copy of the original (problematic) Signal folder which I can easily put back where I found it, and I am happy to continue helping with debugging in any way I can.

funnypaperboy commented 5 years ago

@Arcuna you're a bloody genius mate, the problem's fixed!

wolever commented 5 years ago

> @wolever It's a SQLCipher database, so you can't use plain sqlite to open it. Thanks for sending your logs. Anything else you can say about the state of your computer around the time of discovering the corruption (or the last usage, when corruption might have happened), would be really useful. Did you have to do a hard restart of your computer? Did you restore from backup? Was there a Signal Desktop update?

@scottnonnenberg unfortunately I don't remember any specific details about what happened. Around the time I did have a few hard resets (my mac (10.14.2) was having some trouble waking from sleep), so that's definitely a possibility.

The only potentially abnormal behavior was that it would take Signal quite a long time to load (it would take ~minutes to load, and IIRC the "loading messages …" count finished at about 500), where it would typically load in ~seconds. The database file is ~20mb, and I had (very rough guess) about 10,000 messages in it.

Also, to elaborate on the behavior: when I would first launch Signal.app, it would correctly show any unread messages (e.g., if I closed Signal.app, received a message, then opened Signal.app, it would correctly show that my contact had sent an unread message, and I don't recall whether or not it would show the preview), but when I would click the contact, the unread notification would be cleared, but only historical media messages would be displayed (it's possible older unread messages would show too; I don't recall). When I would try to send a message, it would show locally, but it would have the empty circle indicating "not sent yet".

Would it be possible to retrieve the key used by sqlcipher so I could try loading the database?

scottnonnenberg-signal commented 5 years ago

@wolever Take a look at the other files in the Signal data directory. The key for sqlcipher is there.

twocathouse commented 5 years ago

I just ran into this problem as well. I had hard restarted my MacBook right before the error occurred. Following @wolever's advice let me re-link my desktop app.

Here are my logs: logs.zip

aguynamedben commented 5 years ago

@scottnonnenberg-signal I also build an Electron + SQLCipher app, and a few of our users are getting SQLITE_CORRUPT errors too. From your package.json it looks like this is happening for your users with version 3.2.1 of node-sqlcipher, which means SQLCipher 3.4.1 and node-sqlite3 3.15.2... is that right?

In our app, we're seeing this behavior with SQLCipher 4.0.1 and a custom-built version of node-sqlite3 4.0.6. I don't think we ever saw it with SQLCipher 3.x, which we used for 6 months.

For our app, I'm most suspicious of potential cause 1.1 listed here:

1.1. Continuing to use a file descriptor after it has been closed

I'm also suspicious that Electron's multi-process environment (main process and renderer process) may lead to Electron users accidentally making the mistake mentioned in this potential cause 2.6:

2.6. Carrying an open database connection across a fork()

If it's neither of those... I would be curious if it's a SQLCipher bug, but you're on a different version of SQLCipher, and it looks like nobody has reported any corruption bugs at https://discuss.zetetic.net/, so I'm guessing it's some kind of app lifecycle/forking thing with our app.

Godspeed! 🙏

UPDATE: I also asked here to see if any other SQLCipher+Electron users have dealt with this.

scottnonnenberg-signal commented 5 years ago

@aguynamedben It does seem like hard resets can cause it. But I also suspect that shutdown/update scenarios can sometimes result in two versions of the app competing for access to the database file. This is all edge case/race condition kind of stuff, so very difficult to nail down. But I suppose it could also have something to do with the database itself... it's a tough one.
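To make the "two versions competing for the database file" scenario concrete, here is a minimal sketch of an exclusive lock file guarding the data directory. It is only an illustration of the idea and not how Signal Desktop actually works; `acquireLock`, `releaseLock` and the lock file name are hypothetical. It relies on the `'wx'` flag of `fs.openSync`, which fails with `EEXIST` when the file already exists.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Try to take an exclusive lock by creating the lock file.
// Returns true if this process now holds the lock, false if another holder exists.
function acquireLock(lockPath) {
  try {
    const fd = fs.openSync(lockPath, 'wx'); // 'wx': create, fail if it exists
    fs.writeSync(fd, String(process.pid));  // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// Release the lock on clean shutdown.
function releaseLock(lockPath) {
  fs.unlinkSync(lockPath);
}

// Demo in a throwaway directory standing in for the config directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'lock-demo-'));
const demoLock = path.join(demoDir, 'db.lock');
console.log(acquireLock(demoLock)); // first instance gets the lock
console.log(acquireLock(demoLock)); // a second instance is refused
releaseLock(demoLock);
```

A plain lock file like this is left behind after a hard reset, so a real implementation would also need some way to detect and clear stale locks.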

sjlombardo commented 5 years ago

I can't say conclusively that this is related, but I noticed that sql.js is using PRAGMA schema_version, which is generally not a safe practice. Manipulation of PRAGMA schema_version can lead to database corruption. It is almost always best to let SQLite manage the schema version internally.

For Signal's use case, i.e. tracking application file versions for migrations, the use of PRAGMA user_version would be a much more appropriate replacement.
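A minimal sketch of what migration tracking with PRAGMA user_version could look like is below. It is not Signal's sql.js code. `db` is a hypothetical synchronous wrapper with `get(sql)` (returns the first row) and `exec(sql)`, and `makeFakeDb` is an in-memory stand-in so the sketch runs without a real database.

```javascript
// Sketch: run pending migrations, tracking progress in PRAGMA user_version.
// migrations[i] is the SQL that takes the database from version i to i + 1.
function migrate(db, migrations) {
  const current = db.get('PRAGMA user_version;').user_version;
  for (let v = current; v < migrations.length; v++) {
    db.exec('BEGIN TRANSACTION;');
    try {
      db.exec(migrations[v]);
      // user_version is reserved for the application; schema_version is not touched.
      db.exec(`PRAGMA user_version = ${v + 1};`);
      db.exec('COMMIT TRANSACTION;');
    } catch (err) {
      db.exec('ROLLBACK;');
      throw err;
    }
  }
  return db.get('PRAGMA user_version;').user_version;
}

// Hypothetical stand-in for a real connection: it only understands user_version
// and records every statement passed to exec().
function makeFakeDb() {
  let userVersion = 0;
  const log = [];
  return {
    log,
    get(sql) {
      if (/^PRAGMA user_version;$/.test(sql.trim())) return { user_version: userVersion };
      throw new Error(`fake db cannot answer: ${sql}`);
    },
    exec(sql) {
      log.push(sql);
      const match = /^PRAGMA user_version = (\d+);$/.exec(sql.trim());
      if (match) userVersion = Number(match[1]);
    },
  };
}

const demoDb = makeFakeDb();
console.log(migrate(demoDb, ['CREATE TABLE a (x);', 'CREATE TABLE b (y);']));
```

With a real SQLCipher connection the same two PRAGMA statements would be issued after PRAGMA key; only the wrapper changes.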

It may be worth a brief investigation into whether this could be a contributing factor in these corruption incidents.

scottnonnenberg-signal commented 5 years ago

For additional reference: https://github.com/sqlcipher/sqlcipher/blob/162b0610a92e32238fcd3eada5fb78606d109961/src/pragma.c#L1801-L1817

oconnetf commented 5 years ago

I just got this when my MacBook powered up after a battery death. Signal had been open prior to the unclean shutdown.

Database startup error:

Error: SQLITE_CORRUPT: database disk image is malformed
scottnonnenberg-signal commented 5 years ago

@oconnetf Thanks for the report. It looks like you got a dialog which allowed you to delete everything and start from scratch. Did that work for you?

wolever commented 5 years ago

Okay, update: I wasn't able to load the database file using the sqlcipher command line tool, but I was able to get it working by installing the @journeyapps/sqlcipher package.

I used that to write a quick script to query all the tables, and I think I've found the issue:

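var fs = require('fs')
var sqlite3 = require('@journeyapps/sqlcipher').verbose()
// Run from the Signal config directory, next to config.json and sql/
var db = new sqlite3.Database('sql/db.sqlite')
var config = JSON.parse(fs.readFileSync('config.json'))
// serialize() makes sure the key is applied before any query runs
db.serialize(() => {
  // The key in config.json is raw hex, hence the x'...' syntax
  db.run(`PRAGMA key = "x'${config.key}'";`)
  // List every table, then count its rows to force a scan
  db.each(`select name from sqlite_master where type = 'table'`, (err, row) => {
    if (err) {
      console.error(`sqlite_master: ${err}`)
      return
    }
    db.each(`select count(*) as count from ${row.name}`, (err, count) => {
      if (err) {
        console.error(`${row.name}: ${err}`)
        return
      }
      console.log(`${row.name}: ${count.count}`)
    })
  })
})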

Which yields:

messages_fts: Error: SQLITE_ERROR: no such module: fts5
messages: 13859
sqlite_stat1: 10
conversations: 162
identityKeys: 49
items: 24
preKeys: 73
signedPreKeys: 3
sessions: 29
messages_fts_idx: 145
messages_fts_content: 13859
messages_fts_docsize: 13859
messages_fts_config: 1
attachment_downloads: 0
messages_fts_data: Error: SQLITE_CORRUPT: database disk image is malformed
unprocessed: 64
scottnonnenberg-signal commented 5 years ago

You're saying that it's specifically the messages_fts_data table that's having problems, nothing else? Before we feel sure about that, you should probably get fts5 installed; it may have an effect on what you're seeing.

sjlombardo commented 5 years ago

@wolever you'll need to enable fts5 (--enable-fts5). Once you do, I'd like to know what is the output from PRAGMA integrity_check;

wolever commented 5 years ago

Yea, I'm not sure what to do there. As far as I can tell it's working with my system sqlite install – I can run sqlite3 <<< "select fts5(42);" without error – but for some reason it doesn't work from node (i.e., running db.each("select fts5(42)") yields SQLITE_ERROR: no such function: fts5).

wolever commented 5 years ago

@sjlombardo ah dang, didn't see your message when I posted. Do you happen to know why node would be using a different version of sqlite than the system version? Or would this be a "hunt the dependencies" kind of thing?

aguynamedben commented 5 years ago

@wolever SQLite is basically just a C library, it's not a database service that runs on your system like you'd expect if you've used MySQL/PostgreSQL a bunch.

Signal Desktop embeds a special version of the SQLite C library that:

The (corrupt) Signal database was created with SQLCipher + FTS5 enabled.

Your computer probably currently has:

In order to read your database (thank you for helping! 🙏) you need to install SQLCipher with FTS5 enabled.

The easiest way to do this if you're on a Mac is to use the SQLCipher via Homebrew (see recipe) (note -DSQLITE_ENABLE_FTS5 on line 27 of the recipe).

$ brew install sqlcipher
(output...)
$ which sqlcipher
/usr/local/bin/sqlcipher
$ sqlcipher /path/to/database.db
sqlite> PRAGMA KEY = "blah";
sqlite> PRAGMA integrity_check;

The sqlcipher binary installed by Homebrew is the same command-line tool as sqlite, except it has SQLCipher and has FTS5 enabled. If which sqlcipher fails, you probably need to add /usr/local/bin to your bash/ZSH path, or just run /usr/local/bin/sqlcipher /path/to/database.db.

@wolever and SQLCipher team... thank you so much for helping debug this. My app is having a seemingly similar issue, but we can't get a copy of a database where this has happened, so your help in debugging the corrupt database is really key!

aguynamedben commented 5 years ago

@wolever A few more notes for you on trying to run PRAGMA integrity_check;

~Make sure you're using SQLCipher 4.x~

UPDATE: Sorry, it looks like Signal Desktop uses SQLCipher 3.4.2. After running the sqlcipher binary installed by Homebrew, you can use PRAGMA cipher_version; to ~verify you're running SQLCipher 4.x.~

sqlite> PRAGMA cipher_version;
4.1.0 community

Trying to open a SQLCipher 3.x file with SQLCipher 4.x will probably fail. I had an old 3.x version installed by homebrew... so had to brew uninstall sqlcipher and brew install sqlcipher.

You must set PRAGMA key (the SQLCipher key) before running the integrity check

falcon!ben:/Users/ben/code/tesla$ sqlcipher ~/Library/Application\ Support/Electron/ce-4.db
SQLCipher version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> PRAGMA integrity_check;
Error: file is not a database
sqlite> PRAGMA key = "my-key";
sqlite> PRAGMA integrity_check;
ok
wolever commented 5 years ago

🎉 Downgrading to sqlcipher 3.4.2 seems to have made things work.

There might be a better way to do it, but I downgraded with:

$ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula
$ git show 4abec2d11a5eb374c6610ae924aae94c56883c99:./sqlcipher.rb > sqlcipher.rb
$ brew install sqlcipher

And here's the result:

$ sqlcipher sql/db.sqlite
SQLCipher version 3.20.1 2017-08-24 16:21:36
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> PRAGMA key = "x'6a…5d'";
sqlite> PRAGMA integrity_check;
*** in database main ***
Page 2494: btreeInitPage() returns error code 11
Page 9709 is never used
sqlite> select count(*) from messages_fts;
13859
sqlite> select count(*) from messages_fts_data;
Error: database disk image is malformed
scottnonnenberg-signal commented 5 years ago

@wolever Just to confirm, you tried to query every table but messages_fts_data is the only one that returned an error?

wolever commented 5 years ago

@scottnonnenberg I believe so, yes (assuming that the count(*) I ran on each table will do a scan that will touch every row; I did not select * from each table, though)

scottnonnenberg-signal commented 5 years ago

Well, one thing I can say is that the corruption errors precede our use of the fts5 module. We started using that early this year, but you can search for the error message and find issues from last year.

Of course, it is possible that the use of fts5 made the problem more common. Also worth noting that we didn't have any triggers before our version 8 schema (which also introduced fts5 usage).

sjlombardo commented 5 years ago

@wolever The next step would be for us to look at what is in the database file around the location of page 2494. Can you please run the following command?

hexdump -C sql/db.sqlite | tail -n +159616 | head -n 130 > pages.txt

This will output the contents of the encrypted database file around the page number referenced in the integrity_check report into pages.txt. Because the database file is encrypted, the output should not contain anything secret (it should be encrypted with your key). Even so, just to be safe, take a quick look at pages.txt to make sure there isn't any sensitive information present (for example, the "secrets" in the white box here). Assuming not, post the results up to a gist, or you can send it to us via email/PGP.
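For reference, the arithmetic that maps a page number to a byte range and to hexdump line numbers can be sketched as follows. It assumes 1-based page numbers, a 1024-byte page size (the value passed as -DPAGESIZE=1024 later in this thread) and 16 bytes per hexdump -C line. `pageLocation` and `readPage` are hypothetical helper names.

```javascript
const fs = require('fs');

// Where does a given page live in the file, and on which hexdump lines?
function pageLocation(pageNumber, pageSize = 1024, bytesPerLine = 16) {
  const startByte = (pageNumber - 1) * pageSize; // pages are numbered from 1
  const endByte = startByte + pageSize;          // exclusive
  return {
    startByte,
    endByte,
    firstLine: startByte / bytesPerLine + 1,     // 1-based line in the dump
    lastLine: endByte / bytesPerLine,
  };
}

// Read the raw (still encrypted) bytes of one page straight from the file.
function readPage(dbPath, pageNumber, pageSize = 1024) {
  const { startByte } = pageLocation(pageNumber, pageSize);
  const buf = Buffer.alloc(pageSize);
  const fd = fs.openSync(dbPath, 'r');
  try {
    const bytesRead = fs.readSync(fd, buf, 0, pageSize, startByte);
    return buf.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

console.log(pageLocation(2494));
// { startByte: 2552832, endByte: 2553856, firstLine: 159553, lastLine: 159616 }
```

The line numbers only match a real dump if hexdump does not collapse repeated lines into `*`; passing -v to hexdump disables that squeezing.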

wolever commented 5 years ago

@sjlombardo great, I've emailed the dump to support@zetetic.net.

Additionally, if it would be helpful, I'd be happy to jump on a screen share debug session so you can poke around at the file live.

sjlombardo commented 5 years ago

@scottnonnenberg does signal desktop use WAL or standard DELETE mode?

@wolever when the crash occurred, do you happen to know if there were any files ending in -journal or -wal in the folder with the database? If so, do you have them in addition to the db.sqlite file?

developernotes commented 5 years ago

Hi,

In an attempt to gather some additional information regarding the corruption issue can someone provide details regarding:

wolever commented 5 years ago

@sjlombardo there are not currently any journal or wal files in the directory. I relaunched signal a few times, though, so it's entirely possible that they were cleaned up.

@developernotes unfortunately I don't remember the version of Signal that was active when the corruption happened. The corruption happened around January, so it's likely to have been the latest official macOS build from that time.

sjlombardo commented 5 years ago

Hello @wolever. We have authored a small tool we'd like you to run on the corrupted database. Each page in a SQLCipher database has an attached MAC which is used to authenticate that the page contents are valid and have not been tampered with. This MAC is applied after encryption. The tool will run on your encrypted database and verify the state of each page, and report any issues. You can download it here:

https://github.com/sqlcipher/sqlcipher-tools/blob/master/verify.c

To build the tool run the following command (this assumes OpenSSL is installed in the default Homebrew location /usr/local/opt/openssl):

clang verify.c -DPAGESIZE=1024 -DPBKDF2_ITER=64000 -DHMAC_SHA1  -I /usr/local/opt/openssl/include -L /usr/local/opt/openssl/lib -l crypto -o verify

Then, run the verify program on your database, replacing the -k parameter with your database encryption key e.g.

./verify -f sql/db.sqlite -k "x'6a…5d'"

This should provide a report of the pages in the database, and will help us differentiate whether the database is internally corrupted (i.e. corrupted by the database library prior to encryption by SQLCipher), or if it is externally corrupted (i.e. the data was manipulated after the write of the encrypted content).

Can you please run this at your earliest convenience and let us know the output. Let us know if you have any questions on building or running the tool.

sjlombardo commented 5 years ago

Hello @wolever, have you had a chance to look at running the verify program on your corrupted database? Right now your database is the only snapshot we have of the problem, so it would be super helpful to see the results. Thanks!

wolever commented 5 years ago

@sjlombardo hey! Sorry for the delay - I'm just getting back from some travel.

I've tried to run the tool, but unfortunately something doesn't seem to be working. It reports all pages as being invalid, even on a valid database.

I've compiled it for both versions 3 and 4 (verify3 and verify4), and this is what I get when I run it against a valid database:

$ cat config.json
{
  "key": "465...1c8"
}
$ ./verify3 -f sql/db.sqlite -k "x'465...1c8'" | tail -3
page 11772 is invalid
page 11773 is invalid
scanned 11773 pages and found 11773 invalid, database is corrupt
$ ./verify4 -f sql/db.sqlite -k "x'465...1c8'" | tail -3
page 2943 is invalid
page 2944 is invalid
scanned 2944 pages and found 2944 invalid, database is corrupt
$ ./verify3 -f sql/db.sqlite -k "465...1c8" | tail -3
page 11772 is invalid
page 11773 is invalid
scanned 11773 pages and found 11773 invalid, database is corrupt
$ ./verify4 -f sql/db.sqlite -k "465...1c8" | tail -3
page 2943 is invalid
page 2944 is invalid
scanned 2944 pages and found 2944 invalid, database is corrupt

I see a similar result running against the corrupt database, but I presume that's to be expected.

sjlombardo commented 5 years ago

@wolever thanks so much for giving this a try! I forgot that signal desktop was using raw key syntax. I've just pushed an update to verify.c that fixes it to support raw hex keys. Can you grab the latest, compile for SQLCipher 3 settings and try it one more time?
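To spell out the raw key syntax, here is a small sketch that builds the PRAGMA statement from config.json. It assumes the `key` field holds 64 hex characters (a 32-byte raw key), which the truncated keys in this thread do not confirm. `keyPragmaFromConfig` is a hypothetical helper.

```javascript
// Sketch: turn the "key" field of config.json into a raw-key PRAGMA statement.
// Assumption: the key is exactly 64 hex characters.
function keyPragmaFromConfig(configText) {
  const { key } = JSON.parse(configText);
  if (typeof key !== 'string' || !/^[0-9a-fA-F]{64}$/.test(key)) {
    throw new Error('expected "key" to be 64 hex characters');
  }
  // x'...' tells SQLCipher to use the bytes directly instead of deriving
  // a key from a passphrase.
  return `PRAGMA key = "x'${key}'";`;
}

// Made-up key for demonstration only; a real one comes from config.json.
const demoConfig = JSON.stringify({ key: 'ab'.repeat(32) });
console.log(keyPragmaFromConfig(demoConfig));
```

Passing the same hex string without the x'...' wrapper makes SQLCipher treat it as a passphrase and derive a different key from it, which is consistent with every page being reported as invalid in the earlier run.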

wolever commented 5 years ago

Okay! That works better.

$ ./verify3 -f ./sql/db.sqlite -k "x'6a1b…375d'"
scanned 22256 pages and 0 are invalid, database is intact

And I've confirmed that using an incorrect key "breaks" it, reporting all pages as invalid, as does using verify4.

I've also double checked that this is the broken database (and not accidentally running it against my working database).

sjlombardo commented 5 years ago

Hi @wolever - Thanks for running that! Based on the verification of the per-page MACs we can at least conclude that the database is intact as SQLCipher wrote it. That helps effectively rule out a substantial set of corruption causes related to unintentional modification of the database file outside of the library, and makes it more likely that this is an "internal" corruption.

As a next step @aguynamedben is going to put together a standalone Electron app for us that we can use to perform some isolated testing under different scenarios. We'll see if we can reproduce some corruption during a crash, etc.

wolever commented 5 years ago

Okay, awesome! And like I mentioned, if it would be helpful, I'd be happy to get on screen share with someone to poke at the database and see what's up.

sjlombardo commented 5 years ago

@scottnonnenberg - Cross-posting this from the SQLCipher discussion site... The new release of SQLCipher 4.2.0 adds PRAGMA cipher_integrity_check. It performs an independent check of each page in a database using the stored HMAC, and produces a list of any errors found. It might be a good idea to collect some information when these errors occur. If no problems are found with the "envelope" of the database by cipher_integrity_check, yet the database is being reported as SQLITE_CORRUPT during use, then it is likely that some problem corrupted the database internally (e.g. logic error, improper use of the database API, etc.).
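The reasoning above can be sketched as a small decision function over the two check results. It assumes PRAGMA cipher_integrity_check returns one row per problem and no rows when every page verifies, and that PRAGMA integrity_check returns the single row "ok" on success. `classifyCorruption` and its return labels are hypothetical.

```javascript
// Sketch: combine the two checks into a rough diagnosis.
// cipherCheckRows:    messages from PRAGMA cipher_integrity_check (empty = HMACs verify)
// integrityCheckRows: messages from PRAGMA integrity_check (['ok'] = structure is fine)
function classifyCorruption(cipherCheckRows, integrityCheckRows) {
  const envelopeOk = cipherCheckRows.length === 0;
  const structureOk =
    integrityCheckRows.length === 1 && integrityCheckRows[0] === 'ok';

  if (!envelopeOk) {
    // Page contents no longer match their HMACs: changed after SQLCipher wrote them.
    return 'external';
  }
  if (!structureOk) {
    // Pages verify, so the bad structure was written as-is by the library.
    return 'internal';
  }
  return 'none detected';
}

// The case reported in this thread: all page MACs verified, integrity_check did not.
console.log(classifyCorruption([], [
  '*** in database main ***',
  'Page 2494: btreeInitPage() returns error code 11',
  'Page 9709 is never used',
]));
```

Logging both results whenever SQLITE_CORRUPT shows up would make it possible to tell the two cases apart across many reports.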

This might confirm what @wolever was seeing, and potentially narrow things down on a wider scale.

scottnonnenberg-signal commented 5 years ago

@sjlombardo We'll get on that - thanks!

scottnonnenberg-signal commented 4 years ago

We have another example of corruption on macOS: https://github.com/signalapp/Signal-Desktop/issues/3599

It looks as if the corruption happened not long after starting the app - a few queries succeeded before these started: SQLITE_CORRUPT: database disk image is malformed. The integrity check right after opening the database didn't catch it. I've asked for a restart of the app, which should give us integrity check results.