resilar / sqleet

SQLite3 encryption that sucks less
The Unlicense
375 stars 55 forks source link

Cannot set key again if having removed key #26

Closed einhugur closed 5 years ago

einhugur commented 5 years ago

I am unable to set key again if I Set Key then remove key. Then open the DB again with no key, then neither SetKey or ReKey will work again. (Key is successfully removed as DB Browser for Sqlite can open it at that time)

resilar commented 5 years ago

SetKey on a non-empty database is not supposed to succeed, but ReKey should.

I'm unable to reproduce. Are you using the latest version? Release or git master? Do you set & remove the key using DB Browser, shell or C interface? In addition, .dbinfo output of the database before setting the key first time and after removing the key could be useful.

$ rm -f db
$ ./sqleet 'file:db?key=swordfish' "CREATE TABLE f(x,y)"
$ ./sqleet 'file:db?key=swordfish' "INSERT INTO f VALUES(4,2)"
$ xxd db | head -n1
00000000: 0a98 87dd ae61 7903 e4a8 5d99 c557 0b64  .....ay...]..W.d
$ ./sqleet 'file:db?key=swordfish' "PRAGMA rekey=''" && xxd db | head -n1                                 
00000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300  SQLite format 3.
$ ./sqleet db "PRAGMA rekey='GOD'" 
ok
$ xxd db | head -n1
00000000: af20 0557 5c86 297d af85 ed08 3e75 4983  . .W\.)}....>uI.
$ ./sqleet 'file:db?key=GOD' .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE f(x,y);
INSERT INTO f VALUES(4,2);
COMMIT;
einhugur commented 5 years ago

I was using latest release, and the C interface. ReKey works fine, I only get the issue when having done ReKey that removes the key and then put key again on it. I will play a bit more with it and try not using SetKey again in between and see what happens. Thanks.

rocketcyberdev commented 5 years ago

I am having a similar problem using PRAGMA statements. I have a db that was previously encrypted. I rekeyed it to remove the key. Now when I try to rekey again to add the password its not encrypting.

I've stepped through the code and it appears that when sqlite3_rekey_v2 is calling codec_set_to the codec->pagesize is bunk. it contains a value of : -842150451

Any thoughts?

resilar commented 5 years ago

Thanks for the details @rocketcyberdev

I'm still unable to reproduce the actual sqlite3_rekey_v2() error, but I succeeded to trigger the following (presumably related) valgrind error:

sqlite> PRAGMA rekey='';
sqlite> PRAGMA rekey='GOD';
==55348== Conditional jump or move depends on uninitialised value(s)
==55348==    at 0x1D97FE: codec_set_to (sqleet.c:451)
==55348==    by 0x1D9E89: sqlite3_rekey_v2 (sqleet.c:613)
==55348==    by 0x19303F: sqlite3Pragma (sqlite3.c:123854)
==55348==    by 0x1C577F: yy_reduce (sqlite3.c:152913)
==55348==    by 0x1C6B5C: sqlite3Parser (sqlite3.c:153461)
==55348==    by 0x1C7DCC: sqlite3RunParser (sqlite3.c:154597)
==55348==    by 0x194E0B: sqlite3Prepare (sqlite3.c:124831)
==55348==    by 0x195161: sqlite3LockAndPrepare (sqlite3.c:124924)
==55348==    by 0x19535A: sqlite3_prepare_v2 (sqlite3.c:125008)
==55348==    by 0x1EB488: shell_exec (shell.c:11585)
==55348==    by 0x1F952B: runOneSqlLine (shell.c:18110)
==55348==    by 0x1F9A44: process_input (shell.c:18210)
==55348== 
445 static int codec_set_to(Codec *codec, Btree *pBt)
446 ...
451         if (!codec->pagesize)
452             codec->pagesize = sqlite3BtreeGetPageSize(pBt);

Indeed, codec->pagesize is uninitialized since commit ff10548 (Implement URI API). Commit 4a6f062 adds the missing initialization of codec->pagesize, likely fixing this rekeying issue. Closing, please reopen if the commit does not resolve this issue.

Highly appreciated if @einhugur or @rocketcyberdev could test the fix and report back.


I also updated the v0.29.0 release/tag of sqleet to include the fix given the seriousness of the bug. Now thinking about it, this was a little unprofessional because tags should not abruptly change. In future, a better protocol is needed for releasing bugfix releases without "rewriting history".

rocketcyberdev commented 5 years ago

Hi @resilar thanks for this. It appears to have solved the problem we reported.

resilar commented 5 years ago

Great!

rocketcyberdev commented 5 years ago

Hi Resliar

Are there any known issues with using sqleet in a multi threaded app?

I’ve got an app that has multiple threads trying to read/write to db. It appears that even when I use the KEY pragma after opening the db, some of the threads will get the not a db error.

Just wondering if it could be some sort of read/write collision between threads. They are not sharing a db handle. Each thread opens the db separately.

From: resilar notifications@github.com Reply-To: resilar/sqleet reply@reply.github.com Date: Tuesday, September 17, 2019 at 9:13 AM To: resilar/sqleet sqleet@noreply.github.com Cc: Developer developer@rocketcyber.com, Mention mention@noreply.github.com Subject: Re: [resilar/sqleet] Cannot set key again if having removed key (#26)

Thanks for the details @rocketcyberdevhttps://github.com/rocketcyberdev

I'm still unable to reproduce the actual sqlite3_rekey_v2() error, but I succeeded to trigger the following (presumably related) valgrind error:

sqlite> PRAGMA rekey='';

sqlite> PRAGMA rekey='GOD';

==55348== Conditional jump or move depends on uninitialised value(s)

==55348== at 0x1D97FE: codec_set_to (sqleet.c:451)

==55348== by 0x1D9E89: sqlite3_rekey_v2 (sqleet.c:613)

==55348== by 0x19303F: sqlite3Pragma (sqlite3.c:123854)

==55348== by 0x1C577F: yy_reduce (sqlite3.c:152913)

==55348== by 0x1C6B5C: sqlite3Parser (sqlite3.c:153461)

==55348== by 0x1C7DCC: sqlite3RunParser (sqlite3.c:154597)

==55348== by 0x194E0B: sqlite3Prepare (sqlite3.c:124831)

==55348== by 0x195161: sqlite3LockAndPrepare (sqlite3.c:124924)

==55348== by 0x19535A: sqlite3_prepare_v2 (sqlite3.c:125008)

==55348== by 0x1EB488: shell_exec (shell.c:11585)

==55348== by 0x1F952B: runOneSqlLine (shell.c:18110)

==55348== by 0x1F9A44: process_input (shell.c:18210)

==55348==

445 static int codec_set_to(Codec codec, Btree pBt)

446 ...

451 if (!codec->pagesize)

452 codec->pagesize = sqlite3BtreeGetPageSize(pBt);

Indeed, codec->pagesize is uninitialized since commit ff10548https://github.com/resilar/sqleet/commit/ff10548206cd945fc22d51b683908b20b8c6e1a1 (Implement URI API). Commit 4a6f062https://github.com/resilar/sqleet/commit/4a6f0625cd11d9b615073565cb84e7b5da1b4b2c adds the missing initialization of codec->pagesize, likely fixing this rekeying issue. Closing, please reopen if the commit does not resolve this issue.

Highly appreciated if @einhugurhttps://github.com/einhugur or @rocketcyberdevhttps://github.com/rocketcyberdev could test the fix and report back.


I also updated the v0.29.0 release/tag of sqleet to include the fix given the seriousness of the bug. Now thinking about it, this was a little unprofessional because tags should not abruptly change. In future, a better protocol is needed for releasing bugfix releases without "rewriting history".

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/resilar/sqleet/issues/26?email_source=notifications&email_token=AJD55TRNCPAECRK44OYSAPTQKDQWRA5CNFSM4ISEG4AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD64VJTA#issuecomment-532239564, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJD55TURMVYGBQITVXRCCNLQKDQWRANCNFSM4ISEG4AA.

resilar commented 5 years ago

Interesting. To be honest, I have not tested multi-threaded setup very much. However, there should be no problem since sqleet's sqlite3_key_v2 & sqlite3_rekey_v2 are protected by the database mutex. SQLITE_NOTADB (assuming it's not coming from sqlite3 code) is returned by verify_page1() which is called holding the database mutex in addition to the pager shared lock, so there should be no synchronization issues.

Have you tried compiling sqleet explicitly with -DSQLITE_THREADSAFE=1 or -DSQLITE_THREADSAFE=2? -DSQLITE_THREADSAFE=0 disables mutexes and makes concurrent use of SQLite3 (and sqleet) unsafe.

They are not sharing a db handle. Each thread opens the db separately.

This might be the culprit. https://www.sqlite.org/threadsafe.html says:


SQLite supports three different threading modes:

Any idea which mode you are using?

rocketcyberdev commented 5 years ago

Thanks Resilar,

Yes we have SQLITE_THREADSAFE=1 (Serialized) set at compile time.

I will go back and do some additional testing to make sure that we’re not somehow accidentally opening the db without the key pragma somewhere in the code path or that some runtime db open is overriding the threadsafe setting.

From: resilar notifications@github.com Reply-To: resilar/sqleet reply@reply.github.com Date: Thursday, September 19, 2019 at 2:42 AM To: resilar/sqleet sqleet@noreply.github.com Cc: Developer developer@rocketcyber.com, Mention mention@noreply.github.com Subject: Re: [resilar/sqleet] Cannot set key again if having removed key (#26)

Interesting. To be honest, I have not tested multi-threaded setup very much. However, there should be no problem since sqleet's sqlite3_key_v2 & sqlite3_rekey_v2 are protected by the database mutex. SQLITE_NOTADB (assuming it's not coming from sqlite3 code) is returned by verify_page1() which is called holding the database mutex in addition to the pager shared lock, so there should be no synchronization issues.

Have you tried compiling sqleet explicitly with -DSQLITE_THREADSAFE=1 or -DSQLITE_THREADSAFE=2? -DSQLITE_THREADSAFE=0 disables mutexes and makes concurrent use of SQLite3 (and sqleet) unsafe.

They are not sharing a db handle. Each thread opens the db separately.

This might be the culprit. https://www.sqlite.org/threadsafe.html says:


SQLite supports three different threading modes:


Any idea which mode you are using?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/resilar/sqleet/issues/26?email_source=notifications&email_token=AJD55TX4AIVZ2Y32JP2ORNDQKMUPBA5CNFSM4ISEG4AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7CRIIY#issuecomment-533009443, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJD55TSUHRYEIFWJHZ33J3LQKMUPBANCNFSM4ISEG4AA.

resilar commented 5 years ago

Now thinking about it, a shared pager lock in verify_page1() might not be enough because the function clears the page cache to force reading the first page from the disk. sqlite3PagerExclusiveLock(pager) is maybe worth trying.

I'll look into this and write some concurrency tests when I have time.

rocketcyberdev commented 5 years ago

Sounds good thanks.

Sent from my iPhone

On Sep 20, 2019, at 2:16 AM, resilar notifications@github.com<mailto:notifications@github.com> wrote:

Now thinking about it, a shared pager lock in verify_page1() might not be enough because the function clears the page cache to force reading the first page from the disk. sqlite3PagerExclusiveLock(pager) is maybe worth trying.

I'll look into this and write some concurrency tests when I have time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/resilar/sqleet/issues/26?email_source=notifications&email_token=AJD55TTB5PCSVDO3SBUMII3QKR2FJA5CNFSM4ISEG4AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7F2ETI#issuecomment-533439053, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJD55TXCVZC2AY4RQYLHKMTQKR2FJANCNFSM4ISEG4AA.

resilar commented 5 years ago

So far I have written a multi-threaded concurrency smoke test program that continuously (re)opens and reads an encrypted sqleet database from dozens of threads. This has not reproduced the SQLITE_NOTADB error, contrary to my expectations, assuming a concurrency bug in verify_page1(). I'll add writer threads probably later this week to see if concurrent writes triggers the error.

resilar commented 5 years ago

@rocketcyberdev Finally some progress! Adding concurrent writers did the trick and caused sqlite3_key() (as well as sqlite3_open() with key embedded in URI) to fail in my test setup - hopefully for the same reason you are experiencing in your app.

Concurrent writes to a non-WAL-journaled SQLite3 database can block readers in addition to other writers. As a result, opening an encrypted sqleet database can fail because the password/key is verified in verify_page1() by reading the first page of the database. Now, if a writer blocks the read, then sqlite3PagerSharedLock() on sqleet.c:371 fails with error code SQLITE_BUSY. The error is unhandled by sqleet, causing sqlite3PagerPagecount() to return 0 as the database page count. As a result, sqleet thinks the database is empty and generates a new random salt that makes the key derivation function to return wrong encryption key. This ultimately explains the SQLITE_NOTADB error.

Whew, sorry for the crazy amount of details. The take home message is that sqleet ignores SQLITE_BUSY error in verify_page1() which should be returned to the caller or handled appropriately. I still need to do further research to determine the expected sqleet behavior in this situation, but my current understanding is that SQLite3 generally requires the user to take care of SQLITE_BUSY and SQLITE_LOCKEDerrors themselves by implementing retrying logic, or using the built-in SQLite3 busy_handler & busy_timeout mechanisms to do that.

There are quite a few approaches to choose from:

  1. Return SQLITE_BUSY directly to the user.
  2. Implement retrying logic (obeying busy_timeout and busy_handler?) and return SQLITE_BUSY if retries fail.
  3. Do not call verify_page1() when an encryption key is set in sqlite3_open() or sqlite3_key(). I do not like this approach because a wrong key is then not immediately detected (a database read or write is required to see if that fails due to an invalid encryption key). However, I kinda like the idea of optionally skipping the verification, because it could be useful in some (multi-threaded) use-cases where the key is known to be correct beforehand; thus, allowing skipping the heavy key derivation process.
  4. Something else that I cannot think of at the moment.

Anyway, give me a week or two to address this issue in sqleet. The issue is not critical since it only causes sqlite3_open() or sqlite3_key() to fail in a multi-threaded setup with a low probability. Retrying the function call can be used as a temporary workaround until then.


Ping @utelle just in case you are interested or have some ideas. I have not yet checked if wxSQLite3 handles SQLITE_BUSY errors, but if not, you might want to do so if my analysis is correct. I can release my pthreads-based concurrency test program if you are unsure about the multi-threaded usage of wxSQLite3

rocketcyberdev commented 5 years ago

Awesome. Thanks for all the details and your efforts. We do use the busy_timeout to have sql retry automatically when a busy error occurs. Right now I think we set at 1000ms. So option 2 sounds like it would work out of the box for us.


From: resilar notifications@github.com Sent: Tuesday, October 1, 2019 8:51:47 AM To: resilar/sqleet sqleet@noreply.github.com Cc: Developer developer@rocketcyber.com; Mention mention@noreply.github.com Subject: Re: [resilar/sqleet] Cannot set key again if having removed key (#26)

@rocketcyberdevhttps://github.com/rocketcyberdev Finally some progress! Adding concurrent writers did the trick and caused sqlite3_key() (as well as sqlite3_open() with key embedded in URI) to fail in my test setup - hopefully for the same reason you are experiencing in your app.

Concurrent writes to a non-WAL-journaled SQLite3 database can block readers in addition to other writers. As a result, opening an encrypted sqleet database can fail because the password/key is verified in verify_page1() by reading the first page of the database. Now, if a writer blocks the read, then sqlite3PagerSharedLock() on sqleet.c:371https://github.com/resilar/sqleet/blob/06e922a71dda1ab7ee5c16c8f9e0fa2924b433ac/sqleet.c#L371 fails with error code SQLITE_BUSY. The error is unhandled by sqleet, causing sqlite3PagerPagecount() to return 0 as the database page count. As a result, sqleet thinks the database is empty and generates a new random salt that makes the key derivation function to return wrong encryption key. This ultimately explains the SQLITE_NOTADB error.

Whew, sorry for the crazy amount of details. The take home message is that sqleet ignores SQLITE_BUSY error in verify_page1() which should be returned to the caller or handled appropriately. I still need to do further research to determine the expected sqleet behavior in this situation, but my current understanding is that SQLite3 generally requires the user to take care of SQLITE_BUSY and SQLITE_LOCKEDerrors themselves by implementing retrying logic, or using the built-in SQLite3 busy_handlerhttps://www.sqlite.org/c3ref/busy_handler.html & busy_timeouthttps://www.sqlite.org/c3ref/busy_timeout.html mechanisms to do that.

There are quite a few approaches to choose from:

  1. Return SQLITE_BUSY directly to the user.
  2. Implement retrying logic (obeying busy_timeout and busy_handler?) and return SQLITE_BUSY if retries fail.
  3. Do not call verify_page1() when an encryption key is set in sqlite3_open() or sqlite3_key(). I do not like this approach because a wrong key is then not immediately detected (a database read or write is required to see if that fails due to an invalid encryption key). However, I kinda like the idea of optionally skipping the verification, because it could be useful in some (multi-threaded) use-cases where the key is known to be correct beforehand; thus, allowing skipping the heavy key derivation process.
  4. Something else that I cannot think of at the moment.

Anyway, give me a week or two to address this issue in sqleet. The issue is not critical since it only causes sqlite3_open() or sqlite3_key() to fail in a multi-threaded setup with a low probability. Retrying the function call can be used as a temporary workaround until then.


Ping @utellehttps://github.com/utelle just in case you are interested or have some ideas. I have not yet checked if wxSQLite3 handles SQLITE_BUSY errors, but if not, you might want to do so if my analysis is correct. I can release my pthreads-based concurrency test program if you are unsure about the multi-threaded usage of wxSQLite3

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/resilar/sqleet/issues/26?email_source=notifications&email_token=AJD55TWHDYLRSNORUHXALC3QMNIXHA5CNFSM4ISEG4AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEABKZCQ#issuecomment-537046154, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJD55TTHQ4MRCB4JVVSPIWTQMNIXHANCNFSM4ISEG4AA.

utelle commented 5 years ago

Ping utelle just in case you are interested or have some ideas. I have not yet checked if wxSQLite3 handles SQLITE_BUSY errors, but if not, you might want to do so if my analysis is correct. I can release my pthreads-based concurrency test program if you are unsure about the multi-threaded usage of wxSQLite3

This is definitely an interesting issue. At the moment I don't know yet if and how wxSQLite3 is affected by this problem. However, wxSQLite3 does not perform a check like verify_page1 as sqleet does. So maybe wxSQLite3 is not affected, but to be sure I would certainly like to get your concurrency test program. TIA.

resilar commented 5 years ago

@rocketcyberdev Preliminary fix 7dc8126 merged to master. sqleet now handles SQLITE_BUSY appropriately by invoking the handler set by sqlite3_busy_handler() or sqlite3_busy_timeout(). Highly appreciated if you could test the fix and tell whether it solves your issue. If not, try also disabling SQLite3 shared-cache mode (disabled by default).


Indeed, wxSQLite3 does not seem to have verify_page1()-style verification of the encryption key when setting a codec. This is probably common behavior as SQLCipher neither automatically verifies the key according to documentation SQLCipher API: Testing the key. Issue #30 tracks optional key verification skipping in sqleet.

In my opinion, immediate feedback from sqlite3_key_v2() about the correctness of the passed-in key is worth the struggle to read the 1st database page for key verification. However, reading the page 1 from the database file on disk has turned out to be a surprisingly difficult task because of B-trees, page caches and other layers of abstractions on top of the disk storage. I just noticed that the current version of sqleet still does not get the synchronization right with shared-cache mode enabled (issue #31).

Private gist of my test program. Unfinished and ugly, but better than nothing (polished final version will hopefully be merged into master in the future). In addition, SQLite3 source's test/threadtest[1-4].c are also useful! The official thread tests, however, do not test encryption codecs at all. Basic encryption functionality (such as key() and rekey()) can be somewhat tested with small code patches, though.

rocketcyberdev commented 5 years ago

Thanks will give it a try and report back.


From: resilar notifications@github.com Sent: Thursday, October 3, 2019 5:50:33 PM To: resilar/sqleet sqleet@noreply.github.com Cc: Developer developer@rocketcyber.com; Mention mention@noreply.github.com Subject: Re: [resilar/sqleet] Cannot set key again if having removed key (#26)

@rocketcyberdevhttps://github.com/rocketcyberdev Preliminary fix 7dc8126https://github.com/resilar/sqleet/commit/7dc8126f58156bf411166b695fb68502b2676475 merged to master. sqleet now handles SQLITE_BUSY appropriately by invoking the handler set by sqlite3_busy_handler() or sqlite3_busy_timeout(). Highly appreciated if you could test the fix and tell whether it solves your issue. If not, try also disabling SQLite3 shared-cache modehttps://www.sqlite.org/sharedcache.html (disabled by default).


Indeed, wxSQLite3 does not seem to have verify_page1()-style verification of the encryption key when setting a codec. This is probably common behavior as SQLCipher neither automatically verifies the key according to documentation SQLCipher API: Testing the keyhttps://www.zetetic.net/sqlcipher/sqlcipher-api/#testing-the-key. Issue #30https://github.com/resilar/sqleet/issues/30 tracks optional key verification skipping in sqleet.

In my opinion, immediate feedback from sqlite3_key_v2() about the correctness of the passed-in key is worth the struggle to read the 1st database page for key verification. However, reading the page 1 from the database file on disk has turned out to be a surprisingly difficult task because of B-trees, page caches and other layers of abstractions on top of the disk storage. I just noticed that the current version of sqleet still does not get the synchronization right with shared-cache modehttps://www.sqlite.org/sharedcache.html enabled (issue #31https://github.com/resilar/sqleet/issues/31).

Private gist of my test programhttps://gist.github.com/resilar/20cbe17b6a7a0a4c4eece6dd823d5180. Unfinished and ugly, but better than nothing (polished final version will hopefully be merged into master in the future). In addition, SQLite3 source's test/threadtest[1-4].c are also useful! The official thread tests, however, do not test encryption codecs at all. Basic encryption functionality (such as key() and rekey()) can be somewhat tested with small code patches, though.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/resilar/sqleet/issues/26?email_source=notifications&email_token=AJD55TRYWX7NS2C6PYDMXF3QMZZLTA5CNFSM4ISEG4AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAJ2OZI#issuecomment-538158949, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJD55TQGPKPT2CJ6MO3OX4LQMZZLTANCNFSM4ISEG4AA.

utelle commented 5 years ago

Thanks for providing your thread test code.

Indeed, wxSQLite3 does not seem to have verify_page1()-style verification of the encryption key when setting a codec. This is probably common behavior as SQLCipher neither automatically verifies the key according to [the SQLCipher documentation].

It is true that neither wxSQLite3 nor SQLCipher (nor any other encryption extension I tested, except sqleet) perform an immediate verification of the encryption key.

However, this follows the general behaviour of SQLite itself, as for example SQLite itself does not check the validity of a database file until an SQL command is executed that actually requires to read from (or write to) the database file. That is, a call to sqlite3_open will usually succeed, even if you try to open a non-database file.

One could argue that on opening an existing file SQLite should check immediately whether the file is really a valid database file, but the SQLite developers chose a different approach. And for the time being wxSQLite3 will stick to this "official" approach.

resilar commented 4 years ago

Very good points! I'll consider making the immediate codec verification optional (instead of the other way around), especially if reading the page 1 becomes too much hassle with shared cache mode support. Breaking sqlite3_open() behavior does not feel the right thing to do.

I do not really know if multi-thread shared-cache with encryption is reasonable (or secure) in the first place. SQLCipher segfaults in threadtest3.c tests which utilize shared cache (even if shared cache mode is disabled), while sqleet fails the same tests only with shared cache enabled. In threadtest3.c, I have patched opendb_x() to perform keying (and testing the key if using SQLCipher).

It looks like the page 1 verification of sqleet might actually not be the cause of the failing shared-cache tests (performing the verification later as in SQLCipher/wxSQLite3 does not make the tests pass). More research is needed...

utelle commented 4 years ago

I'll consider making the immediate codec verification optional (instead of the other way around) ...

That is certainly a good approach. For sure there are scenarios where verifying the codec immediately is preferable, and your approach allows to do that without forcing the "typical" sqleet user to modify build options.

I do not really know if multi-thread shared-cache with encryption is reasonable (or secure) in the first place.

I have to admit that I haven't spent much time yet on analyzing this kind of use case. However, I don't see a largely increased security risk in using shared-cache, since it is restricted to threads belonging to a single process.

SQLCipher segfaults in threadtest3.c tests which utilize shared cache (even if shared cache mode is disabled), while sqleet fails the same tests only with shared cache enabled. [...] It looks like the page 1 verification of sqleet might actually not be the cause of the failing shared-cache tests (performing the verification later as in SQLCipher/wxSQLite3 does not make the tests pass). More research is needed...

I will try to dedicate some time to further analyze this shared cache issue.

resilar commented 4 years ago

I will try to dedicate some time to further analyze this shared cache issue.

Highly appreciated if you could take a look! To quickly reproduce, use my patched version of opendb_x() below to run threadtest3.c test cases with encrypted databases. I also highly recommend compiling with -DSQLITE_DEBUG to enable SQLite3 assert()s for detecting subtle bugs/errors that do not cause crash or test case failure.

// threadtest3.c:519
static void opendb_x(
  Error *pErr,                    /* IN/OUT: Error code */
  Sqlite *pDb,                    /* OUT: Database handle */
  const char *zFile,              /* Database file name */
  int bDelete                     /* True to delete db file before opening */
){
  if (pErr->rc == SQLITE_OK) {
    int rc;
    int flags = SQLITE_OPEN_CREATE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_URI
#if 1
            | SQLITE_OPEN_SHAREDCACHE 
#else
            | SQLITE_OPEN_PRIVATECACHE 
#endif
            ;
    if (bDelete) unlink(zFile);
    if ((rc = sqlite3_open_v2(zFile, &pDb->db, flags, 0)) != SQLITE_OK) {
      sqlite_error(pErr, pDb, "open");
      goto fail;
    }

    sqlite3_busy_handler(pDb->db, busyhandler, 0);

#if defined(SQLITE_HAS_CODEC)
    if ((rc = sqlite3_key(pDb->db, "test",  4)) != SQLITE_OK) {
      printf("[ERROR] sqlite3_key('%s', 'test'): %s (%s) %d\n",
             zFile, sqlite3_errmsg(pDb->db), sqlite3_errstr(rc), rc);
      sqlite_error(pErr, pDb, "key");
      goto fail;
    }

    // Validate key (SQLCipher/wxSQLite3)
    rc = sqlite3_exec(pDb->db, "SELECT count(*) FROM sqlite_master;",
                      NULL, NULL, NULL);
    if (rc != SQLITE_OK) {
      printf("[ERROR] sqlite3_verify('%s', 'test'): %s (%s) %d\n",
             zFile, sqlite3_errmsg(pDb->db), sqlite3_errstr(rc), rc);
      sqlite_error(pErr, pDb, "verify");
      goto fail;
    }
#endif

    sqlite3_create_function(
        pDb->db, "md5sum", -1, SQLITE_UTF8, 0, 0, md5step, md5finalize
    );
    sqlite3_exec(pDb->db, "PRAGMA synchronous=OFF", 0, 0, 0);
  }
  return;

fail:
  sqlite3_close(pDb->db);
  pDb->db = 0;
  return;
}
utelle commented 4 years ago

In the meantime I used your variant of threadtest3.c to perform some tests.

If option SQLITE_OPEN_PRIVATECACHE is used threadtest3 runs without crashing producing the following test output. However, 2 of 14 tests fail.

Running walthread1 for 20 seconds...
Thread 11 says: 36 checkpoints
Thread 10 says: 95 iterations
Thread 9 says: 192 iterations
Thread 8 says: 167 iterations
Thread 7 says: 125 iterations
Thread 6 says: 107 iterations
Thread 5 says: 112 iterations
Thread 4 says: 197 iterations
Thread 3 says: 124 iterations
Thread 2 says: 104 iterations
Thread 1 says: 197 iterations
Running walthread2 for 20 seconds...
Thread 4 says: W 22 R 52
Thread 3 says: W 16 R 55
Thread 2 says: W 71 R 1
Thread 1 says: W 70 R 1
Running walthread3 for 20 seconds...
Thread 6 says: ...
Thread 5 says: ...
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running walthread4 for 20 seconds...
Thread 2 says: ...
Thread 1 says: ...
Running walthread5 for 1 seconds...
Error: (22) "Invalid argument" at line 1281
Running cgt_pager_1 for 0 seconds...
Running dynamic_triggers for 20 seconds...
Thread 4 says: 2224 created, 2224 dropped
Thread 3 says: 16600 inserts, 16600 deletes
Thread 2 says: 8100 inserts, 8100 deletes
Thread 1 says: 12700 inserts, 12700 deletes
Running checkpoint_starvation_1 for 10 seconds...
 Checkpoint mode  : PASSIVE
 Peak WAL         : 73302 frames
 Transaction count: 22256 transactions
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running checkpoint_starvation_2 for 10 seconds...
 Checkpoint mode  : RESTART
 Peak WAL         : 52 frames
 Transaction count: 590 transactions
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running create_drop_index_1 for 10 seconds...
Warning: (17) "database schema has changed" at line 39
Thread 5 says: ok
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running lookaside1 for 10 seconds...
Thread 6 says: ok
Thread 5 says: ok
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running vacuum1 for 10 seconds...
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running stress1 for 10 seconds...
Warning: (17) "no such table: t1" at line 64
Warning: (17) "sqlite3_prepare_v2() - no such table: t1 (17)" at line 123
Thread 10 says: deleted from t1 1/1 attempts
Warning: (17) "no such table: t1" at line 64
Warning: (17) "no such table: t1" at line 94
Warning: (1) "sqlite3_reset() - no such table: t1 (1)" at line 123
Thread 9 says: deleted from t1 1/1 attempts
Warning: (17) "no such table: t1" at line 94
Thread 8 says: wrote t1 8/8 attempts
Thread 7 says: wrote t1 47/47 attempts
Thread 6 says: read t1 3686/3687 attempts
Thread 5 says: read t1 2089/2089 attempts
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running stress2 for 60 seconds...
Error: (5) "database is locked" at line 260
Thread 15 says: ok
Thread 14 says: ok
Thread 13 says: ok 5/6
Thread 12 says: ok 320/320
Thread 11 says: ok 250/250
Thread 10 says: ok 300/300
Thread 9 says: ok 285/285
Thread 8 says: ok 280/280
Thread 7 says: ok 295/295
Thread 6 says: ok 275/275
Thread 5 says: ok 285/285
Thread 4 says: ok 345/345
Thread 3 says: ok 340/340
Thread 2 says: ok 305/305
Thread 1 says: ok 330/330
2 errors out of 14 tests

If option SQLITE_OPEN_SHAREDCACHE is used threadtest3 outputs

Running walthread1 for 20 seconds...
Error: (6) "database table is locked: t1" at line 1011

and then crashes due to an access violation on trying to encrypt page 2 of the main database. Following the call stack:

sqlite3BtreeGetPageSize(Btree * p) Line 66277
sqlite3Codec(void * pCodecArg, void * data, unsigned int nPageNum, int nMode) Line 84
sqlite3PagerCodec(PgHdr * pPg) Line 57856
walWriteOneFrame(WalWriter * p, PgHdr * pPage, int nTruncate, __int64 iOffset) Line 61761
sqlite3WalFrames(Wal * pWal, int szPage, PgHdr * pList, unsigned int nTruncate, int isCommit, int sync_flags) Line 61962
pagerWalFrames(Pager * pPager, PgHdr * pList, unsigned int nTruncate, int isCommit) Line 54013
sqlite3PagerCommitPhaseOne(Pager * pPager, const char * zMaster, int noSync) Line 57208
sqlite3BtreeCommitPhaseOne(Btree * p, const char * zMaster) Line 67386
vdbeCommit(sqlite3 * db, Vdbe * p) Line 79092
sqlite3VdbeHalt(Vdbe * p) Line 79493
sqlite3VdbeExec(Vdbe * p) Line 87312
sqlite3Step(Vdbe * p) Line 82296
sqlite3_step(sqlite3_stmt * pStmt) Line 82362
sqlite3_exec(sqlite3 * db, const char * zSql, int(*)(void *, int, char * *, char * *) xCallback, void * pArg, char * * pzErrMsg) Line 119923
sql_script_x(Error * pErr, Sqlite * pDb, const char * zSql) Line 631
walthread1_thread(int iTid, void * pArg) Line 1005
launch_thread_main(void * pArg) Line 811

The cause of the problem is that the codec structure holds a pointer to a SQLite3 database structure and a pointer to a SQLite3 btree structure. Both structures do not contain valid data, but seem to be uninitialized.

My guess is that special measures would be required to determine the database and btree associated with a codec structure in shared cache mode. However, at the moment I have no idea which steps exactly would be required to accomplish proper operation in shared cache mode.

resilar commented 4 years ago

Thanks for your input. I have not yet taken a deep dive into the shared cache concurrency stuff, so please take my misguided thoughts (in this and earlier comments) as a grain of salt.

Feel free goto TLDR. The verbosity level of this post is over 9000. Sorry, personal(ity) issues.


I just noticed that the official SQLite3 source code does not pass threadtest3 with the opendb_x() patch I gave above when enabling SQLITE_OPEN_SHAREDCACHE for all databases.

Therefore, my opendb_x() patch is not a reasonable reference point with SQLITE_OPEN_SHAREDCACHE unconditionally enabled. My bad, I thought SQLITE_OPEN_SHAREDCACHE would merely highlight implementation errors of encryption extensions in shared cache mode. This is not true.


The primary focus should be on threadtest3 tests run with neither SQLITE_OPEN_SHAREDCACHE nor SQLITE_OPEN_PRIVATECACHE set in opendb_x() to match the original test code. This way most test cases use private cache, while shared cache mode is explicitly enabled by some test cases such as dynamic_triggers, stress1, stress2 and maybe others.

With neither cache flag force-enabled by opendb_x(), the official SQLite3 code without encryption passes all test cases with 0 errors (and usually at most a couple "no such table: t1" warnings from stress1). In comparison, sqleet passes tests walthread[1-5] and checkpoint_starvation_[1-2] succesfully with no errors, but fails all subsequent test cases with multiple SQLITE_NOATDB errors per each test (in addition to a few SQLITE_AUTH and SQLITE_NOMEM errors, supposedly caused by the same root issue). sqleet finishes threadtest3 with the final result "36 errors out of 14 tests". In private cache mode, all errors vanish implying sqleet handles shared pager cache improperly.

If option SQLITE_OPEN_PRIVATECACHE is used threadtest3 runs without crashing producing the following test output. However, 2 of 14 tests fail.

In my test environment, sqleet as well as SQLite3 (without codecs) finishes threadtest3 with "1 errors out of 14 tests" when using private cache. The only error being "database is locked" (SQLITE_LOCKED) in the beginning of stress2 test case. Additionally, sqleet gets "database schema has changed" warning right after the database locked error (no idea if this is meaningful or not).

The SQLITE_LOCKED error I'm experiencing appears to be a side effect of SQLITE_OPEN_PRIVATECACHE force-enabled by the patched version of opendb_x(). Disabling private cache by either explicitly setting SQLITE_OPEN_SHAREDCACHE or removingSQLITE_OPEN_PRIVATECACHE (i.e., both cache flags unset) makes sqleet and SQLite3 pass stress2 with 0 errors and 0 warnings.

@utelle Nevertheless, I'd suggest investigating further that walthread5 "invalid argument" error because it is observed only with wxSQLite3. However, the sample size here is only 2 so the error may very well be caused by differences in our test setups rather than an actual bug in wxSQLite3 code.

If option SQLITE_OPEN_SHAREDCACHE is used threadtest3 outputs ... and then crashes due to an access violation on trying to encrypt page 2 of the main database

The official SQLite3 code (without encryption) also frequently crashes during the walthread1 test case in shared cache mode. However, just to be specific, the exact time of the crash varies a lot, and about every fifth test run passes walthread1 without crashing and, after that, finishes the rest of the test cases with a total of "18 errors out of 14 tests" (10 walthread1 errors, 3 walthread2 errors, 1 walthread3 errors, 1 walthread4 error, 2 checkpoint_starvation_1 errors, 1 checkpoint_starvation_2 error, 1 stress1 warning, and no errors from the remaining test cases). Out of these 18 errors, 17 are locked database errors (SQLITE_LOCKED), and the only different one is "WAL failed to grow - 0 frames" in checkpoint_starvation_1 immediately after (and probably caused by) a database locking error.

In comparison, sqleet in shared cache mode always crashes during walthread1 due to a NULL-pointer dereference in threadtest3.c when comparing SQL execution results. The result strings, in this case, are NULL pointers apparently because of SQLITE_NOTADB error(s) encountered while keying the database. I'm unsure about whether the SQLITE_NOTADB originates from sqleet or SQLite3 internals, but the root cause is most likely improper handling of shared cache mode in sqleet.


TLDR:


Sorry for the wall of text. Writing helps me to think about complex topics, which sometimes results in insane posts like this. I'll try to keep the verbosity level within sane limits in the future.

utelle commented 4 years ago

I will have to digest all the information you gave. However, I really appreciate that you share your thoughts in full. 😄

Therefore, just a quick comments about the current status of my tests. More to come later.

1) Shared cache mode and encryption Item 16 in the SQLite change log for release of SQLite3 version 3.7.15 indicates that using shared cache and encryption together should work. And this means that, if wxSQLite3, sqleet, and SQLCipher don't work properly in shared cache mode, then they are flawed in this aspect. And we should try to fix this for wxSQLite3 and sqleet.

2) Private vs shared cache mode

3) Error "Invalid argument" (walthread5) in wxSQLite3 I have not yet found out, what causes this error, unfortunately.

For reliable results we will need a test application that works at least for standard SQLite3 without errors. Otherwise it will be very hard to verify which issues are caused by the encryption extension and which not.

resilar commented 4 years ago

For reliable results we will need a test application that works at least for standard SQLite3 without errors.

Standard SQLite3 without encryption passes threadtest3 with 0 errors when using unmodified opendb_x() or my patched version with the added cache open flags removed (test cases designed for shared cache mode enable it manually at runtime). In this default setting, sqleet finishes threadtest3 with a dozen shared-cache related errors. Thus, I have something to compare against.

Remove the setting of SQLITE_OPEN_SHAREDCACHE and SQLITE_OPEN_PRIVATECACHE flags from the patched opendb_x() and check how many threadtest3 errors wxSQLite3 gets. If none, then shared cache support of wxSQLite3 is at least acceptable.

utelle commented 4 years ago

With my modified wxSQLite3 version (referencing BtShared structure instead of Btree structure) I get the following results with your patched version of threadtest3 (with cache open flags removed):

Running walthread1 for 20 seconds...
Thread 11 says: 37 checkpoints
Thread 10 says: 115 iterations
Thread 9 says: 109 iterations
Thread 8 says: 108 iterations
Thread 7 says: 118 iterations
Thread 6 says: 93 iterations
Thread 5 says: 130 iterations
Thread 4 says: 98 iterations
Thread 3 says: 106 iterations
Thread 2 says: 67 iterations
Thread 1 says: 136 iterations
Running walthread2 for 20 seconds...
Thread 4 says: W 14 R 52
Thread 3 says: W 12 R 54
Thread 2 says: W 65 R 1
Thread 1 says: W 64 R 0
Running walthread3 for 20 seconds...
Thread 6 says: ...
Thread 5 says: ...
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running walthread4 for 20 seconds...
Thread 2 says: ...
Thread 1 says: ...
Running walthread5 for 1 seconds...
Error: (22) "Invalid argument" at line 1286
Running cgt_pager_1 for 0 seconds...
Running dynamic_triggers for 20 seconds...
Thread 4 says: 416 created, 416 dropped
Thread 3 says: 5800 inserts, 5800 deletes
Thread 2 says: 2800 inserts, 2800 deletes
Thread 1 says: 3600 inserts, 3600 deletes
Running checkpoint_starvation_1 for 10 seconds...
 Checkpoint mode  : PASSIVE
 Peak WAL         : 60792 frames
 Transaction count: 18508 transactions
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running checkpoint_starvation_2 for 10 seconds...
 Checkpoint mode  : RESTART
 Peak WAL         : 52 frames
 Transaction count: 781 transactions
Thread 4 says: ...
Thread 3 says: ...
Thread 2 says: ...
Thread 1 says: ...
Running create_drop_index_1 for 10 seconds...
Thread 5 says: ok
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running lookaside1 for 10 seconds...
Thread 6 says: ok
Thread 5 says: ok
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running vacuum1 for 10 seconds...
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running stress1 for 10 seconds...
Warning: (1) "sqlite3_reset() - no such table: t1 (1)" at line 123
Warning: (1) "sqlite3_prepare_v2() - no such table: t1 (1)" at line 123
Thread 10 says: deleted from t1 1/1 attempts
Thread 9 says: deleted from t1 11/18 attempts
Thread 8 says: wrote t1 4335/4855 attempts
Thread 7 says: wrote t1 4361/4844 attempts
Thread 6 says: read t1 2597/8122 attempts
Thread 5 says: read t1 6546/10575 attempts
Thread 4 says: ok
Thread 3 says: ok
Thread 2 says: ok
Thread 1 says: ok
Running stress2 for 60 seconds...
[ERROR] sqlite3_verify('test.db', 'test'): database schema is locked: main (database table is locked) 6
Thread 15 says: ok
Thread 14 says: ok
Thread 13 says: ok 228/270
Thread 12 says: ok 265/265
Thread 11 says: ok 155/245
Thread 10 says: ok 189/275
Thread 9 says: ok 225/295
Thread 8 says: ok 183/280
Thread 7 says: ok 200/290
Thread 6 says: ok 195/275
Thread 5 says: ok 227/285
Thread 4 says: ok 280/280
Thread 3 says: ok 280/280
Thread 2 says: ok 205/280
Thread 1 says: ok 226/290
1 errors out of 14 tests

That is, I got 1 error (and 2 warnings) from walthread5 (Error: (22) "Invalid argument" at line 1286) - I still have to find out, what's causing the error "Invalid argument". Additionally, I got 1 error from stress2 ([ERROR] sqlite3_verify('test.db', 'test'): database schema is locked: main (database table is locked) 6) - however, this is no error from the original version of threadtest3 ... and it does not occur every time I run the test.

utelle commented 4 years ago

Regarding the error from walthread5 [Error: (22) "Invalid argument" at line 1286] I'm at a loss. The error is produced in function filecopy_x of threadtest3 - in the line if( nCopy!= read(fd1, aBuf, nCopy) ){ invoked from filecopy(&err, "test.db", "test_sv.db");.

That is, the file test.db should simply be copied to file test_sv.db. File test.db exists (and had a size of 1024 bytes in one of my tests). However, the function read returned 56 as the number of bytes read. Since 56 != 1024 function system_error is invoked and returns error code 22 (= ERROR_BAD_COMMAND - The device does not recognize the command.).

I don't see where wxSQLite3 would be to blame as these are C library functions producing the error.

utelle commented 4 years ago

After inspecting the database file at the position where the read stopped, I found out that the next byte in the file was a 0x1A (= ^Z) - and that character is a "end of file" marker for text files.

I modified the code of threadtest3, so that it uses the function _open instead of open, and I added the flag for binary files. Thereafter the error vanished.

Note: my test environment is Windows 10 with Visual Studio.

resilar commented 4 years ago

Okay, that explains the invalid argument error. Good work! Nasty bug though.

I get the following results with your patched version of threadtest3 (with cache open flags removed) ... That is, I got 1 error (and 2 warnings) [out of 14 tests]

That is an exceptionally good result in comparison to the "36 errors out of 14 tests" result by sqleet. However, I managed to reduce the number of errors from 36 down to 0 by skipping the immediate verify_page1() verification using raw keys that do not need the database salt for key derivation. Clearly reading the database page 1 to initialize the codec with the salt & KDF is the buggy part in sqleet when using shared cache mode concurrently from multiple threads.

I'm starting to feel much more hopeful now that the root cause of shared-cache issues has been identified somewhat accurately. :) Well, at least in sqleet. wxSQLite3 also seems to be getting very close to a stable concurrent shared-cache mode with encryption (I do not know if the few remaining errors/warnings are just false positivies or non-issues in practice).

Additionally, I got 1 error from stress2 ([ERROR] sqlite3_verify('test.db', 'test'): database schema is locked: main (database table is locked) 6) - however, this is no error from the original version of threadtest3 ... and it does not occur every time I run the test.

IIRC I have seen this error also with the standard SQLite3 (no encryption). However, I cannot reproduce it now successfully, so I might be recalling incorrectly. Better to investigate the error thoroughly, just in case. I'll look into it if the error starts occurring in sqleet after fixing the more severe bugs in the database locking & access.

utelle commented 4 years ago

In the meantime I somewhat cleaned up my modifications of the wxSQLite3 encryption extension and committed the code to the wxSQLite3 GitHub repository. This code seems to work now sufficiently well in shared cache mode - no errors, only warnings in stress1 test of threadtest3.

Additionally, I adjusted the test applications threadtest2 and threadtest4 to use encrypted databases. Both run without errors for wxSQLite3. I have not yet tested sqleet with those new tests.

resilar commented 4 years ago

Congratulations!

Late reply because I had to suddenly drop this project for a couple months due to being busy with other things in life. However, today I further investigated sqleet's 33 threadtest3 errors and finally got the perfect score "0 errors out of 14 tests" with threadtest3 patched to use encryption by opening URI file:test.db?key=swordfish instead of the filename test.db. The core problem was presumably somewhere in verify_page1() because all 33 errors disappeared after bypassing the pager and page cache by re-implementing the key verification using low-level sqlite3Os*() file & lock functions.

I was unable to determine the exact problem accurately, but sqlite3PcacheClear() seemed to produce (indirect) crashes and memory errors in other threads. Moreover, I have not yet tried to force-enable shared-cache or private-cache in the whole threadtest3 test suite. Such test configuration may be unreasonable or even impossible to pass (with official SQLite3 and no encryption), but the test results could be interesting nonetheless.

Anyway, sqleet with robust (shared-cache) concurrency support is still coming. The current "fixed" development version needs refactoring and preferably more research to find better solutions than using sqlite3Os*() low-level functions to verify the key during opening the database. Obviously, there is also the option of verifying the key "lazily" when the database is accessed (read from/written to) the first time. AFAIK wxSQLite3 and SQLCipher works like this (although I'm pretty sure that the latter does not pass threadtest3 without crashing and dozens of errors).

Issue #30 tracks the implementation of optional "lazy key verification" feature. It might become the default if key verification immediately in sqlite3_open() (or equivalent) turns out to be too difficult.