Problem with memory databases and encryption

resilar / sqleet

SQLite3 encryption that sucks less

The Unlicense

375 stars 55 forks source link

Problem with memory databases and encryption #33

Open utelle opened 4 years ago

utelle commented 4 years ago

A few days ago an issue was opened for my wxSQLite3 component regarding the use of rekey for a memory database. As mentioned on the SQLite SEE documentation page in-memory databases are not encrypted (using PRAGMA key='passphrase'; has absolutely no effect). The user was satisfied with that information and simply removed the call to the rekey function that was causing problems for him.

However, IMHO the current behaviour of the key and rekey functions if called accidentally for in-memory databases can cause problems and should therefore be addressed.

The problem affects not only wxSQLite3, but also sqleet, as can be verified by executing the following SQL command sequence in the sqleet shell:

.open :memory:
create table t1 (c1 int, c2 char);
insert into t1 values (1,'A text value');
pragma rekey='test';
select * from t1;

The last select command results in displaying Error: file is not a database.

I have not yet tracked down from where this error results. All I can say is that the page encryption/decryption function is never called while executing rekey.

AFAICT not only in-memory databases, but also temporary databases are affected. The question is how wxSQLite3 and sqleet should behave under those circumstances. Should the key and rekey functions detect whether they operate on an in-memory or temporary database and silently do nothing? Or should they issue an error message? What's your opinion?

resilar commented 4 years ago

I agree that (re)key on in-memory or temporary database should fail or at least return a warning. I'm afraid, though, that there is no clean way to implement this without patching SQLite3 internals, which I'd like to avoid at all cost. If this turns out to be the case, the best way forward may be to contact upstream via sqlite-users or sqlite-dev mailing list and hope that they are interested in addressing the issue in SQLite3 code. SQLite3 SEE (most likely) suffers from this issue, so an upstream fix is entirely possible.

However, SQLITE_TEMP_STORE >= 2 mitigates the security impact by storing temporary files (including temporary databases to my understanding) in memory. This is why sqleet.h defaults to value 2:

/*
 * # SQLITE_TEMP_STORE
 * - 0 Store temp files on disk
 * - 1 Store temp files on disk but allow overriding with `PRAGMA temp_store`
 * - 2 Store temp files in memory but allow overriding with `PRAGMA temp_store`
 * - 3 Store temp files on memory
 * Note that temp files are *NOT* encrypted so using either 2 or 3 is critical!
 */
#ifndef SQLITE_TEMP_STORE
#define SQLITE_TEMP_STORE 2
#endif

Thus, accidentally flushing unencrypted temporary databases on disk is quite unlikely to happen in practice. In-memory databases are also never written to the disk. So this is mostly an UX issue with low security impact (still completely worth fixing). Just wanted to make that clear to everyone.

utelle commented 4 years ago

I agree that (re)key on in-memory or temporary database should fail or at least return a warning.

Another option would be to simply make sqlite3_key and sqlite3_rekey a no-op in those cases.

I'm afraid, though, that there is no clean way to implement this without patching SQLite3 internals, which I'd like to avoid at all cost.

It should be possible to detect whether the associated database is in-memory or temporary within the code of sqlite3_key and sqlite3_rekey without touching the SQLite3 internals (by duplicating the internal code for detecting the special file name :memory: or a blank file name).

Regarding your remark ... patching SQLite3 internals, which I'd like to avoid at all cost ... I fear that the time will come rather soon that this can't be avoided any longer. Maybe you have not yet seen the changes to the public SQLite code that took place on Feb 7, 2020: “Simplify the code by removing the unsupported and undocumented SQLITE_HAS_CODEC compile-time option”. That is, most likely starting with the next release of SQLite3 it will no longer be possible to add encryption by simply combining the public SQLite3 code with the code of an encryption extension, unfortunately.

If this turns out to be the case, the best way forward may be to contact upstream via sqlite-users or sqlite-dev mailing list and hope that they are interested in addressing the issue in SQLite3 code. SQLite3 SEE (most likely) suffers from this issue, so an upstream fix is entirely possible.

I fear it will be quite unlikely that an upstream fix will happen due to the new SQLite3 policy mentioned above.

resilar commented 4 years ago

Holy shit @ SQLITE_HAS_CODEC removal. That's a game changer, lol. SQLite3 patching or hacky VFS-based encyption it is then. That sucks.

utelle commented 4 years ago

Holy shit @ SQLITE_HAS_CODEC removal. That's a game changer, lol.

Yes, indeed. SQLITE_HAS_CODEC has been around for at least 15 years. Its removal will make providing encryption extensions most likely more and more difficult over time.

SQLite3 patching or hacky VFS-based encyption it is then. That sucks.

Although a VFS-based encryption seems to be an attractive alternative at first glance, the devil is in the details. The VFS API is rather low-level - for read/write operations you only get a buffer plus buffer length plus file offset, but information about page size and reserved bytes per page is missing. That is, this information has to be passed to the VFS somehow on your own.

For the database file itself the VFS approach would most likely work rather well, because AFAICT except for page 1 (which contains the header) always full pages are read or written. For the journal files this is quite different: there are file headers and frame headers (depending on the journal type) in addition to the actual page content, and only the page content is encrypted. To make things worse, the headers contain checksums over the page content (currently the encrypted content!). Detecting page content reads and writes will require to access the headers. Not impossible, but probably messy.

utelle commented 4 years ago

In the meantime I started to work on the VFS approach, because I fear that keeping up the support of the current approach for future versions of SQLite will develop into growing pain in the long run.

The good news is that a pointer to the SQLite database connection handle (sqlite3*) is passed into VFS implementations. That is, locating the pager, determining page size and so on is possible without major programming effort. However, this feature is also undocumented, but hopefully unlikely to vanish.

Another interesting property of the VFS approach is that it is very easy to support custom pragma commands for configuration purposes.

On the down side is that the KEY parameter of the ATTACH command will no longer work in future versions of SQLite (The keyword is still parsed, but the value is then simply ignored).

There are a few other obstacles, of course, but hopefully nothing that can't be managed somehow.

resilar commented 4 years ago

I agree with your thoughts about VFS-based encryption. VFS encryption is the way forward, but it will require a lot of work to get all the details right (if it is even possible without crazy hacks). The potential benefits are huge because VFS-interface gives us full control of temporary files and, if lucky, the encryption might be feasible to implement as a run-time loadable extension, which is a big deal for many users (or so I have heard).

I'm busy this month, but in the beginning of April I have a week or two to spend on this topic.

Interesting stuff.

For the journal files this is quite different: there are file headers and frame headers (depending on the journal type) in addition to the actual page content, and only the page content is encrypted. To make things worse, the headers contain checksums over the page content (currently the encrypted content!). Detecting page content reads and writes will require to access the headers. Not impossible, but probably messy.

Journals and WAL files are temporary and usually quite small, so encrypting them fully using a single key may be good enough until we figure out something clever (that is, more efficient and more secure). The idea is to not care about headers, frames, pages or checksums at all, and simply encrypt and decrypt everything "transparently" inside the VFS shim. In other words, SQLite3 sees only unencrypted content so checksums will just work.

However, actual databases must be encrypted at a page level to support fast incremental updates etc.

The good news is that a pointer to the SQLite database connection handle (sqlite3*) is passed into VFS implementations. That is, locating the pager, determining page size and so on is possible without major programming effort. However, this feature is also undocumented, but hopefully unlikely to vanish.

A more portable method is to store the page size when SQLite3 reads the database header from the disk. I'll experiment with this approach because I also want to try faking "reserved bytes per page" header field (for new encrypted databases) in order to force SQLite3 reserve 32 bytes for MAC/nonce at the end of each database page. This does not necessarily work, but it's worth trying.

In principle, nothing prevents us breaking the page alignment by storing MACs/nonces between the database pages on disk. But that just feels dirty and wrong...

On the down side is that the KEY parameter of the ATTACH command will no longer work in future versions of SQLite (The keyword is still parsed, but the value is then simply ignored).

Easy workaround is to use URI filenames to pass the key for attached database.

It is quite likely we end up breaking backwards compatibility if switching to VFS-based encryption scheme. I have been lately designing a new non-backwards-compatible cryptosystem for sqleet v1, which you might find interesting. See issue #35 for details.

utelle commented 4 years ago

I agree with your thoughts about VFS-based encryption. VFS encryption is the way forward, but it will require a lot of work to get all the details right (if it is even possible without crazy hacks).

It's certainly a major effort. However, now after some digging in the SQLite code I'm rather confident that it will not require unreasonably more hacks than the current approach.

The potential benefits are huge because VFS-interface gives us full control of temporary files

Correct. It should be possible to apply encryption also to file based temporary databases, if necessary.

and, if lucky, the encryption might be feasible to implement as a run-time loadable extension, which is a big deal for many users (or so I have heard).

IMHO it is unlikely that VFS-based encryption can be implemented as a runtime loadable extension. Loadable extensions have only access to the public SQLite API, but access to internal structures and use of certain internal functions is required. This could only be avoided by duplicating internal SQLite code, but this would make maintenance much more difficult.

I'm busy this month, but in the beginning of April I have a week or two to spend on this topic.

Due to personal circumstances my capacity is limited, too. However, I hope to have a first (partial) implementation ready by the end of March.

Journals and WAL files are temporary and usually quite small, so encrypting them fully using a single key may be good enough until we figure out something clever (that is, more efficient and more secure).

SQLite accesses journal and WAL files by separately (and sometimes only partially) reading and writing frame headers and actual page content. This makes applying encryption to the whole file difficult.

The idea is to not care about headers, frames, pages or checksums at all, and simply encrypt and decrypt everything "transparently" inside the VFS shim. In other words, SQLite3 sees only unencrypted content so checksums will just work.

This is true only for SQLite itself. For separate encryption agnostic test applications or tools such journal or WAL files would be complete garbage (or at least checksums would be wrong).

However, actual databases must be encrypted at a page level to support fast incremental updates etc.

AFAICT actual database files can be handled in almost the same way as with the current approach.

The good news is that a pointer to the SQLite database connection handle (sqlite3*) is passed into VFS implementations. That is, locating the pager, determining page size and so on is possible without major programming effort. However, this feature is also undocumented, but hopefully unlikely to vanish.

A more portable method is to store the page size when SQLite3 reads the database header from the disk.

Well, that's what SQLite does with the unencrypted bytes 16 to 23 in the database header: these bytes hold the information about page size and reserved bytes per page.

I'll experiment with this approach because I also want to try faking "reserved bytes per page" header field (for new encrypted databases) in order to force SQLite3 reserve 32 bytes for MAC/nonce at the end of each database page. This does not necessarily work, but it's worth trying.

I suspect that this is not possible with the current approach (if you don't follow the SQLite way of unencrypted bytes 16 to 23 in the database header), but with the VFS approach the first read of the database header could be detected and a template header could be presented to SQLite in case of a new empty database file. However, for existing databases the first read of the database header takes place, before you have the chance to set up the encryption mechanism - unless you force the users to use the key parameter in the file URI (IMHO not a good idea, because the key value would then be stored in memory for the lifetime of the database connection).

In principle, nothing prevents us breaking the page alignment by storing MACs/nonces between the database pages on disk. But that just feels dirty and wrong...

Yes, indeed.

On the down side is that the KEY parameter of the ATTACH command will no longer work in future versions of SQLite (The keyword is still parsed, but the value is then simply ignored).

Easy workaround is to use URI filenames to pass the key for attached database.

Nevertheless this requires the user to change his SQL code. And as mentioned above I don't think that using the key parameter in the URI is a good idea in respect to security. The alternative is to use a pragma command for specifying the key.

It is quite likely we end up breaking backwards compatibility if switching to VFS-based encryption scheme.

Well, I certainly intend to keep backward compatibility, and I'll try my best to accomplish that.

I have been lately designing a new non-backwards-compatible cryptosystem for sqleet v1, which you might find interesting. See issue #35 for details.

I will take a closer look later on.

resilar commented 4 years ago

I found something interesting: CEVFS - SQLite3 Compression Encryption VFS. The encryption seems to lack authentication (MACs), and combining compression with encryption is generally a bad idea. Nonetheless, the approach seems to be compressing and encrypting database pages individually, and mapping reads & writes to the custom compressed and encrypted page format using a VFS shim. This roughly corresponds to a VFS-based encryption with MACs/nonces stored between (unaligned) pages.

The idea is to not care about headers, frames, pages or checksums at all, and simply encrypt and decrypt everything "transparently" inside the VFS shim. In other words, SQLite3 sees only unencrypted content so checksums will just work.

This is true only for SQLite itself. For separate encryption agnostic test applications or tools such journal or WAL files would be complete garbage (or at least checksums would be wrong).

Yes. I do not think that matters to most users, though.

SQLite accesses journal and WAL files by separately (and sometimes only partially) reading and writing frame headers and actual page content. This makes applying encryption to the whole file difficult.

Encryption is not that difficult. For example, ChaCha20, AES-CTR & AES-GCM ciphers support random accesses (reads/writes) in O(1) time (whereas AES-CBC requires O(n) time). However, maintaining MACs and avoiding stream cipher attacks with ChaCha20 is annoying. Each write basically requires recomputing MACs for all pages affected by the write. If using ChaCha20, then affected pages must be fully re-encrypted with a newly generated key (or nonce).

utelle commented 4 years ago

I found something interesting: CEVFS - SQLite3 Compression Encryption VFS. The encryption seems to lack authentication (MACs), and combining compression with encryption is generally a bad idea.

I've seen that project some time ago. The author intended to create an alternative to SQLite's CEROD and ZIPVFS extensions. However, I don't think this implementation can serve as a basis for our encryption extensions - not only due to the facts you already mentioned, but also, because the encryption key is bound to the VFS itself. That is, all databases are encrypted with the same key unless you instantiate individual VFSs for each database connection.

Nonetheless, the approach seems to be compressing and encrypting database pages individually, and mapping reads & writes to the custom compressed and encrypted page format using a VFS shim. This roughly corresponds to a VFS-based encryption with MACs/nonces stored between (unaligned) pages.

Yes, but IMHO this makes file access rather inefficient.

... For separate encryption agnostic test applications or tools such journal or WAL files would be complete garbage (or at least checksums would be wrong).

Yes. I do not think that matters to most users, though.

Most likely, you are right. And if necessary one could build tools which support the encryption extension.

SQLite accesses journal and WAL files by separately (and sometimes only partially) reading and writing frame headers and actual page content. This makes applying encryption to the whole file difficult.

Encryption is not that difficult. For example, ChaCha20, AES-CTR & AES-GCM ciphers support random accesses (reads/writes) in O(1) time (whereas AES-CBC requires O(n) time). However, maintaining MACs and avoiding stream cipher attacks with ChaCha20 is annoying. Each write basically requires recomputing MACs for all pages affected by the write. If using ChaCha20, then affected pages must be fully re-encrypted with a newly generated key (or nonce).

IMHO it will be better to just encrypt the actual page content (as is done now) and keep away from file and frame headers.

utelle commented 4 years ago

I managed to implement a first preliminary version of a VFS with encryption support. That is, I was able to create an encrypted database that could be used by the previous implementation of the encryption extension and vice versa.

This version still lacks support for encryption of journal files. That will be the next task, unfortunately not a trivial task, because determining the page number is rather difficult.

resilar commented 4 years ago

Great work! I'm still busy with other less interesting things.

I'd seriously consider a "dumb" authenticated encryption for non-database files. Journal and WAL writes are (almost?) always appended to the end of the file, so maybe authenticating & encrypting them with a single key/nonce and (incremental) MAC would be sufficient in practice. Alternatively process the files in chunks of page_size bytes and handle them like regular DB pages - with special handling for the last page if the file size is not divisible by page_size. You are at least going to need that kind of generic encryption as a fallback for unrecognized (new) file formats written by SQLite3 and loadable extensions.

Special support for journal and WAL file formats is something I would do last, if at all.

utelle commented 4 years ago

I'd seriously consider a "dumb" authenticated encryption for non-database files.

IMHO the SQLite VFS API is not primarily meant to be used for actual non-database files. It is meant to handle database files and journal files. Of course, the VFS API could be used by extensions for non-database file handling. However, in such a case this extension should either not use the encryption VFS shim or it should define specific open flags to distinguish its own file formats from ordinary database and journal files. For unknown file types the encryption VFS shim will simply pass through the calls to the underlying real VFS.

Journal and WAL writes are (almost?) always appended to the end of the file, so maybe authenticating & encrypting them with a single key/nonce and (incremental) MAC would be sufficient in practice.

I don't believe that that will be a good efficient strategy. I'm not sure about the rollback journal, but frames of the WAL journal are at least updated in random order, and frames are updated only partially (e.g. recalculated checksums). That is, one would have to read the full frame, decrypt it, update it (partially), encrypt it, and write the full frame back to the file.

Alternatively process the files in chunks of page_size bytes and handle them like regular DB pages - with special handling for the last page if the file size is not divisible by page_size.

That would make access to journal files quite inefficient (as described above).

You are at least going to need that kind of generic encryption as a fallback for unrecognized (new) file formats written by SQLite3 and loadable extensions.

Well, in fact I don't intend to implement a generic encryption feature. Up until today the SQLite encryption API only handles database page content (be it in database or journal files). Files accessed by (loadable) extensions internally are not affected by the encryption at all. And I will keep it that way.

The only issue I see is the use case of nested VFSs. In my preliminary implementation I instantiate the encryption VFS by connecting it as a shim to the default VFS, and then the encryption VFS is made the new default. However, users might want to inject their own VFS somewhere in the chain.

In practice the user can always choose to use a different VFS for a main database or attached database.

That is, the question to be answered is whether the new encryption extension should be initialized automatically in a certain specific way (as is done in the current preliminary implementation) or whether the user/application is responsible of initializing it in a meaningful way.

Special support for journal and WAL file formats is something I would do last, if at all.

In fact, I chose a rather "dumb" approach for rollback and WAL journals: database page content is identified by the size of the chunk read or written, and measures are taken to identify the correct page number. It seems to work properly, but needs intensive further testing, of course.

It will take some more time to clean up the code a bit, but I intend to make the new code available on Github by the end of March or early in April.

utelle commented 4 years ago

Although I published a first preliminary implementation of VFS-level encryption (SQLite3 Multiple Ciphers), I have to admit that I got stuck a bit.

Since the new implementation should of course work in all scenarios, I started to test in a multi-threaded environment with shared cache mode. Here things really get nasty. At the moment I regularly get error messages due to HMAC mismatches. However, the reason is not that the HMAC is actually corrupt, instead the wrong codec gets selected.

Why and how can this happen? I had to dig in the SQLite code to find out what is going on. My current implementation is based on the assumption that for each database file in a database connection, sqlite3*, a file handle sqlite3_file* will be instantiated. In shared cache mode this is not the case. If a database file is used in several database connections in shared cache mode the underlying file will be opened only once. However, the file handle is notified (by passing down the connection handle sqlite3*) when a new connection starts using it, but not when it stops using it.

In the published version of the code each file handle remembers only the last connection handle. As a consequence it could happen that the correct codec couldn't be found on initializing the encryption (the associated file is identified by the connection handle). In my yet unpublished code version the file handle remembers all connection handles. But this doesn't solve the problem, because stale connection handles are also on the list.

The solution will be to rewrite the code in such a way that on initializing encryption not the connection handle but the filename is used to identify the associated file handle. However, thinking about this approach made me aware of another problem:

Up to now a codec was attached to a database connection handle sqlite3*. So each connection handle had its own codec instance, even if the same database file was accessed. In the new implementation the codec is attached to a file handle sqlite3_file*, because the connection handle sqlite3* has no means anymore to attach any user/application specific information to it.

Imagine 2 database connections in shared cache mode accessing the same database file, that is, there is only one file handle. Now imagine the following actions: connection 1 establishes encryption with the correct key; thereafter, connection 2 establishes encryption, but uses accidentally a wrong key. The wrong codec of connection 2 replaces the correct codec of connection 1, because there is only one codec instance attached to the file handle. As a result, connection 1 will not be able to access the database anymore. :boom:

One solution could be the following: if connection 2 tries to attach a codec, one could verify that the codec is identical to the already attached one, and to refuse to attach it if it doesn't match. The drawback is that removing a codec - for example, after a successful rekeying - wouldn't work either. Maybe a rekey operation should be prohibited in shared cache mode.

Any ideas, how to go forward?