utelle / SQLite3MultipleCiphers

SQLite3 encryption extension with support for multiple ciphers
https://utelle.github.io/SQLite3MultipleCiphers/
MIT License

rekey results in invalid database #165

Closed: rogerbinns closed this issue 3 months ago

rogerbinns commented 3 months ago
import apsw

con = apsw.Connection("testdb")
con.pragma("cipher", "rc4")
con.pragma("key", "hello")
con.execute("create table x(y); insert into x values(randomblob(656536))")

con.pragma("cipher", "ascon128")
con.pragma("rekey", "world")
con.execute("insert into x values(randomblob(656536))")

When the final execute happens I get SQLError: attached databases must use the same text encoding as main database, which is not right.

rogerbinns commented 3 months ago

The same problem happens going from aes128cbc to chacha20

rogerbinns commented 3 months ago

And various other combinations where other errors like NotADBError: file is not a database happen. I'm guessing the header is corrupt.

I've committed a script that does this random testing. Run it with python3 -m apsw.mcall

utelle commented 3 months ago

Typically, PRAGMA rekey should be used to change the passphrase only, not the cipher scheme.

The rekey function internally uses a modified vacuum procedure if the cipher scheme is to be changed at the same time. I have to analyze what goes wrong for the various cipher combinations. It could well be that certain combinations don't work at all due to some SQLite restriction.

SQLCipher for example doesn't support PRAGMA rekey at all for encrypting a plain database or for decrypting an encrypted database. Only changing the passphrase is supported.

Of course, I will investigate this issue further, but it will not have the highest priority. Just changing the passphrase should always work, and for changing the cipher scheme there is the alternative of VACUUM INTO.
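
For reference, the VACUUM INTO alternative can be sketched with Python's standard sqlite3 module. This is a minimal sketch against a plain, unencrypted database that only shows the copy step; the helper name copy_via_vacuum_into is made up. With SQLite3MC the target file would also need its new cipher settings. Passing them as URI parameters on the target filename may work, but that is an assumption to verify against the SQLite3MC documentation.

```python
import sqlite3


def copy_via_vacuum_into(src_path, dst_path):
    """Write a compacted copy of the database at src_path to dst_path.

    Hypothetical helper, not part of SQLite3MC or APSW. VACUUM INTO needs
    SQLite 3.27 or later, and dst_path must not exist (or must be empty).
    """
    con = sqlite3.connect(src_path)
    try:
        # The INTO target is an ordinary SQL expression, so it can be bound.
        con.execute("VACUUM INTO ?", (dst_path,))
    finally:
        con.close()
```

Writing to a new file and switching over only after checking the copy leaves the original untouched if anything goes wrong, which is the main advantage over changing the cipher in place.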

rogerbinns commented 3 months ago

SQLCipher for example doesn't support PRAGMA rekey at all for encrypting a plain database

I refer to the docs:

The PRAGMA rekey (or PRAGMA hexrekey) statement has 3 use cases:

  1. Encrypt an existing unencrypted database
  2. Change the encryption key of an existing encrypted database
  3. Remove encryption from an existing encrypted database

Unless an error is returned, it means the action is supported! Things that are not supported must error, and I am fine with that.

For the record the vast majority of rekeying does work.

rogerbinns commented 3 months ago

You can run python3 -m apsw.mcall to run through randomized combinations. Here are some:

unencrypted to sqlcipher legacy=3 via rekey

SQLError: Rekeying failed. Pagesize cannot be changed for an encrypted database. (good response)

However memory is leaked.

unencrypted to sqlcipher via rekey

No error, database is ok

rc4 to sqlcipher via rekey

Rekey goes fine, accessing the database gives CorruptError (rekey should either refuse, or succeed and not corrupt)

rc4 to sqlcipher legacy=3 via rekey

Rekey gives SQLITE_ERROR without setting sqlite3_errmsg.
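
The rule stated above (rekey should either refuse, or succeed and leave a readable database) can be checked mechanically after every operation. Below is a minimal sketch using Python's sqlite3 on a plain database file; the helper name is hypothetical and this is not the mcall.py code. For an SQLite3MC database the check would first need a connection configured with the new cipher and key, which is not shown here.

```python
import sqlite3


def database_looks_ok(path):
    """Return True if PRAGMA integrity_check reports 'ok' for the file.

    Any sqlite3.DatabaseError (for example 'file is not a database' or
    'database disk image is malformed') counts as a failed check.
    """
    try:
        con = sqlite3.connect(path)
        try:
            rows = con.execute("PRAGMA integrity_check").fetchall()
        finally:
            con.close()
    except sqlite3.DatabaseError:
        return False
    # A healthy database yields exactly one row containing 'ok'.
    return rows == [("ok",)]
```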

utelle commented 3 months ago

sqlcipher legacy=3 uses a page size of 1024, in contrast to the default page size of 4096. I'm not sure why the change in page size is sometimes detected and sometimes not. I will check the code.

rogerbinns commented 3 months ago

I find the regular page_size differing from legacy_page_size confusing. If both are set, which takes priority? Why does legacy_page_size even exist?

Also, although the documentation says legacy_page_size must be a power of 2, the pragma happily accepts any value from 1 through 65536 inclusive, like 5432.

BTW my original assumption was that the cryptography used its own pages within the SQLite pages, i.e. a cryptography block size that could be independent of the SQLite page size, which in turn is independent of the filesystem block size.

utelle commented 3 months ago

I find the regular page_size differing from legacy_page_size confusing.

Yes, it is confusing, although I tried hard to explain it in the documentation.

The problem is that the SQLite database header contains information about the page size of the database file. This information is read by SQLite before the encryption extension has a chance to initialize the required cipher scheme. Therefore the official SQLite Encryption Extension (SEE) leaves exactly 8 bytes of the database header unencrypted, so that the page size can be determined before initializing SEE. Typically a cipher scheme needs to know the page size to be able to locate the reserved bytes per page.

However, the original versions of SQLCipher, sqleet, and (very early) versions of wxSQLite3 encrypt the complete header. This prevents SQLite from determining the correct page size, so that typically the default page size (currently 4096) will be used. In most cases this works, because the database actually has the default page size, but for example prior versions of SQLCipher used a page size of 1024.
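
To make the header discussion concrete: per the SQLite file format, bytes 16-17 of the header hold the page size (big-endian; the value 1 means 65536) and byte 20 holds the number of reserved bytes per page. Both lie inside the 8-byte window (bytes 16 through 23) that SEE leaves in plaintext. The sketch below reads them; the fallback to 4096 when the field does not decode to a valid page size is a hypothetical heuristic that mimics the situation described for fully encrypted headers, not SQLite's or SQLite3MC's actual logic.

```python
import struct

DEFAULT_PAGE_SIZE = 4096  # current SQLite default, as mentioned above


def page_size_from_header(header, fallback=DEFAULT_PAGE_SIZE):
    """Return (page_size, reserved_bytes) from the first 100 bytes of a DB file.

    If bytes 16-17 do not decode to a valid page size, as is likely when the
    whole header is encrypted, return (fallback, None): the reserved byte
    count is then unknown too. Random ciphertext can occasionally decode to
    a valid-looking size, so this is only a heuristic.
    """
    (raw,) = struct.unpack(">H", header[16:18])
    size = 65536 if raw == 1 else raw
    if 512 <= size <= 65536 and size & (size - 1) == 0:
        return size, header[20]
    return fallback, None
```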

If both are set, which takes priority?

First, the legacy page size is set, but it can be overridden by pragma page_size.

Why does legacy_page_size even exist?

Different legacy cipher schemes use different page sizes. Specifying the default legacy page size spares the user from explicitly issuing pragma page_size. If the database does not use the default, pragma page_size will still be required, unless the legacy page size was adjusted accordingly.

Also, although the documentation says legacy_page_size must be a power of 2, the pragma happily accepts any value from 1 through 65536 inclusive, like 5432.

You are right. This should be corrected. SQLite's pragma page_size does not issue an error message if an invalid page size is specified, but it will not change the value in that case.

BTW my original assumption was that the cryptography used its own pages within the SQLite pages, i.e. a cryptography block size that could be independent of the SQLite page size, which in turn is independent of the filesystem block size.

No, the actual page size must be a power of 2. Cipher schemes (or, more generically, VFSes) are allowed to reserve up to 255 bytes per page for their own use (for example, for a nonce, HMACs, or other data). SQLite then uses page size minus reserved bytes of each page for storing database content.
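
The arithmetic is simple but has one more constraint from the SQLite file format: the usable size (page size minus reserved bytes) may not drop below 480. A small sketch; the function name and the 48-byte reserve in the usage are illustrative only and do not correspond to any particular cipher.

```python
def usable_size(page_size, reserved):
    """Bytes per page left for database content after the cipher's reserve."""
    if not (512 <= page_size <= 65536 and page_size & (page_size - 1) == 0):
        raise ValueError(f"invalid page size: {page_size}")
    if not 0 <= reserved <= 255:
        raise ValueError(f"reserved bytes must be 0..255, got {reserved}")
    usable = page_size - reserved
    # The SQLite file format forbids a usable size below 480 bytes.
    if usable < 480:
        raise ValueError(f"usable size {usable} is below the minimum of 480")
    return usable


print(usable_size(4096, 48))  # 4048
```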

rogerbinns commented 3 months ago

You've explained why the page size needs to be known before first access in many cipher configurations. But not why legacy_page_size exists. I don't see how having it spares you from a page_size pragma - if you need to set the page size in advance, why are there two different pragmas that have the same effect?

utelle commented 3 months ago

You've explained why the page size needs to be known before first access in many cipher configurations.

Only for legacy cipher schemes. Otherwise SQLite knows the page size from the database header.

But not why legacy_page_size exists.

It is required for setting the page size of a database which is encrypted with a legacy cipher scheme.

I don't see how having it spares you from a page_size pragma

Sorry, my explanation was unfortunately not correct regarding pragma page_size. pragma page_size can be used to set the page size for a new database or to change the page size of an existing database. For databases encrypted with legacy cipher schemes, setting the page size for an existing database is required. And that can be done with pragma legacy_page_size only.

The parameter legacy_page_size has a default value for each legacy cipher scheme. For example, 1024 for SQLCipher up to version 3. Usually you don't have to explicitly set legacy_page_size, but maybe a project decided to use a different page size, say 16384. Then pragma legacy_page_size allows changing the default legacy page size.
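
The default-plus-override relationship described here can be modelled in a few lines. This is a hypothetical sketch, not SQLite3MC's implementation: only the SQLCipher (legacy versions 1 to 3) value of 1024 and the general default of 4096 come from this thread; the function names are invented.

```python
DEFAULT_PAGE_SIZE = 4096


def default_legacy_page_size(cipher, legacy):
    """Hypothetical per-scheme default; only the SQLCipher <= v3 value is from the thread."""
    if cipher == "sqlcipher" and 1 <= legacy <= 3:
        return 1024
    return DEFAULT_PAGE_SIZE


def effective_legacy_page_size(cipher, legacy, legacy_page_size=None):
    """An explicit legacy_page_size setting takes precedence over the scheme default."""
    if legacy_page_size is not None:
        return legacy_page_size
    return default_legacy_page_size(cipher, legacy)
```

A project that used 16384 with SQLCipher 3 would call effective_legacy_page_size("sqlcipher", 3, 16384) in this model, mirroring the pragma legacy_page_size case above.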

if you need to set the page size in advance, why are there two different pragmas that have the same effect?

pragma page_size is handled by SQLite, pragma legacy_page_size is handled by SQLite3MC. And their effect is not the same. The main purpose of pragma page_size is to change the page size of a database (or to query the page size). pragma legacy_page_size sets the page size of an existing legacy database, so that SQLite then knows the correct page size.

Pragma statements can currently only be used to change the configuration of the currently selected cipher. However, the default configuration of any supported cipher scheme can be adjusted via SQL functions. This can be useful if an application has to deal with many databases that are attached to a database connection.

utelle commented 3 months ago

Also, although the documentation says legacy_page_size must be a power of 2, the pragma happily accepts any value from 1 through 65536 inclusive, like 5432.

This has been fixed in commit 56ac1e2f7a703efe2924ea0fa0691bbe03bb03f4. Now only valid page sizes are accepted. Otherwise the value is not changed.
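
The fixed behaviour (accept only valid page sizes, otherwise keep the current value) amounts to the check below. This is a sketch under the assumption that "valid" means the same power-of-two range SQLite allows, 512 through 65536; the function name is hypothetical and not taken from the commit.

```python
def apply_page_size_setting(current, requested):
    """Return requested if it is a valid SQLite page size, else keep current."""
    valid = 512 <= requested <= 65536 and requested & (requested - 1) == 0
    return requested if valid else current
```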

utelle commented 3 months ago

Commit efdb69421800cdac6a7c6bfe9759f62287b2b5e6 should fix the issue.

If there are still cases for which rekeying results in corrupted databases, please reopen.

rogerbinns commented 3 months ago

Running python3 -m apsw.mcall I no longer see corrupt databases. I do still see #164 where rekey pragma returns SQLITE_ERROR and does not set the error string.

utelle commented 3 months ago

Running python3 -m apsw.mcall I no longer see corrupt databases.

Good.

I do still see #164 where rekey pragma returns SQLITE_ERROR and does not set the error string.

Yes, I haven't tracked down this issue yet.

Just a few minutes ago I tested on Linux. Here is the result I see on screen:

APSW debug build: missing sys.apsw_fault_inject_control
{'cipher': 'sqlcipher', 'hmac_use': 1, 'plaintext_header_size': 62, 'legacy': 2}
SQLError: Rekeying failed. Pagesize cannot be changed for an encrypted database.
{'cipher': 'rc4', 'legacy_page_size': 1024}
SQLError: Rekeying failed. Pagesize cannot be changed for an encrypted database.
{'cipher': 'rc4'}
SQLError: Rekeying failed. Pagesize cannot be changed for an encrypted database.
{'cipher': 'ascon128'}
{'cipher': 'aes128cbc'}
{'cipher': 'rc4'}
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ulrich/Development/GitHub/apsw-sqlite3mc/apsw/mcall.py", line 154, in <module>
    run()
  File "/home/ulrich/Development/GitHub/apsw-sqlite3mc/apsw/mcall.py", line 125, in run
    con.pragma("hexrekey", newkey)
  File "src/cursor.c", line 959, in APSWCursor_execute.sqlite3_prepare_v3
    AddTraceBackHere(__FILE__, __LINE__, "APSWCursor_execute.sqlite3_prepare_v3", "{s: O, s: O}",
apsw.SQLError: SQLError: SQL logic error

Obviously, there is still at least one bug somewhere, but at the moment I don't know why there is a crash.

The second run produced the following error:

python3: /home/ulrich/Development/GitHub/apsw-sqlite3mc/sqlite3/sqlite3.c:60597: pagerOpenWalIfPresent: Assertion `pPager->eState==PAGER_OPEN' failed.

No idea why SQLite tries to open a WAL pager.

rogerbinns commented 3 months ago

Found another corrupt database, which also causes an assertion failure in a debug build:

import apsw

con = apsw.Connection("testdb")
con.execute("create table x(y); insert into x values(randomblob(65536))")
con.pragma("cipher", "sqlcipher")
con.pragma("plaintext_header_size", 33)
con.pragma("fast_kdf_iter", 63)
con.pragma("hmac_algorithm", 1)
con.pragma("hexrekey", "aabbccdd")
con.execute("insert into x select * from x")

Note that plaintext_header_size is documented as being a multiple of 32 but any value is accepted. Setting it to 32 makes no difference in this failure.

rogerbinns commented 3 months ago

The second run produced the following error:

You can run it under gdb so that it automatically breaks on those assertion failures:

gdb -ex=run --args python3 -m apsw.mcall

utelle commented 3 months ago

Found another corrupt database, which also causes an assertion failure in a debug build

Note that plaintext_header_size is documented as being a multiple of 32 but any value is accepted. Setting it to 32 makes no difference in this failure.

Yes, just as for legacy_page_size, it will be necessary to check that plaintext_header_size accepts only valid values. SQLite3 Multiple Ciphers happily accepts any value, because the underlying AES implementation uses ciphertext stealing if the buffer length is not a multiple of 16. However, the resulting database will no longer be compatible with the original SQLCipher implementation.
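
Why an odd plaintext_header_size still "works": the encrypted part of page 1 no longer ends on a 16-byte AES block boundary, and ciphertext stealing covers the remainder. The sketch uses a simplified model (page 1 encrypted from the end of the plaintext header up to the reserved area) and an assumed reserve of 80 bytes (e.g. a 16-byte IV plus a 64-byte HMAC), chosen only for illustration. The validator encodes the documented "multiple of 32" rule quoted above; both function names are hypothetical.

```python
AES_BLOCK_SIZE = 16


def is_documented_header_size(n):
    """True if n follows the documented rule: a non-negative multiple of 32."""
    return n >= 0 and n % 32 == 0


def partial_block_bytes(page_size, plaintext_header_size, reserved):
    """Bytes after the last full AES block in page 1's encrypted region.

    Simplified model: the region runs from the end of the plaintext header
    to the start of the reserved area. A non-zero result means a partial
    block that needs special handling such as ciphertext stealing.
    """
    encrypted_len = page_size - plaintext_header_size - reserved
    return encrypted_len % AES_BLOCK_SIZE


print(partial_block_bytes(4096, 32, 80))  # 0: block-aligned
print(partial_block_bytes(4096, 33, 80))  # 15: partial final block
```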

Thanks for the sample. This will help to analyze what is going wrong and where.

For the situation where the SQL logic error is thrown I have found out that the rekey operation fails with an error code (but without a message from rekey). The text SQL logic error is the default text for error code SQLITE_ERROR.

I'm quite confident that I will manage to resolve this issue within the next couple of days.

Thanks for the hint how to use gdb to catch assertions.

utelle commented 3 months ago

I finally found the cause for the pager assertion. The HMAC size for HMAC algorithm SHA256 of SQLCipher was reported incorrectly. Thus, the HMAC was written partially outside of the page buffer bounds.

Commit 02b69ad7418a59530253d1ed5061a94b0016ca49 fixes this. At least in my development environment, the test script mcall.py no longer throws exceptions.
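
To illustrate how a wrongly reported HMAC size can push the HMAC past the end of the page buffer, here is a simplified model. Assumptions, none of them taken from the SQLite3MC source: the reserved area holds a 16-byte IV followed by the HMAC, and it is sized exactly as IV plus the *reported* HMAC size (real code may round up to a block size, which can hide small mismatches). The reported size of 20 in the usage is hypothetical; the thread does not say what value was actually reported. The digest sizes come from Python's hashlib.

```python
import hashlib

IV_SIZE = 16  # assumption: a 16-byte per-page IV stored before the HMAC


def hmac_fits(page_size, reported_hmac_size, algorithm="sha256"):
    """Check whether the real HMAC stays inside the page buffer.

    The reserved area is sized from the reported HMAC size, but the bytes
    actually written are the full digest of the chosen hash algorithm.
    """
    actual_hmac_size = hashlib.new(algorithm).digest_size  # 32 for sha256
    reserved = IV_SIZE + reported_hmac_size
    hmac_offset = page_size - reserved + IV_SIZE
    return hmac_offset + actual_hmac_size <= page_size


print(hmac_fits(4096, 32))  # True: reported size matches SHA-256
print(hmac_fits(4096, 20))  # False: 12 bytes land past the page end
```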