utelle / SQLite3MultipleCiphers

SQLite3 encryption extension with support for multiple ciphers
https://utelle.github.io/SQLite3MultipleCiphers/
MIT License

Document the cipher descriptor and the parameters to each function? #177

Closed. ethindp closed this issue 1 month ago

ethindp commented 1 month ago

Would it be possible to more thoroughly document the cipher descriptor? Right now it defines the functions but I have no idea what each of the arguments mean or when they are important. I can make some inferences, like the cipher pointer pointing to the actually allocated cipher struct, but a lot of the other arguments (like rekey for example) are non-obvious. Looking at the cipher implementations isn't all that useful -- it only makes me think that I have to call a bunch of in theory undocumented functions to actually do things. As another example, the encrypt/decrypt functions take a len and reserved argument, but is this the length of the page and the number of bytes reserved that I should subtract from len? Where are the reserved bytes at? Things like that. I know that the SQLite documentation can somewhat fill in here but not entirely.

utelle commented 1 month ago

Would it be possible to more thoroughly document the cipher descriptor?

Possible of course. However, up to now this has not happened, because presumably more than 99.9 % of the SQLite3 Multiple Ciphers users are not interested in this feature.

The option to dynamically register cipher schemes was added about 2 years ago, after discussions with a developer in need of bringing his own encryption scheme. Since then no one else asked any questions about this feature.

The documentation section Dynamic Cipher Schemes holds some basic information, but it certainly could be more elaborate.

Right now it defines the functions but I have no idea what each of the arguments mean or when they are important. I can make some inferences, like the cipher pointer pointing to the actually allocated cipher struct,

Correct. And the size and layout of this structure is defined by the cipher implementation.

but a lot of the other arguments (like rekey for example) are non-obvious.

Agreed.

Looking at the cipher implementations isn't all that useful -- it only makes me think that I have to call a bunch of in theory undocumented functions to actually do things.

The latter is usually not the case. A cipher implementation has to provide those functions, which are called by a higher level of the encryption scheme implementation. Typically, you don't call them yourself.

As another example, the encrypt/decrypt functions take a len and reserved argument, but is this the length of the page and the number of bytes reserved that I should subtract from len?

Using len as the parameter name was certainly not my brightest idea. And yes, it is the page size.

Where are the reserved bytes at? Things like that.

Reserved bytes are always located at the end of the page buffer. Usually, SQLite "knows" the number of reserved bytes from information in the database header. However, some legacy cipher schemes encrypt the database header in an incompatible way.
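To make the layout concrete, here is a minimal sketch (not code from the library; `PageLayout` and `split_page` are hypothetical names) of how a page buffer could be split, assuming `len` is the full page size and `reserved` is the number of trailing reserved bytes, as described above. A scheme might, for instance, keep per-page data such as a nonce or an authentication tag in the reserved area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: describes the two regions of one page buffer. */
typedef struct PageLayout {
  unsigned char* payload;       /* start of the page content */
  size_t payloadSize;           /* page size minus reserved bytes */
  unsigned char* reservedArea;  /* reserved bytes at the end of the page */
  size_t reservedSize;
} PageLayout;

/* Split a page of pageSize bytes whose last `reserved` bytes are reserved. */
static PageLayout split_page(unsigned char* page, size_t pageSize, size_t reserved)
{
  PageLayout layout;
  assert(page != NULL && reserved <= pageSize);
  layout.payload = page;
  layout.payloadSize = pageSize - reserved;
  layout.reservedArea = page + layout.payloadSize;
  layout.reservedSize = reserved;
  return layout;
}
```

With a 4096-byte page and 32 reserved bytes, the payload covers bytes 0 to 4063 and the reserved area starts at offset 4064.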

I can try to fill the gaps, but this will take some time, because my time resources are limited.

I know that the SQLite documentation can somewhat fill in here but not entirely.

To a certain degree the publicly available documentation about SEE provides information, but because SEE is a commercial product, there are not many details about the inner workings.

ethindp commented 1 month ago

@utelle I mean, I really only need some things documented I think. Like I have the following questions about the functions:

  • What is the purpose of the GetSalt_t function? Right now I just return nullptr.
  • In the GenerateKey_t function, what do the pBt, rekey, and cipherSalt arguments do/what are they for? Do I read or write to them (or both)?
  • In DecryptPage_t, what is the hmacCheck parameter? What do I do if it's true? Or am I supposed to ignore it?

Right now I just ignore all of these parameters that I'm not certain about. I don't plan on supporting legacy ciphers or anything, so that isn't an issue.

My last two questions are:

  1. If I want to omit all the built-in ciphers, how do I prevent it from printing a warning (since I'm going to register my own), and is the CODEC_TYPE define actually important? How does it affect the encryption/decryption operations?
  2. Is it possible for me to support in-memory decryption only? As in, can I attach an (encrypted) database onto the end of an executable, maybe use a VFS to tell it where the DB is at, and then make SQLite think that it's a file when it really isn't? Could I still get decryption at least?

This is really all I need to know I think, other than what you've already provided. Though I might have more questions in the future. I understand your time is limited but any assistance is appreciated.

utelle commented 1 month ago

I mean, I really only need some things documented I think.

You can always ask questions here (in GitHub issues) or via mail. Usually I try to respond in a timely manner.

Like I have the following questions about the functions:

  • What is the purpose of the GetSalt_t function? Right now I just return nullptr.

A cipher scheme can decide to use salt for the key derivation. The salt is an array of 16 bytes, which is stored in the first 16 bytes of the SQLite database header.

Typically, random bytes are used. Sometimes it is necessary to make the salt available to the application. For example, if you want to use salt, but can't encrypt at least part of the header - on some platforms the first 16 bytes must contain the SQLite identifier "SQLite format 3\0". Function GetSalt_t returns a pointer to the salt bytes. If you don't use salt, you can and should return nullptr. The salt can be queried with a SQL function, of which the implementation calls GetSalt_t.
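A GetSalt-style callback following this description could look roughly like the sketch below. The context struct `MyCipher`, the flag `usesSalt`, and the function name `my_get_salt` are hypothetical, and the signature is an assumption; the authoritative typedef for `GetSalt_t` is the one in the library's header.

```c
#include <assert.h>
#include <stddef.h>

#define MY_SALT_LENGTH 16  /* salt size as stated in this thread */

/* Hypothetical cipher context; the real layout is defined by each cipher. */
typedef struct MyCipher {
  int usesSalt;                          /* 1 if this scheme derives its key with a salt */
  unsigned char salt[MY_SALT_LENGTH];    /* copy of the salt used for key derivation */
} MyCipher;

/* Return a pointer to the 16 salt bytes, or NULL if the scheme uses no salt. */
static unsigned char* my_get_salt(void* cipher)
{
  MyCipher* ctx = (MyCipher*) cipher;
  assert(ctx != NULL);
  return ctx->usesSalt ? ctx->salt : NULL;
}
```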

  • In the GenerateKey_t function, what do the pBt, rekey, and cipherSalt arguments do/what are they for? Do I read or write to them (or both)?
  • In DecryptPage_t, what is the hmacCheck parameter? What do I do if it's true? Or am I supposed to ignore it?

Right now I just ignore all of these parameters that I'm not certain about.

Whether you need to take these parameters into account depends on your cipher scheme.

I don't plan on supporting legacy ciphers or anything, so that isn't an issue.

For new cipher schemes it is strongly recommended to leave bytes 16 to 23 of the database header unencrypted - as is the case for the official SEE implementation. SQLite reads these bytes before the encryption scheme is initialized to determine the page size and the number of reserved bytes.
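For reference, the sketch below shows what these header bytes contain according to the SQLite file format documentation: a big-endian page size at offset 16 (the value 1 stands for 65536), the reserved byte count at offset 20, and three fixed payload fractions (64, 32, 32) at offsets 21 to 23. The names `HeaderInfo` and `read_header_info` are hypothetical, and this is not how the library itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the two values SQLite needs early on. */
typedef struct HeaderInfo {
  unsigned int pageSize;  /* database page size in bytes */
  unsigned int reserved;  /* reserved bytes at the end of each page */
} HeaderInfo;

/* Parse bytes 16..23 of a database header; returns 1 if plausible, 0 otherwise. */
static int read_header_info(const unsigned char* hdr, size_t n, HeaderInfo* out)
{
  unsigned int pageSize;
  assert(hdr != NULL && out != NULL);
  if (n < 24) return 0;

  /* Offset 16: page size, 2 bytes big-endian; the value 1 means 65536. */
  pageSize = ((unsigned int) hdr[16] << 8) | hdr[17];
  if (pageSize == 1) pageSize = 65536;

  /* The page size must be a power of two of at least 512. */
  if (pageSize < 512 || (pageSize & (pageSize - 1)) != 0) return 0;

  /* Offsets 21..23: payload fractions, fixed at 64, 32, 32. */
  if (hdr[21] != 64 || hdr[22] != 32 || hdr[23] != 32) return 0;

  out->pageSize = pageSize;
  out->reserved = hdr[20];  /* offset 20: reserved bytes per page */
  return 1;
}
```

If these bytes were encrypted, a check like this would fail on the raw file content, which is why leaving them in plain text lets SQLite pick up the page size and reserved byte count before the key is known.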

My last two questions are:

  1. If I want to omit all the built-in ciphers, how do I prevent it from printing a warning (since I'm going to register my own),

I guess you mean the compiler warning that no built-in cipher is enabled. Well, currently there are no means to disable it.

However, you are free to remove it from the code. To register your own cipher you will have to modify the code of function sqlite3mc_initialize.

and is the CODEC_TYPE define actually important?

Yes, it holds the default codec type. If it is not set, it will be set to CODEC_TYPE_CHACHA20.

It could be that adjustments to the source code need to be done. Unfortunately, I never got feedback from the developer who wanted to implement his own cipher scheme.

How does it affect the encryption/decryption operations?

It will determine the default cipher scheme, if it was not selected via PRAGMA cipher or via URI parameter.

  2. Is it possible for me to support in-memory decryption only?

No. SQLite3 Multiple Ciphers has the same restrictions as SEE: in-memory databases and temporary databases will not be encrypted.

As in, can I attach an (encrypted) database onto the end of an executable, maybe use a VFS to tell it where the DB is at, and then make SQLite think that it's a file when it really isn't? Could I still get decryption at least?

In principle, you could use the VFS shim apndvfs (implemented in ext/misc/appendvfs.c of the SQLite source distribution). The initialization function of that extension registers the VFS shim for the default VFS. And thereafter you could register the encryption extension on top of the VFS apndvfs. (I have to admit that I haven't tested this, but it should work.) However, if you append a database to an executable, you must open the database in read-only mode.

This is really all I need to know I think, other than what you've already provided. Though I might have more questions in the future. I understand your time is limited but any assistance is appreciated.

Usually, I try to answer questions in a timely manner.

ethindp commented 1 month ago

@utelle

  • rekey: this is a boolean (taking values 0 or 1). If it is true (1) the key salt should be renewed (generate random bytes). Otherwise new key salt is only generated, if the database is new and empty.

So if I understand this right, I should generate a new salt if rekey is true, otherwise leave it alone? And if I do generate a new salt, I store it in cipherSalt, and use cipherSalt as the salt for key generation, regardless of the value of rekey?

  • cipherSalt: this is an array of 16 bytes. For an existing non-empty database this can be used to overwrite the salt from the database header.

Waaaait, so I can use a different salt every time I open a database, and that will trigger a call to generate_key? (I feel like that might make the call recursive but...)

utelle commented 1 month ago

So if I understand this right, I should generate a new salt if rekey is true, otherwise leave it alone?

Yes, if your cipher scheme uses a key salt, that is. If your cipher does not make use of salt, then you may ignore the parameters rekey and cipherSalt.

And if I do generate a new salt, I store it in cipherSalt,

No. You store it in the context struct of your cipher implementation. And then you have to put it into the first 16 bytes of the database header located on database page 1. Please look at the code of the cipher implementations.

and use cipherSalt as the salt for key generation, regardless of the value of rekey?

Certainly not. Only if rekey equals 0.

  • cipherSalt: this is an array of 16 bytes. For an existing non-empty database this can be used to overwrite the salt from the database header.

Waaaait, so I can use a different salt every time I open a database,

Of course not. There are 2 situations where you need to specify the salt explicitly:

1) The database header was corrupted.
2) The database header is unencrypted.

In both cases you have to provide the salt with which the database was created.

and that will trigger a call to generate_key? (I feel like that might make the call recursive but...)

No. GenerateKey is called after the database was opened and the passphrase was provided, either via PRAGMA key or via URI parameter. If the URI parameters include the parameter cipher_salt, then its value is passed on to GenerateKey.
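The salt rules discussed in this exchange can be summarized in a small decision function. This is only a sketch of the behaviour as described here, not the library's code; `SaltSource` and `select_salt_source` are hypothetical names, and `dbIsNewAndEmpty` is an assumed flag that the cipher would have to determine itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where the key salt for this key derivation comes from. */
typedef enum SaltSource {
  SALT_GENERATE_NEW,    /* fill the salt with fresh random bytes */
  SALT_FROM_PARAMETER,  /* use the explicitly supplied cipherSalt */
  SALT_FROM_HEADER      /* use the salt stored in the database header */
} SaltSource;

/* Decide which salt to use, following the rules described in the thread. */
static SaltSource select_salt_source(int rekey, int dbIsNewAndEmpty,
                                     const unsigned char* cipherSalt)
{
  assert(rekey == 0 || rekey == 1);
  /* Rekeying, or a new and empty database: the salt is renewed. */
  if (rekey || dbIsNewAndEmpty) return SALT_GENERATE_NEW;
  /* Existing database with an explicit salt (e.g. corrupted or plain header). */
  if (cipherSalt != NULL) return SALT_FROM_PARAMETER;
  /* Otherwise the salt stored in the first 16 header bytes applies. */
  return SALT_FROM_HEADER;
}
```

Whichever source is chosen, the resulting 16 bytes would then be kept in the cipher's context struct, and a newly generated salt would also have to end up in the first 16 bytes of page 1.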

ethindp commented 1 month ago

Okay, last set of questions I think, thank you so much for your help:

In principle, you could use the VFS shim apndvfs (implemented in ext/misc/appendvfs.c of the SQLite source distribution). The initialization function of that extension registers the VFS shim for the default VFS. And thereafter you could register the encryption extension on top of the VFS apndvfs. (I have to admit that I haven't tested this, but it should work.) However, if you append a database to an executable, you must open the database in read-only mode.

If I do use a VFS for reading the appended database, how would I enable the decryption support? I would never allow read-write access for such a thing since it could corrupt the executable, but if I did append the database onto the end, I would need some way of reading it back. Do I automatically get decryption with VFSs, or do I need to do something special to enable that, if I were to try this?

utelle commented 1 month ago

If I do use a VFS for reading the appended database, how would I enable the decryption support?

On opening the database you will have to specify the VFS to be used. Assuming that you have already registered the "appendvfs" extension, you specify the VFS as multipleciphers-apndvfs. This will create the encryption VFS on top of apndvfs if it doesn't already exist.

I would never allow read-write access for such a thing since it could corrupt the executable, but if I did append the database onto the end, I would need some way of reading it back. Do I automatically get decryption with VFSs, or do I need to do something special to enable that, if I were to try this?

You have to enable the "appendvfs" extension and then specify the VFS to be used as described above.
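One way to pass the VFS name is a URI filename. The sketch below only builds such a URI string; it assumes the `multipleciphers-` prefix generalizes from the name given above, uses the standard SQLite URI parameters `vfs` and `mode=ro`, and does no URI escaping of the path. `build_readonly_uri` is a hypothetical helper, and the whole setup is as untested here as it is in the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "file:<path>?vfs=multipleciphers-<baseVfs>&mode=ro".
 * Returns 0 on success, -1 if the path is empty or the buffer is too small.
 * Note: the path is copied as-is; characters such as '?' or '#' would need escaping. */
static int build_readonly_uri(char* out, size_t outSize,
                              const char* path, const char* baseVfs)
{
  int n;
  assert(out != NULL && path != NULL && baseVfs != NULL);
  if (strlen(path) == 0) return -1;
  n = snprintf(out, outSize, "file:%s?vfs=multipleciphers-%s&mode=ro",
               path, baseVfs);
  return (n >= 0 && (size_t) n < outSize) ? 0 : -1;
}
```

The resulting string would be passed to `sqlite3_open_v2` with the flags `SQLITE_OPEN_READONLY | SQLITE_OPEN_URI`, after the appendvfs extension has been registered; the passphrase is then supplied via PRAGMA key or a URI parameter as mentioned earlier in the thread.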

ethindp commented 1 month ago

@utelle Oh okay, thank you! I appreciate all the help!