I'd also include in requirements that it should not trust any single template (including VMs based on a specific template) with cleartext. While template compromise is unlikely and fatal already (for data of VMs based on it), spreading impact to all the VMs is even worse. This means encryption should be done either in dom0, or some entity (unikernel-like?) used only for backups encryption and nothing else. Or use different VMs depending on what data is encrypted (probably too complex).
Thanks, Marek. That is what I was alluding to in references to isolation potential and low interactivity (as well as in the Readme where it states that untrusted guest volumes are handled safely), but it's best to make it explicit.
I had some discussion with the author of CryFS about this issue of backing up from an isolated admin VM, but s/he didn't seem to appreciate why anyone would isolate encryption functions in a disconnected admin environment.
Changing the milestone to v0.4 as that will be the version that gets experimental encryption support.
I am not sure if this is the same thing, but I have offered a $3500 bounty for per-VM encryption in Qubes. I am also open to upfront payment depending on conditions. I apologize if this comment is posted off topic.
This is irrelevant here, but I would love to know where that bounty was issued. I know someone interested in resolving that issue. @cm157, mind sharing a link?
The context of wyng-backup here is to encrypt backup, not qubesos AppVM memory nor LVM.
@cm157 Are we talking disk, ramdisk or Instruction Set Random ("encrypted")? XOR "encrypted" with Crypto Seed thrown away once Decoder is running. Could use the money
Hi, I am talking about disk encryption, so that when a VM is not running its data is in a secure state. That way, if the computer is attacked in a state where FDE is unlocked, only data from VMs that are running (or that the user has decided not to encrypt) is exposed.
The idea and motivation are simple: my belief is that only the VM that is running should be in an unsafe state. When VMs are not running, their data should be encrypted.
The reason for this is to address a limitation of full disk encryption, namely that it is only effective when the device is fully shut down. Its benefit is extremely limited in scope to that scenario, and IMHO it is naive to assume that if you are subject to a targeted physical attack you will have an opportunity to completely shut down your computer.
Another use case: if I am traveling and there is a risk that border police will want to image my device, it is sometimes not ideal to factory-erase everything on the computer first. It is inconvenient and, second, very suspicious to do. It is better to let them see what I want, and if they want to image that, it's fine. But my sensitive VMs have been deleted and, since they were encrypted and the secret key deleted, even if they image the disk that data cannot be recovered. It makes life easier having that peace of mind.
Anyway, not to get too distracted: the bounty requirement is simply per-VM encryption in Qubes. There are plenty of use cases, and the reasons for it will vary from person to person, but that is some insight into my motivations.
Discussion should move back to QubesOS/qubes-issues#1293
Sorry. 😬. I got carried away.
Locking for now due to extraneous noise.
Some interesting AES encryption modes (subject to change):
OCB is said to be much faster than other authenticating modes.
IMHO, it's uncertain whether an authenticating mode is necessary here, for a couple of reasons: 1) Wyng already has a basis (hash manifests) for validating chunks of data. 2) The "hash last, then validate hash first" advocates appear to be basing their argument on attacks that mainly work on network data streams. If that is the case, then there is less to be concerned about in selecting between these modes.
It is also worth assessing the risk of decrypting an (initially) unvalidated ciphertext. My understanding is that a symmetric cipher like AES and popular hashing algorithms are closely related and fall under the class of finite state machines. Therefore, if the very next operation after decrypt() is always either hash + compare with manifest hash or discard, then I think this is safe and secure.
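A minimal sketch of the "decrypt, then immediately hash and compare with the manifest, or discard" sequence described above. The toy XOR routine is only a stand-in for the real cipher, and all names (`read_chunk`, `toy_decrypt`) are hypothetical, not Wyng's actual code:

```python
import hashlib
import hmac

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Placeholder for AES decryption: a repeating-XOR 'cipher' used
    only to illustrate the order of operations, NOT real crypto."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))

def read_chunk(key: bytes, ciphertext: bytes, manifest_digest: bytes) -> bytes:
    """Decrypt a chunk, then immediately validate it against the manifest
    hash; discard (raise) on mismatch so unvalidated plaintext never
    propagates further into the program."""
    plaintext = toy_decrypt(key, ciphertext)
    digest = hashlib.sha256(plaintext).digest()
    # Constant-time comparison, matching the time-invariant digest test
    if not hmac.compare_digest(digest, manifest_digest):
        raise ValueError("chunk failed manifest validation; discarding")
    return plaintext

# Example: record a manifest entry at backup time, check it at restore time
key = b"0123456789abcdef"
data = b"some volume chunk"
manifest_digest = hashlib.sha256(data).digest()
ct = toy_decrypt(key, data)          # XOR is symmetric: encrypt == decrypt
restored = read_chunk(key, ct, manifest_digest)
```

The point is purely the control flow: plaintext is either validated or discarded before anything else touches it.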
The largest issue in selecting a mode is probably in the degree of uniqueness required for the IV/nonce, which will be something to consider going forward.
IANAC – This is all open to debate so convince me otherwise. :)
Encryption work has started in branch wip04, at a POC stage, and as yet cannot be used securely (!) as it leaves the key exposed.... Use only test volumes with this. The metadata will be saved under a separate 'wyng.backup040' dir instead of usual, so no need to change meta dir manually for tests.
Encryption is enabled by default and uses the AES-256-CBC mode cipher. Currently it encrypts+decrypts data only (not metadata). To this extent "it works" for send and receive.
After testing various cipher modes, I settled on CBC. It is quite secure ("catastrophic" failure of confidentiality is rare/limited) and, since the Python crypto libraries don't appear to be parallelized, it's one of the better performers too. SIV mode had less than half the throughput of CBC in my tests.
I also want to note that I took a fairly harmless liberty using encrypt() in the hope that collision resistance would be improved: a 128-bit random "bolster" is added to the beginning of each plaintext chunk just before encrypting. With a chaining or cascading mode like CBC, I expect this should protect the actual data better than the IV alone. Of course, informed comments on any of this are very welcome.
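As a rough illustration of the "bolster" idea, here is a sketch using the cryptography library (which the project later switched to). The function names, constant, and block-aligned input are my own assumptions, not Wyng's actual code:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BOLSTER_LEN = 16  # 128-bit random prefix, per the comment above

def encrypt_chunk(key: bytes, iv: bytes, chunk: bytes) -> bytes:
    """Prepend a random 128-bit 'bolster' block, then AES-256-CBC encrypt.
    Assumes the chunk is already a multiple of the 16-byte AES block size,
    as fixed-size archive chunks would be; real code needs padding otherwise."""
    bolster = os.urandom(BOLSTER_LEN)
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(bolster + chunk) + enc.finalize()

def decrypt_chunk(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    plaintext = dec.update(ciphertext) + dec.finalize()
    return plaintext[BOLSTER_LEN:]   # strip and discard the bolster

key = os.urandom(32)
iv = os.urandom(16)
chunk = b"A" * 64                    # block-aligned example data
ct1 = encrypt_chunk(key, iv, chunk)
ct2 = encrypt_chunk(key, iv, chunk)  # same key+IV, different random bolster
```

Because CBC chains each block into the next, the random first block re-randomizes the entire chunk ciphertext even when the same IV and plaintext recur.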
New branch 'wip04b' created with the crypto library switched from pycryptodome to cryptography (python3-cryptography package).
The reason is that cryptography is benchmarking around 35% faster for the same AES-256-CBC cipher, meaning that pycryptodome is exacting a 50% performance penalty. This was too large to ignore so I decided to switch now before the code got too dependent on the slower library. Another plus is that cryptography has been available in OS repositories longer and more consistently.
Another small change is that encryption happens only just prior to the data buffer being sent, instead of being encrypted and then possibly not sent because of deduplication.
Also, MAC tests for receive/verify/diff are now done with the secrets library.
Some changes that are needed next:
Once these are implemented, we should have a reasonably secure encryption scheme for Wyng archives.
Encryption is enabled by default and uses AES-256-CBC mode cipher. Currently it encrypts+decrypts data only (not metadata). To this extent "it works" for send and receive.
AES-256-CBC is a poor choice. Not only is it very slow when encrypting (far more common than decrypting), it does not provide authentication, which leaves it vulnerable to chosen-ciphertext attacks. An AEAD cipher such as AES-256-GCM or ChaCha20-Poly1305 is a far better choice. AES is only a reasonable option on platforms where it is hardware accelerated; if portability to other platforms matters, use ChaCha20-Poly1305.
AEAD ciphers usually have short nonces, which must never repeat for a given key. These nonces are too short to be safely chosen at random. Since persistently storing a nonce is a recipe for disaster (consider qvm-volume revert), a fresh key must be generated whenever the process starts. The key can be generated from a master key and a long random nonce using any decent KDF. The nonce must be stored along with the data.
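A sketch of the derive-a-fresh-key-per-process approach being proposed here, using a minimal standard-library HKDF (RFC 5869) as the "decent KDF"; the variable names and info string are hypothetical:

```python
import hashlib
import hmac
import os

def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869 extract-and-expand) using only the stdlib."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()      # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                        # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# One fresh key per process start: the salt is random and stored alongside
# the data, so the AEAD's short per-message nonces can safely start at zero.
master_key = os.urandom(32)
session_salt = os.urandom(32)       # the "long random nonce" stored with the data
session_key = hkdf_sha256(master_key, session_salt, b"example-session-key")
```

Only the random salt needs persisting; the short nonces never have to survive a restart, which is the property the comment above is after.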
Finally, AEAD APIs should require the entire buffer to be passed in one operation. This makes them unsuitable for encrypting large individual messages. Instead, a streaming API should be used. Manually implementing such an API is error-prone.
Is https://download.libsodium.org/doc/secret-key_cryptography/secretstream an option? That provides a high-level API for encrypting a sequence of messages, which is what a backup system needs. I strongly recommend just using libsodium for this, rather than trying to implement something similar by hand.
@DemiMarie
Wyng is already awash in data hashes which puts it in the position of using CBC mode to its advantage. GCM is based on CTR mode, which has the potential for catastrophic confidentiality failures not present in CBC. Also see my comments above where the python library CBC was found to be about as fast as GCM.
If you think Wyng's data verification is an issue, that should be addressed separately as it exists already and independently from encryption––and that will not change without compelling arguments.
OTOH, that is not to say an AEAD mode won't be added. I think SIV (rather slow) or GCM-SIV (faster, but currently unavailable from OS repository) would have acceptable confidentiality safeguards. But GCM confidentiality appears too weak on its own (and hence why GCM-SIV was developed). XChaCha20-Poly1305 (X- with beefed-up IV space) also looks interesting, although I'd prefer to see some discussion about non-stream applications and also a formalization of the ChaCha20/XChaCha20 protocol first. Note these recent developments (GCM-SIV and XChaCha20) indicate confidentiality weakness of prior modes.
The current emphasis on AEADs is controversial. Probably, if Wyng were a network protocol and not an at-rest storage format, I would agree AEADs are compelling. But that is not the case here.
Since persistently storing a nonce is a recipe for disaster (consider qvm-volume revert), a fresh key must be generated whenever the process starts. The key can be generated from a master key and a long random nonce using any decent KDF. The nonce must be stored along with the data.
My reading of current practice is that (besides nonce storage being accepted and usually mandatory) new keys are generated when nonce/IV space is exhausted. A particular mode may also have a requirement that an IV be unpredictable. So I think it's more likely Wyng could use a nonce/IV that combines a unique counter with a random portion. A 64-bit counter would accommodate (with the smallest chunk size) 2^64 * 65536 bytes (roughly a yottabyte) of backed-up disk space.
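A sketch of the counter-plus-random IV construction suggested above; the 64/64-bit split into a 128-bit IV is illustrative, not Wyng's actual layout:

```python
import os
import struct

RANDOM_LEN = 8   # bytes of per-message randomness (size is illustrative)

def make_iv(counter: int) -> bytes:
    """Build a 128-bit IV from a unique 64-bit counter plus 64 random bits.
    The counter guarantees uniqueness; the random part adds unpredictability."""
    return struct.pack(">Q", counter) + os.urandom(RANDOM_LEN)

# Each message consumes one counter value, so IVs never collide under a key.
ivs = [make_iv(n) for n in range(1000)]
```

The counter half alone already rules out reuse, so the random half can be regenerated freely without any bookkeeping.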
Is https://download.libsodium.org/doc/secret-key_cryptography/secretstream an option?
It depends. An incremental backup session records a series of data chunks to the archive. If the whole series must be considered a single stream, then Wyng loses both pruning capability and deduplication. So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes. That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs. If we break from that threat model we're probably looking at limiting the storage model to something that is not random-access.
Metadata is a bit different story. Because of the way Wyng processes it (funneling digest lists through merge-sort), other encryption modes can be used.
My reading of current practice is that (besides nonce storage being accepted and usually mandatory) new keys are generated when nonce/IV space is exhausted. A particular mode may also have a requirement that an IV be unpredictable. So I think it's more likely Wyng could use a nonce/IV that combines a unique counter with a random portion. A 64-bit counter would accommodate (with the smallest chunk size) 2^64 * 65536 bytes (roughly a yottabyte) of backed-up disk space.
One needs to store the nonce with the data, but one must never use a nonce that was read from disk or the network. Instead, one should generate a fresh key using a KDF whenever the process starts, and store the KDF inputs (except the secret seed) along with the data. Alternatively, one can use XChaCha20-Poly1305, which has a large enough nonce that it can be just generated at random at each startup.
It depends. An incremental backup session records a series of data chunks to the archive. If the whole series must be considered a single stream, then Wyng loses both pruning capability and deduplication. So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes. That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs. If we break from that threat model we're probably looking at limiting the storage model to something that is not random-access.
The whole-disk encryption threat model is intended for protection against loss of physical media, where chosen-ciphertext attacks are very difficult. That is not the case here.
but one must never use a nonce that was read from disk or the network
Re -use... correct?
The whole-disk encryption threat model is intended for protection against loss of physical media, where chosen-ciphertext attacks are very difficult. That is not the case here.
This is going a little far. FDE is deployed on network storage systems, and in office environments where repeated physical access (without losing media) is a part of the threat model.
I'm also curious why a chosen-ciphertext attack against data is an issue here. Wyng always works from the assumption that its metadata (digest list) is secure before any data is verified. Hence the 3rd item on the above checklist.
And I'll grant the threat model is not exactly like FDE, where some implementations will re-use IVs. That's why I proposed using a unique counter in the IV.
Note: I wrote this comment after reading up to https://github.com/tasket/wyng-backup/issues/7#issuecomment-866171231. So this text doesn't take into consideration what has been posted after that.
I second @DemiMarie's opinion that this should use authenticated encryption and, if possible, some existing more high-level API (I need to look up some details on libsodium before I will comment on whether I think it's a good choice here (probably yes)).
You are right that this is probably harder to attack in practice than some network protocol but given how fast modern AEADs are there is no reason to build something fragile in a new thing.
So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes.
I don't understand this argument. What issue do you have if you make each chunk a separate stream? That being said if your chunks are small enough you can use the simpler AEAD interface instead of some streaming API.
That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs.
I think for backup software you need to support a bit more than FDE. In particular, non-local storage needs stronger authentication requirements than local FDE (as it's currently in use). Qubes' built-in backup also supports strong authentication. FDE is also slowly moving toward authenticating things. For example, for system software (no encryption, only authentication) there's dm-verity, which provides strong authentication. dm-crypt+dm-integrity can provide only sector-level authentication, but that is still better than plain dm-crypt. In general, FDE makes a lot of compromises due to its requirements. For backup software you are in a much better position, so you can support better crypto.
OTOH, that is not to say an AEAD mode won't be added.
I would recommend against building in some separate crypto algorithm agility scheme for this use case. Choose a good algo+parameters. And if at some point it really turns out that there is a need to change it, that should be handled by a global format version change like other big changes.
Note these recent developments (GCM-SIV and XChaCha20) indicate confidentiality weakness of prior modes.
Those developments are mainly to support randomly generated IVs (and IIRC the SIV variants also aim to have some nonce-reuse resistance). I don't think "confidentiality weakness" is a good way to say that they should not be used with random IVs. Whether this is relevant to your usage depends a lot on how you plan to use it (see below).
Some changes that are needed next:
- [ ] Proper key derivation and protection with passphrase
- [ ] Encryption of metadata
- [ ] Hierarchical validation of metadata (issue #79)
Key derivation and integration into hierarchical metadata is probably the much more tricky task (because here "just use an existing robust higher-level API" is probably not possible to the extent it is for the encrypt-a-chunk part). So I would suggest first drafting the plan for this and discussing that. Then you can take another look at the "how to encrypt a chunk" part (for example, if you derive a new key for each chunk anyway, IV re-use is not an issue).
FDE is deployed on network storage systems, and in office environments where repeated physical access (without losing media) is a part of the threat model.
Such attacks are not addressed by common FDE solutions like dm-crypt (AFAIK MS's BitLocker is very similar, but I'm not familiar with its details).
I'm also curious why a chosen-ciphertext attack against data is an issue here. Wyng always works from the assumption that its metadata (digest list) is secure before any data is verified. Hence the 3rd item on the above checklist.
So you already verify the hash of the encrypted data before decryption? Then you have a custom AEAD scheme, not just CBC. I read your previous comments as you don't do this and only verify after decryption.
but one must never use a nonce that was read from disk or the network
Re -use... correct?
No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.
but one must never use a nonce that was read from disk or the network
Re -use... correct?
No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.
I think you are talking past each other. You use the stored nonce to decrypt the data that was encrypted using it. What @DemiMarie means is that you should not use stored data to derive another nonce from it. So you should not do something like: counter = read_counter_from_disk(); counter += 1; save_counter_to_disk(counter); encrypt(data, iv=counter) because it risks nonce re-use (for example, after restoring a backup that contains an older stored counter).
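A toy demonstration of this hazard: restoring an older persisted counter and continuing under the same key repeats a (key, IV) pair, whereas a per-session key derived from a fresh random salt avoids the collision even when the counter value repeats. All names here are illustrative:

```python
import os
import struct

def iv_from_counter(counter: int) -> bytes:
    """Deterministic IV built from a persisted counter (the risky scheme)."""
    return struct.pack(">Q", counter) + bytes(8)

# Unsafe pattern: persist the counter, then an older copy comes back from
# disk (e.g. after a backup restore or qvm-volume revert) and encryption
# continues under the SAME long-lived key.
used = set()
counter = 0
for _ in range(3):
    counter += 1
    used.add((b"same-key", iv_from_counter(counter)))

restored_counter = 1            # stale counter read back from disk
restored_counter += 1
collision = (b"same-key", iv_from_counter(restored_counter)) in used  # nonce reuse!

# Safer: derive a fresh key per session from a random salt, so a repeated
# counter value never pairs with a key that was already used.
session_key = os.urandom(32)    # stands in for KDF(master_key, random_salt)
safe = (session_key, iv_from_counter(restored_counter)) in used
```

What the cipher actually cares about is the (key, nonce) pair; rotating the key makes stale counters harmless.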
So you already verify the hash of the encrypted data before decryption? Then you have a custom AEAD scheme, not just CBC. I read your previous comments as you don't do this and only verify after decryption.
You're confusing the role of metadata and data here. The lion's share of metadata is digest lists. There are too many assumptions being made here by people who have been disinterested in this project until this point.
No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.
No, Wyng is not a network protocol and if you paid attention you'd realize the data chunks are not validated like a network protocol. It's all-or-nothing. There is no "re-play" from error-correction schemes. The digest test already uses a time-invariant function. Anything else an attacker is likely to do is DoS. I'm fine with DoS.
So you should not do something like: counter = read_counter_from_disk(); counter += 1; save_counter_to_disk(counter); encrypt(data, iv=counter) because it risks nonce re-use (for example, after restoring a backup that contains an older stored counter).
WHY would I read and (yes) re-use any data like that and apply it in such a fashion???
If I store a counter, it can be in a protected header (metadata) such as archive.ini. And PLEASE don't repeat the error and say I can't trust the header either. It can be signed if that's really necessary, that is the whole point of issue 79. The current usage model assumes that the metadata is protected by the isolated admin environment; transitioning to encryption, that metadata will have to be verified before any data can be processed. And if you assume I'm going to use AES-CBC for metadata signing/verification, then /eyeroll.
I should also point out the GCM problems aren't limited to nonce re-use. When the underlying CTR mode fails, it is (I repeat) catastrophic. Tons of data (or all) data gets exposed. With CBC under the same conditions, only the identical repeat messages tend to be exposed.
Finally, key scheduling is better suited to network streams, but it won't be out of the question going forward. I still have to assume that unique IVs will be sufficient, because that's what the API documentation and application guides say, so the current tack is a reasonable starting point.
Here's the deal. I do not want this issue flooded with piles of best-practice nostrums from every use case under the sun applied indiscriminately, as is the fashion. Going forward, you can comment if A) you're a cryptographer or B) you demonstrate you've reviewed the Wyng format and present ideas about encryption in "Wyng-ese".
The ideas already in Wyng have to be respected or there will be little point in adding encryption to it.
Please also understand, this is being developed by a single person (me) in my spare time. The encryption feature will be introduced as experimental and probably stay experimental for some time––as happened with deduplication––barring some considerable increase in participation. Other projects have a lot more manpower, and can still get by with delaying (say) correct verification of system updates for over a decade; in non-experimental releases at that.
So, those are the terms and they are terrific. :-)
[...] There are too many assumptions being made here by people who have been disinterested in this project until this point. [...] So, those are the terms and they are terrific. :-)
I did look at the commit mentioning the ticket but given your comment it wasn't clear what is just done this way because it's some very early version of the feature and what is your mid to long term plan.
Anyway: I did comment here since Marek asked me privately if I would have time to take a look. Unfortunately my comments had the opposite of the intended effect and you perceived my comments as some outsider to the project trying to force "best-practice nostrums" on you. Sorry about this, I definitely didn't want to annoy you in the issue tracker of your spare time project. So I will refrain from further comments for now. If you would like to discuss this or related topic in the future feel free to contact me.
DEFAULT_CRYPTO_ALGORITHM = 'aes-256-cbc'
That's literally the only occurrence of this constant in the code, it is not used anywhere :) Currently the encryption uses https://github.com/Tarsnap/scrypt/blob/master/FORMAT - especially because it handles HMAC properly (after encrypting) and has proper KDF too.
@marmarek What do you think about AES-SIV mode?
Good Morning.... The basic encryption implementation has been completed!
Upon new archive creation with arch-init, encryption is enabled by default. An unencrypted archive may be created with arch-init --encrypt=off, or a specific data cipher can be selected with arch-init --encrypt=<cipher>. The unencrypted mode still needs preliminary testing.
The rest is like using Wyng v0.3, although there are additional features slated for v0.4 that will cause further changes in its command syntax and format.
Compatibility: Wyng 0.4 (wip) is being tested on Fedora 32 (Qubes 4.1), Debian 11 and Ubuntu 21.04. Qubes 4.0 does seem like a possibility if A) encryption is not used, or B) suitable encryption library versions are ported to Fedora 25.
The user selects a data cipher (either AEAD or non-AEAD) and Wyng selects a matching AEAD cipher for authenticating metadata. XChaCha20-Poly1305 is now an option for data, and selecting XChaCha20 will now use XChaCha20-Poly1305 as the metadata cipher instead of AES-SIV. (Note that there are >3 sodium/NaCl based libraries now available for Python, and although they seem to vary a lot in quality one of them may be a better option for providing XChaCha20 in the future.)
Metadata and data are encrypted with separate keys derived with scrypt from a single passphrase and separate salts. There is no re-keying capability at present, although this could make a nice future addition.
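A sketch of this derivation using the standard library's hashlib.scrypt; the cost parameters, salt sizes, and names are illustrative and not necessarily those used by Wyng:

```python
import hashlib
import os

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key with scrypt. Cost parameters (n, r, p) are
    illustrative; a real deployment would tune them to the hardware."""
    return hashlib.scrypt(passphrase.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

# One passphrase, two independent keys via two independent salts.
passphrase = "correct horse battery staple"
data_salt = os.urandom(16)
meta_salt = os.urandom(16)
data_key = derive_key(passphrase, data_salt)
meta_key = derive_key(passphrase, meta_salt)
```

The salts are not secret and can be stored with the archive; distinct salts are what keep the data and metadata keys cryptographically independent.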
For the IV/nonce, a counter is used, with the number-of-messages (or -blocks) safety bound for each cipher mode determining the counter size. For XChaCha20 and XChaCha20-Poly1305, three factors are concatenated together: a 32-bit UTC time in seconds and 80 random bits, in addition to the counter. This approach was chosen as an efficient way to prevent nonce re-use.
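A sketch of assembling such a 192-bit nonce from those three components; the field order is an assumption (the description and table give only the components and their sizes):

```python
import os
import time

def xchacha_nonce(counter: int) -> bytes:
    """Assemble a 192-bit (24-byte) nonce from a 32-bit UTC time, 80 random
    bits, and an 80-bit message counter, per the sizes described above."""
    t = int(time.time()).to_bytes(4, "big")        # 32-bit UTC seconds
    rnd = os.urandom(10)                            # 80 random bits
    ctr = counter.to_bytes(10, "big")               # 80-bit counter
    nonce = t + rnd + ctr
    assert len(nonce) == 24                         # 4 + 10 + 10 bytes = 192 bits
    return nonce

nonce = xchacha_nonce(1)
```

Any one of the three fields differing is enough to avoid reuse; combining a monotonic counter, wall-clock time, and randomness makes a repeat require several independent failures at once.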
The counters for each key are updated in the remote metadata root archive.ini file as the volume data is being sent, alongside a more frequently updated mirror of the counters tracked in the local .salt file. On startup, the two versions of each counter are compared, the larger is taken, and then the counter update step minus 1 is added as a precaution. The counter is always advanced before incorporating it into a new IV.
The upper bound for each cipher's message counter plus other parameters for the IV:
Cipher | IV/nonce size | Counter Max | Random | Time sec. |
---|---|---|---|---|
XChaCha20 | 192 bits | 2^80-64 | 80 bits | 32 bits |
AES-SIV | 96 bits | 2^48-64 | 48 bits | none |
AES-CBC* | 128 bits | 2^48-64* | 80 bits | none |
If the counter runs out, the current key is considered exhausted and no further data will be written. Currently, with the XChaCha20 cipher, that allows approximately 2^80 * 64 Kbytes of source volume data to be written to the archive before it effectively becomes read-only. There is also a small emergency reserve of 64 (for future use) in case more data must be written to make an exhausted archive consistent. Archives that were initialized with a default chunk size >64K can store proportionally more data. The metadata counter is consumed at a bit more than 1/128 the rate of the data counter.
Once the metadata root is verified at startup (currently done via the AEAD cipher), the hashes contained within are used to validate all other metadata + data, always in the context of the latest archive revision. IOW, archive.ini is always updated anytime there is a change in the archive (which is also time stamped) so outdated metadata below the root won't be accepted even when having valid AEAD tags.
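A toy model of this hierarchy, with an HMAC standing in for the AEAD tag: the root is authenticated first, and lower-level metadata and data are accepted only if they match the hashes the verified root carries. All names and the file format here are invented for illustration:

```python
import hashlib
import hmac
import os

meta_key = os.urandom(32)

# Build a tiny archive: a data chunk, a manifest (metadata below the root)
# holding the chunk's hash, and a root holding the manifest's hash.
chunk = b"volume data chunk"
manifest = hashlib.sha256(chunk).hexdigest().encode()
root = b"timestamp=1624000000\nmanifest_hash=" + \
       hashlib.sha256(manifest).hexdigest().encode()
root_tag = hmac.new(meta_key, root, hashlib.sha256).digest()  # AEAD-tag stand-in

def verify_archive(root: bytes, root_tag: bytes, manifest: bytes, chunk: bytes) -> bool:
    """Verify the root first; only then trust the hashes it contains."""
    expected_tag = hmac.new(meta_key, root, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, root_tag):
        return False    # forged or stale root: reject the whole archive
    expected_manifest_hash = root.split(b"manifest_hash=")[1]
    if hashlib.sha256(manifest).hexdigest().encode() != expected_manifest_hash:
        return False    # outdated manifest is rejected even if its own tag is valid
    return hashlib.sha256(chunk).hexdigest().encode() == manifest

ok = verify_archive(root, root_tag, manifest, chunk)
```

Because only the latest root is accepted, replaying an old-but-individually-valid manifest below it fails the hash check, which mirrors the "outdated metadata below the root won't be accepted" property.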
Note for AES-CBC: This cipher is currently disabled as the entropy diffusion in the IV appears to be an issue. This could be enabled in the future once this concern is addressed, i.e. by taking the additional step of encrypting each IV before use. The counter limit in the above table refers to the number of AES blocks in this case.
A simple check of cpu flags is made for AES_NI hardware support when an AES cipher is selected, and a check for Cryptodome library version >= 3.9 is made with XChaCha20.
For those who do not want to use or rely on AEAD ciphers for authenticating Wyng archives, the metadata root archive.ini can be signed by the user by some other means as a way of authenticating an entire archive. Signing the entire metadata file set is no longer necessary with versions >= 0.4.
Integrate an encryption layer that can also be used to verify metadata and data from the destination archive.
Looking for examples and discussion on applied cryptography techniques from best practices to implementations in various tools including qvm-backup, restic, Time Machine, etc.
Factors
Implementation checklist
Threat model
Wyng's threat model appears to be most similar to an encrypted database: a mass of data that is updated and curated periodically. Attackers gaining access to the entire volume ciphertext, possibly on successive occasions, may be assumed.
Security issues
Encryption scheme should be robust and have low interactivity and complexity as well as high isolation potential.
Isolation would be in the form of a Qubes-like environment where the Admin VM (e.g. Domain 0) running the backup process is blocked from direct network access, and encryption/decryption is performed only there. Wyng should be able to encrypt effectively in such an isolated environment.
Compatibility with Admin isolation also extends to how any guest containers/VMs are handled: Encryption and integrity verification cannot rely on the guest environments or their OS templates.
Encryption strategies
1. LUKS or VeraCrypt on a loop device (which can be isolated) with backing in a remote/shared image file. For example: cryptsetup -> losetup -> sshfs. This solution is readily available but imposes a performance penalty of ~20% on a VM-isolated configuration. It also requires painstaking user setup in a Linux-specific environment; difficult to integrate; poor choice for remote/cloud.
2. Encfs - A FUSE file-encrypting layer may improve performance over a setup based on a loop device. It may also be simpler to set up or even integrate. Advantage: automatic filename (but not size or sequence) obfuscation. Drawback: issue with hardlinks in some encryption modes.
3. CryFS - Another FUSE layer with built-in support for network transports. Complete file metadata obfuscation. Claims superior resistance to attack. Unknowns: hardlink support, transport isolation potential.
4. Direct crypto library/AES utilization - Uses no external layers, but requires painstaking attention to detail and review by a cryptographer if possible. This option may be a natural choice, given the simplicity of the archive chunk format; any issues around the implementation security should have direct analogues to a wide field of other implementations and their use cases. See initial comments on AES modes.
5. Some encrypted backup tool that can accept a stream of named chunks with very low interactivity between the front end and back end (e.g. a 'push' model).
(After some deliberation and using Wyng with external encryption layers, this issue will be primarily concerned with an integrated solution similar to item 4.)
Types of data
Wyng keeps volume data and metadata as separate files, and the metadata validates the volume data.
See Issue #79 for specifics on metadata, which is expected to use separate encryption keys.
On commenting...
Following a core tenet of cryptography that the application must be understood thoroughly before making specific decisions, a substantial familiarity with Wyng is required to make sense of this issue (ye have been warned...).
It's suggested that making some incremental backups with Wyng and looking at the metadata under '/var/lib/wyng.backup' is a good starting point. In the source code, the classes under ArchiveSet() are instructive, in addition to merge_manifests() and the places where it's used.