palant / pfp

A simple and secure browser extension to be used with KeePass databases.
https://pfp.works/
Mozilla Public License 2.0

Encrypt metadata #47

Closed ghost closed 6 years ago

ghost commented 7 years ago

Currently all password metadata (username, domain, length, charset) is stored in plaintext.

This has some security implications in case user data gets stolen (by malware on the local machine or by a leak on a remote host):

  1. Usernames are used as salt, so they'll be known to the attacker.
  2. Username + domain pairs could in some cases be used by an attacker to hijack an account via password recovery options.

Solution: Encrypt all metadata with AES.

This means that you can't access your data without the master password, but I don't see a use case where that would be needed. If you forgot your password, your data won't help you anyway.

palant commented 7 years ago

I received that request before, albeit not via GitHub issues it seems. I rejected it, for reasons of code simplicity and because lack of encryption makes it easier to verify what Easy Passwords stores about you. Also, encrypting metadata is problematic for the sync functionality which is in progress.

I've since reconsidered. Attackers knowing the salt isn't an issue as long as the salt is different for all passwords of the same user (it is, because the website name is also part of the salt) as well as for different users of the same website (it usually is, because they have different user names). This is sufficient to make rainbow tables useless.

However, listing attack targets in the clear might not be the best idea regardless, simply because some of them could be attacked by other means - such as password recovery. This is particularly problematic with the sync functionality where the data would be stored on a third party service.

The conclusion is that we need to encrypt the data with an AES key derived from the master password. Additional encryption for legacy passwords isn't necessary. The good news is: this will make master password validation more robust. If decrypting the data succeeded and resulted in a valid JSON structure then the master password must be correct. The chances of accepting a wrong master password will be way lower than with the current approach. Also, if backups are encrypted then importing them will only succeed if the same master password is used - so we will no longer produce a bogus state by importing a backup with the wrong master password.

For performance reasons, we cannot store the data as a single encrypted blob. However, we are already breaking the data into records right now, we can encrypt each record separately. We would only need to avoid leaking information via record names. This can be done by having an encrypted "index" record which would map site names to random record names.
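For illustration only, such an index record might look roughly like this (the random record names here are made up; the index itself is encrypted like any other record):

"index": encrypted({"github.com": "a1b2c3d4", "example.com": "e5f6a7b8"}),
"a1b2c3d4": encrypted({"name": "palant", "type": "generated", "length": 8}),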

Now the tough part is having sync (#24) work regardless. We need either a storage format where a merge will work without decrypting the data, or we will need to perform syncs when the user enters their password and accept the resulting delays in the workflow.

palant commented 7 years ago

Another idea of how the data structure might work: generate predictable record identifiers via SHA512-HMAC, with the salt being derived from the master password. The potential attacker doesn't know the master password, so they don't know the salt being used either. Making any conclusions from the record identifier to the actual site won't be possible directly then. It could look like this (with the values actually being encrypted):

"site:abcd1234": {"name": "github.com"},
"password:abcd1234:fedc9876": {"name": "palant", "type": "generated": "length": 8},

So in order to get the list of all sites we would retrieve all records with the site: prefix, decrypt them and read their name properties. And assuming that abcd1234 is the hash for github.com, retrieving the list of all GitHub passwords could be done by retrieving all records with the password:abcd1234: prefix and decrypting these.
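As a rough sketch (not the actual implementation), such record identifiers could be computed with the WebCrypto API along these lines; hmacSecret stands for whatever value ends up being derived from the master password:

// Sketch: compute a record identifier like "abcd1234" from a name via HMAC-SHA512.
// "hmacSecret" is assumed to be raw key material derived from the master password.
async function recordId(hmacSecret, name)
{
  let key = await crypto.subtle.importKey(
    "raw", hmacSecret, {name: "HMAC", hash: "SHA-512"}, false, ["sign"]
  );
  let digest = await crypto.subtle.sign("HMAC", key, new TextEncoder().encode(name));
  // Hex-encode the digest to get a stable identifier
  return Array.from(new Uint8Array(digest))
              .map(byte => byte.toString(16).padStart(2, "0"))
              .join("");
}

// E.g. a site record name would then be "site:" + await recordId(hmacSecret, "github.com")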

With the record identifiers being fixed for the same master password we can merge the records without knowing their contents. If the local and remote records with the same identifier differ (in their encrypted form) we check whether the record changed locally since the last sync. If it did then the local version wins, otherwise we take the remote version.
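Expressed as a sketch (the record values and the modifiedLocally flag are placeholders for whatever the sync metadata ends up looking like), the per-record merge rule would be:

// Sketch: pick the winning version of a record during sync without decrypting it.
function mergeRecord(localRecord, remoteRecord, modifiedLocally)
{
  if (!localRecord)
    return remoteRecord;                 // record only exists remotely
  if (!remoteRecord)
    return localRecord;                  // record only exists locally
  if (localRecord == remoteRecord)
    return localRecord;                  // identical encrypted blobs, nothing to do
  return modifiedLocally ? localRecord : remoteRecord;
}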

The structure of the storage is still visible this way, e.g. how many websites there are and how many passwords for each website. So an attacker knowing that I have lots of GitHub accounts might still be able to guess that "abcd1234" is GitHub simply by the number of passwords stored for this website. But IMHO they aren't learning anything they didn't already know this way.

ghost commented 7 years ago

Thank you for the answer. Those are great insights! I'll keep my fingers crossed for this!

palant commented 7 years ago

In my comments above I underestimated one important issue: the encryption key. The encryption key cannot be derived directly from the master password; without a salt, that would be a weakness. However, independent clients cannot come up with a shared salt that is independent from the master password. This means that their encryption keys necessarily have to be different, at least until the first sync. So we won't get around managing keys.

It seems that each encrypted record should contain the salt used to derive the key in addition to IV and the actual encrypted text. There should also be a special key cache record containing known keys, so that these don't have to be derived every time:

"keys": {
  "12345678": "abcddcba",
  "87654321": "dcbaabcd"
}

Here 12345678 is the salt and abcddcba the encryption key. For a non-syncing client this record will contain exactly one key - the information will actually be redundant because both the salt and the key need to be known already in order to decrypt the record. However, it can still be used to verify consistency (particularly, whether the correct master password has been entered). The salt used to encrypt this record is the one to be used when encrypting new records.
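For illustration, an individual record under this scheme would then carry the salt alongside the IV and the ciphertext, roughly like this (field names made up here):

"site:abcd1234": {
  "salt": "12345678",
  "iv": "<random initialization vector>",
  "data": "<encrypted JSON payload>"
}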

When the client syncs for the first time, the sync logic should make sure that the local keys record is always overwritten by the remote one (not marking it as locally modified should be sufficient for that). This will make sure that after syncing all clients will arrive at a common encryption key (the one used by the first client connected to the account). Also, during a sync (if the master password is known) or after entering the master password, Easy Passwords should go through the records and verify that all keys these refer to are known - otherwise it should derive the keys and add them to the corresponding record.

This approach seems reasonably robust and shouldn't degrade performance. With the first sync typically happening while the master password is known, it should also make sure that the keys cache is merged immediately and no unnecessary deriving of keys has to happen later.

ghost commented 7 years ago

I wonder if you could use a blake2 hash instead of HMAC-SHA512, as it should have much better performance.

Here's some info: https://blake2.net/ https://tools.ietf.org/html/rfc7693

Implementations in javascript: https://github.com/dcposch/blakejs https://github.com/emilbayes/blake2b https://github.com/dchest/blake2s-js

Status of firefox implementation: https://bugzilla.mozilla.org/show_bug.cgi?id=1310763

Also, instead of AES encryption you could consider using ChaCha20, which has better performance on devices without AES-NI or equivalent CPU instructions.

https://tools.ietf.org/html/rfc7539

palant commented 7 years ago

I wonder if you could use a blake2 hash instead of HMAC-SHA512, as it should have much better performance.

Performance shouldn't matter in this scenario, normally only one hash is being calculated at a time. For example, when a new password needs to be saved, we would need to hash the username to generate the record identifier for this password. In fact, calculating a single hash is cheap enough that we might hash the site name again because it's easier than remembering previous results. Similarly, we will only need to retrieve at most one record by the record identifier. Even if we need all passwords for a site, we hash the site name and decrypt all entries with the given prefix.

And while blake2 might be more secure than SHA2, it shouldn't matter either because the hash calculation contains a salt that is unknown to the attacker. On the other hand, blake2 is a relatively new algorithm and hasn't seen as much attention as SHA2. Not to be underestimated: for HMAC-SHA512 there is a known-good implementation, and I know which pitfalls to avoid.

Also, instead of AES encryption you could consider using ChaCha20, which has better performance on devices without AES-NI or equivalent CPU instructions.

Again, we aren't talking about amounts of data that would justify choosing one algorithm over another. Normally, Easy Passwords won't need to decrypt more than a few records at a time. The only exception is displaying the "All Passwords" page, when all records have to be decrypted at once. I'll still have to see how that performs, but a noticeable delay would be very unexpected. I have been using the extension quite heavily and I still have less than 15 kB of data.

Here as well, we have a known-good implementation of AES but no canonical ChaCha20 implementation in the browsers yet. In general, this Stack Exchange answer does quite a nice job of comparing the advantages of the two algorithms.

ghost commented 7 years ago

What about switching to AES-GCM from the currently used AES-CBC? Being authenticated encryption, it's the superior choice. Also, it's implemented in the same way as AES-CBC, so migration would be easy.

https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/encrypt https://www.w3.org/TR/WebCryptoAPI/#dfn-AesGcmParams

palant commented 7 years ago

Yes, we should switch to AES-GCM I guess. The main issue is again that I'm not terribly familiar with the topic, which is way more complex than you are implying. There is this paper for example, which is hopefully no longer a concern but should serve as a warning.

Looking at this discussion and this paper (page 125), the main concern appears to be IV reuse. Using the recommended IV size (96 bits) and a proper source of randomness seems to be sufficient to address this. Larger initialization vectors are not beneficial and are discouraged. It's also essential to leave the tag length at the default of 128 bits.

There is a good explanation of the additionalData parameter here. Given our use case, it doesn't look like we can leverage this additional security mechanism: we only have a single secret (the master password) and this one already went into the generation of the encryption key. So this parameter should be left empty in our case.
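A minimal sketch of what that encryption step could look like with the WebCrypto API (96-bit random IV, default 128-bit tag, no additionalData; the AES-GCM CryptoKey is assumed to have been derived elsewhere):

// Sketch: encrypt one record's JSON payload with AES-GCM as discussed above.
async function encryptRecord(key, payload)
{
  let iv = crypto.getRandomValues(new Uint8Array(12));   // 96-bit random IV
  let ciphertext = await crypto.subtle.encrypt(
    {name: "AES-GCM", iv},    // tag length defaults to 128 bits, additionalData left empty
    key,
    new TextEncoder().encode(JSON.stringify(payload))
  );
  return {iv, ciphertext};    // both are stored with the record
}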

ghost commented 7 years ago

I was aware of the nonce reuse issue, but as you pointed out in the previous discussion there is very little actual data to encrypt, so it would never happen.

palant commented 7 years ago

I was aware of the nonce reuse issue, but as you pointed out in the previous discussion there is very little actual data to encrypt, so it would never happen.

Oh, it's actually very easy to fall into this trap. All you need to do is derive the nonce from the master password, so that the nonce is deterministic for each key. It seems pretty smart at first (keys are unique, so this is an easy way to ensure nonce uniqueness), not so much if you look closely (if you update the data for a key, you end up encrypting different data with the same key and nonce). Generating nonces randomly, on the other hand, should be completely unproblematic.

palant commented 7 years ago

Another idea of how the data structure might work: generate predictable record identifiers via SHA512-HMAC, with the salt being derived from the master password.

While working on the implementation, I realized that this is actually a weakness. It is necessary that this HMAC secret is identical for all instances, meaning that we cannot use a random salt when deriving it from the master password. This in turn allows creation of rainbow tables - while the attacker shouldn't see the HMAC secret itself, they can guess that most users will have "google.com" in their sites list and look for the corresponding SHA512-HMAC-generated record identifiers.

Back to the drawing board...

palant commented 7 years ago

Ok, the conclusion is that both the salt used to derive the encryption key and the HMAC secret have to be random. This means that different clients will have different HMAC secrets and consequently different record identifiers. We can solve this by changing the HMAC secret to the one used remotely when syncing for the first time; at this point we always know the master password and can update the data. And since we are doing that anyway, we can also change the salt used to derive the encryption key and re-encrypt the data. As a result, all synced clients will use the same encryption key and the complicated key management I came up with above won't be necessary any more.

Here is what the data structure should look like in the end:

"salt": <random salt used to derive the encryption key>,
"hmac-secret": encrypted(<random HMAC secret>),
"site:hmac-sha(example.com)": encrypted({"name": "example.com"}),
"password:hmac-sha(example.com):hmac-sha(username)": encrypted({"type": "generated2", "name": "username", ...}),

The salt record is special: it's the only one stored without encryption. The hmac-secret record needs to be decrypted every time the user enters their master password; this is necessary to access further data and can also be used to verify the master password.
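Putting the pieces together, unlocking the storage could look roughly like the following sketch. deriveKeyScrypt() is a hypothetical placeholder for the scrypt-based key derivation, and decryptRecord() is the counterpart of the AES-GCM encryption sketched earlier:

// Sketch: unlock the storage with the final data layout described above.
async function unlock(masterPassword, storage)
{
  // Derive the AES key from the master password and the stored random salt.
  let key = await deriveKeyScrypt(masterPassword, storage["salt"]);

  // Decrypting hmac-secret doubles as master password verification:
  // with a wrong password, AES-GCM authentication fails here.
  let hmacSecret = await decryptRecord(key, storage["hmac-secret"]);

  // Record identifiers for sites and passwords are then computed as
  // HMAC-SHA512 over the names using hmacSecret, and the records
  // themselves are decrypted with key.
  return {key, hmacSecret};
}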

ghost commented 7 years ago

Hm, I'm thinking about it.

A random salt means that when we lose or damage it, all metadata will be inaccessible even with the correct master password. (I know - backups.)

Also, metadata validation means we set up a perfect oracle for an attacker to test against. It's still safe with a strong enough password, but it's better when a potential attacker isn't sure what she's looking for.

Somewhat offtopic:

I saw that many encryption frameworks (gocryptfs, LUKS, fscrypt) do password management using a wrapped key: password -> unwrap key -> decrypt/hash data with key

This solves the password change problem, which can be quite a big issue for tools like Easy Passwords: the actual data is encrypted/hashed using a static key which never changes, and the user can change the actual password used to wrap/unwrap the key whenever they want.
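For illustration, the wrapped-key idea could be sketched with WebCrypto roughly like this (both the password-derived wrapping key and the random data key are assumed to be AES-GCM CryptoKeys; the wrapping key needs the wrapKey/unwrapKey usages and the data key must be extractable):

// Sketch: wrap a random data key with a password-derived key.
// Changing the password only requires re-wrapping the data key,
// not re-encrypting the data itself.
async function wrapDataKey(dataKey, passwordKey)
{
  let iv = crypto.getRandomValues(new Uint8Array(12));
  let wrapped = await crypto.subtle.wrapKey("raw", dataKey, passwordKey, {name: "AES-GCM", iv});
  return {iv, wrapped};
}

async function unwrapDataKey(wrapped, iv, passwordKey)
{
  return crypto.subtle.unwrapKey(
    "raw", wrapped, passwordKey, {name: "AES-GCM", iv},
    {name: "AES-GCM"}, false, ["encrypt", "decrypt"]
  );
}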

I'm not sure if this is relevant for this project, so I'm not opening a new issue for now. It has the same potential disadvantages as I stated above.

Also, going this way would mean we recreate a static password manager with actual passwords created on demand, instead of being nearly stateless.

palant commented 7 years ago

Also, metadata validation means we set up a perfect oracle for an attacker to test against.

That's generally the issue with AES-GCM :wink:. But an attacker will always be able to tell correctly decrypted data and incorrectly decrypted data apart. That's why we use scrypt - to make sure that they won't be able to decrypt it by guessing the master password.

I already thought about storing an encrypted key, so that the master password can be changed without changing the key. It's not really worth doing in our case because we'll need a way to change the key for sync anyway. Besides, changing the master password invalidates all generated passwords which presents a much bigger challenge.

ghost commented 7 years ago

Besides, changing the master password invalidates all generated passwords which presents a much bigger challenge.

What I meant is that generated passwords will be created from this static key, not the master password.

That's generally the issue with AES-GCM :wink:. But an attacker will always be able to tell correctly decrypted data and incorrectly decrypted data apart.

I'm starting to be afraid that we are on thin ice here. The difference between storing encrypted metadata and storing encrypted passwords is getting smaller.

Maybe it's better to go for https://github.com/palant/easypasswords/issues/58 and leave this for now.

palant commented 7 years ago

What I meant is that generated passwords will be created from this static key, not the master password.

Then it would no longer be possible to recreate the passwords by entering the same metadata.

ghost commented 7 years ago

Yeah, but it's a similar problem to recreating metadata without the salt. (Backups, backups, backups.) :smile:

palant commented 7 years ago

Yeah, but it's a similar problem to recreating metadata without the salt.

No, the salt is only relevant for encrypting/decrypting the metadata. Password generation doesn't depend on it, it depends on the decrypted metadata. If you still remember your user name for a website or you printed the list of your passwords, then you can always generate the same password.

ghost commented 7 years ago

If you still remember your user name for a website or you printed the list of your passwords, then you can always generate the same password.

If you don't remember the username, you can't recreate the password.
If you don't remember the password options, you can't recreate the password.
If you lost the salt, you can't decrypt the metadata, so you can't recreate the password.
Printing passwords = backup.

Losing metadata is not as bad as losing the master password (or key), but it can still make your passwords unrecoverable. That makes it only slightly better (depending on how many passwords you have) - that was my point.

palant commented 7 years ago

I just pushed the biggest chunk to the crypto-update branch. At this point, everything seems to be working and both metadata and backups are encrypted. What's still missing:

ghost commented 6 years ago

Mozilla is building a new password manager that uses AES-GCM to encrypt data and HMAC-SHA-256 to generate search hashes for user data [1]. I don't know if there is anything you can borrow from them, but it's probably worth a look.

[1] https://github.com/mozilla-lockbox/mozilla-lockbox.github.io/blob/master/architecture/data-storage.md

palant commented 6 years ago

Interesting, they've faced the same challenges synchronizing an encrypted data storage across multiple devices but solved them differently. I'll go through that thoroughly later and decide whether it's an improvement over my approach - it's not too late to change things yet.

palant commented 6 years ago

I think I got all the relevant differences now:

Per-item keys

Mozilla chose to use per-item encryption keys merely to be future-proof apparently (e.g. for password sharing). As it is, this approach merely introduces complexity because both the items themselves and the corresponding keys have to be synced (the latter are also encrypted, but all with the same master encryption key). They also say that randomly generated item keys automatically avoid potential nonce collisions; randomly generated nonces should be sufficient for that however. Also, they don't really avoid that issue - there might still be nonce collisions when encrypting the per-item keys.

Avoiding re-encrypting data on sync

The document doesn't make it clear, but it seems that the master encryption key is supposed to be identical on all devices. The document leaves a critical point undecided here, namely how to choose a salt to derive the master encryption key from the FxA application prekey. A fixed salt hardcoded into the application would weaken the scheme IMHO.

My plan is to replace the salt on the device on first sync, which effectively means syncing the master encryption key between all devices. This requires re-encrypting all data on first sync, but there shouldn't normally be a lot of it. Subsequent syncs don't require any re-encrypting any more.
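As a sketch under these assumptions (re-using encryptRecord()/decryptRecord() from the earlier sketches and the hypothetical deriveKeyScrypt(); the renaming of record identifiers caused by adopting the remote HMAC secret is omitted here):

// Sketch: on first sync, adopt the remote salt and re-encrypt all local records.
async function adoptRemoteSalt(masterPassword, remoteSalt, records, oldKey)
{
  let newKey = await deriveKeyScrypt(masterPassword, remoteSalt);
  for (let name of Object.keys(records))
  {
    let payload = await decryptRecord(oldKey, records[name]);
    records[name] = await encryptRecord(newKey, payload);
  }
  return newKey;   // subsequent syncs share this key, no further re-encryption needed
}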

Item identifiers

Mozilla uses randomly generated UUIDs as item identifiers. This means that two items created on different devices are always considered unrelated (their UUIDs are different) and not merged. This simplistic approach doesn't work for Easy Passwords: we need to be able to merge passwords if site and password name are identical, even if these passwords were created on different devices in parallel. Consequently, a more complicated approach to generating item identifiers was necessary.

JSON Web Encryption (JWE)

Mozilla utilizes the JWE spec to wrap encrypted items, which is probably a good backwards compatibility solution for them. Using a standardized record format might be worth doing for Easy Passwords as well. On the other hand, we won't ever implement anything close to the full standard - it will always be one encryption algorithm for everything. So we could just as well skip the overhead of describing it; I'm still undecided on this.

palant commented 6 years ago

The branch has been merged. There are still a few issues to be addressed before release, but we should have a release soon.