narfbg / SimpleEncryption

Simple Encryption for PHP

Consider Deriving New Cipherkey for Each Encryption #4

Open ircmaxell opened 10 years ago

ircmaxell commented 10 years ago

Consider generating a random salt every time you encrypt data.

That way, you can use HKDF to derive a new cipherkey, mackey and iv from the master key. Then, just pass the salt with the ciphertext instead of the iv (since you can regenerate the IV).

For example:

$salt = self::getRandomBytes(16, true);
$cipherKey = self::hkdf($this->masterKey, 'sha512', 32, 'cipherkey', $salt);
$macKey = self::hkdf($this->masterKey, 'sha512', 32, 'mackey', $salt);
$iv = self::hkdf($this->masterKey, 'sha512', 16, 'iv', $salt);

That way, you don't run into CTR key rotation issues after generating 2^64 plaintexts with the same cipher key (since every encryption here uses a unique cipher key).
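
A minimal sketch of the derivation above, assuming PHP 7.1.2+ where the built-in `hash_hkdf()` stands in for the library's own `self::hkdf()` helper. The function name `deriveMessageKeys` and the array keys are made up for illustration; the labels and lengths follow the example.

```php
<?php
declare(strict_types=1);

/**
 * Derive a per-message cipher key, MAC key and IV from one master key.
 * hash_hkdf() argument order is (algo, key, length, info, salt).
 */
function deriveMessageKeys(string $masterKey, string $salt): array
{
    return [
        'cipherKey' => hash_hkdf('sha512', $masterKey, 32, 'cipherkey', $salt),
        'macKey'    => hash_hkdf('sha512', $masterKey, 32, 'mackey', $salt),
        'iv'        => hash_hkdf('sha512', $masterKey, 16, 'iv', $salt),
    ];
}

$masterKey = random_bytes(32);
$salt      = random_bytes(16); // fresh for every encryption, stored with the ciphertext
$keys      = deriveMessageKeys($masterKey, $salt);
```

Because the derivation is deterministic, the decrypting side only needs the master key and the transmitted salt to rebuild all three values.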

narfbg commented 10 years ago

Hmm, I hadn't thought of that and I wanted to use a salt. Thanks!

narfbg commented 10 years ago

I was just implementing this and stumbled upon the same issue that I had earlier (wanted to use the IV as salt) ... Because authentication is applied over the cipherText and IV/salt, I wouldn't be able to derive the hmacKey during decryption.

The same issue prevents me from using PBKDF2 to allow users to just use a password instead of a key.

ircmaxell commented 10 years ago

Well, as said in #5, that shouldn't be done anyway.

And you don't need to MAC the IV/salt with the ciphertext anyway.

If you derive the keys, using the code I have above, then validating a MAC on the ciphertext (decoded) without the IV/salt would also validate the IV/salt since the key is derived from it.

Example:

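// Decode, then split into salt || mac || ciphertext
$raw = base64_decode($data);
$salt = substr($raw, 0, 32);
$mac = substr($raw, 32, 32);
$ciphertext = substr($raw, 64);
// The MAC key depends on the salt, so a tampered salt fails the check below
$macKey = hkdf($masterKey, 'sha512', 32, 'mackey', $salt);
// Compare HMACs of both values rather than the values themselves (timing-safe)
if (hmac(hmac($ciphertext, $macKey), $macKey) !== hmac($mac, $macKey)) {
    // Authentication failed
}
$cipherKey = hkdf($masterKey, 'sha512', 32, 'cipherkey', $salt);
$iv = hkdf($masterKey, 'sha512', 16, 'iv', $salt);
// decrypt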

There really isn't any reason to MAC the encoded ciphertext as opposed to the raw ciphertext (and doing so may open up additional vulnerabilities)...
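
For reference, the verification step can be sketched with PHP built-ins only. This assumes HMAC-SHA256 for the 32-byte MAC (the example does not name the hash) and uses `hash_equals()` for the constant-time comparison in place of the double HMAC. The function name `authenticate` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Split salt || mac || ciphertext and verify the MAC before anything else.
 * Returns the salt and raw ciphertext, or null when authentication fails.
 */
function authenticate(string $data, string $masterKey): ?array
{
    $raw = base64_decode($data, true);
    if ($raw === false || strlen($raw) < 64) {
        return null; // not valid Base64, or too short to hold salt + MAC
    }

    $salt       = substr($raw, 0, 32);
    $mac        = substr($raw, 32, 32);
    $ciphertext = substr($raw, 64);

    // The MAC key depends on the salt, so a modified salt yields a different key.
    $macKey   = hash_hkdf('sha512', $masterKey, 32, 'mackey', $salt);
    $expected = hash_hmac('sha256', $ciphertext, $macKey, true);

    // hash_equals() compares the two strings in constant time.
    if (!hash_equals($expected, $mac)) {
        return null;
    }

    return ['salt' => $salt, 'ciphertext' => $ciphertext];
}
```

Only after this returns a non-null result would the cipher key and IV be derived and the ciphertext decrypted.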

defuse commented 10 years ago

Does anything else use this salt derivation method? It seems like it's relying on unproven properties of HKDF and HMAC.

ircmaxell commented 10 years ago

@defuse http://crypto.stackexchange.com/questions/5630/deriving-keys-for-symmetric-encryption-and-authentication (I asked the question, but check out the answers)...

defuse commented 10 years ago

@ircmaxell I feel a little better about it now, but would still like to see a security proof.

If used, the salt should be 256 bits, not 128. If it's 128, then by the birthday bound a salt is expected to repeat after roughly 2^64 encryptions.
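
The figure comes from the standard birthday approximation. As a rough worked example, for n random salts of b bits each, the probability p of at least one repeat is approximately:

```latex
\begin{aligned}
p(n) &\approx 1 - \exp\left(-\frac{n^2}{2^{b+1}}\right) \\
b = 128,\ n = 2^{64}: \quad p &\approx 1 - e^{-1/2} \approx 0.39 \\
b = 256,\ n = 2^{64}: \quad p &\approx \frac{2^{128}}{2^{257}} = 2^{-129}
\end{aligned}
```

So with a 128-bit salt a collision becomes likely around 2^64 messages, while with a 256-bit salt it stays negligible at the same volume.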

defuse commented 10 years ago

I'll think about this more later. I'm probably being overly-cautious.

narfbg commented 10 years ago

OK, moving the discussion from #5 here, since I think it's more relevant.

Quoting:

> So, I'm more concerned about the possible drawbacks of not covering the Base64 string

> Which are?

> What you want to be doing is EtM (Encrypt Then MAC). This is specified in ISO/IEC 19772:2009 as using a MAC directly on the ciphertext output from the encryption function. Not the ciphertext with IV, not the ciphertext encoded, the raw ciphertext.

Now, I don't know about ISO/IEC 19772:2009 (paywalled, not cool), but a few searches led me to these:

And a few StackExchange answers that also say we should (H)MAC basically everything:

And the cryptographic doom principle as a bonus, although I'm not sure if key/IV derivation counts as a cryptographic operation in that context.

True, all of these talk about CBC and I don't know if the same attacks can be successful against CTR, but still - it makes it sound scary to HMAC the raw cipherText, not including the IV.

That's not the real issue though, I could HMAC(IV || cipherText) and still use the IV as salt (or derive an IV using that salt). The question is, is it possible that an attacker could exploit that to forge a valid HMAC + cipherText with their own salt? Or does using that salt for derivation really verify its authenticity?

This answer to my question says it's "a definite no". (In relation to your exchange though, the second answer reassures us that using HKDF with salt is encouraged)

So we have a lot of contradictions here ... I don't like trusting the internet in general. I'd like to trust you two guys here, since I've learned a lot of stuff from you, one way or another. But I'd also like us to be completely sure about it.

I'm starting to think of another approach: keep the salt secret, appended to the key. It would double the required storage, but at least that isn't a security risk.

defuse commented 10 years ago

You absolutely must HMAC the IV in the normal case. The only possible exception is with the weird salting thing because changing the salt would change the HMAC key, but I'm still not sure that it's secure.

defuse commented 10 years ago

Salt shouldn't be kept secret. If a salt is secret, it's not a salt, it's another key.

narfbg commented 10 years ago

Sure, I get that ... point being, it wouldn't be used as a key itself.

defuse commented 10 years ago

Yeah, what I mean is if the security depends on it being secret, then it is a "key", regardless of whether you pass it as the "key" parameter of a cipher or MAC.

defuse commented 10 years ago

Breaking out the whiteboard. :smiley:

defuse commented 10 years ago

I think (80% sure) the following is safe:

Salt = Random(256 bits)
IV = HKDF(K, Salt, 'IV', 128 bits)
Km = HKDF(K, Salt, 'MAC KEY', 256 bits)
Kc = HKDF(K, Salt, 'CIPHER KEY', 256 bits)
Ciphertext = AES-256-CTR(Kc, IV, Message)
MAC = HMAC(Km, IV || Ciphertext)
Output: Salt || MAC || Ciphertext

It's important to include the IV in the MAC in the second last step. Now I have a (possibly rhetorical) question: What are all the benefits of this added complexity?

Edit: I corrected IV = HKDF(K, Salt, 'IV', 256 bits) to IV = HKDF(K, Salt, 'IV', 128 bits) as pointed out by @Sc00bz below.
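
The construction above could be sketched in PHP roughly as follows. This assumes the OpenSSL extension for AES-256-CTR and SHA-256 for both HKDF and HMAC (the comment does not name a hash). The function names `deriveAll`, `encryptMessage` and `decryptMessage` are made up for illustration.

```php
<?php
declare(strict_types=1);

// Derive IV, MAC key (Km) and cipher key (Kc) from the master key and salt.
function deriveAll(string $k, string $salt): array
{
    return [
        hash_hkdf('sha256', $k, 16, 'IV', $salt),
        hash_hkdf('sha256', $k, 32, 'MAC KEY', $salt),
        hash_hkdf('sha256', $k, 32, 'CIPHER KEY', $salt),
    ];
}

function encryptMessage(string $k, string $message): string
{
    $salt = random_bytes(32); // 256 bits
    [$iv, $km, $kc] = deriveAll($k, $salt);

    $ciphertext = openssl_encrypt($message, 'aes-256-ctr', $kc, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    $mac = hash_hmac('sha256', $iv . $ciphertext, $km, true); // HMAC(Km, IV || Ciphertext)

    return $salt . $mac . $ciphertext; // Salt || MAC || Ciphertext
}

function decryptMessage(string $k, string $blob): string
{
    if (strlen($blob) < 64) {
        throw new RuntimeException('Message too short');
    }
    $salt       = substr($blob, 0, 32);
    $mac        = substr($blob, 32, 32);
    $ciphertext = substr($blob, 64);
    [$iv, $km, $kc] = deriveAll($k, $salt);

    // Verify the MAC before touching the ciphertext.
    if (!hash_equals(hash_hmac('sha256', $iv . $ciphertext, $km, true), $mac)) {
        throw new RuntimeException('Authentication failed');
    }

    $plain = openssl_decrypt($ciphertext, 'aes-256-ctr', $kc, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plain;
}
```

The IV is never transmitted here. The receiver rebuilds it from the salt, which is why the MAC can still cover it.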

defuse commented 10 years ago

More to the point, why isn't all encryption done this way, if it's strictly better?

Sc00bz commented 10 years ago

> IV = HKDF(K, Salt, 'IV', 256 bits)

You mean: IV = HKDF(K, Salt, 'IV', 128 bits)

> Now I have a (possibly rhetorical) question: What are all the benefits of this added complexity?

I heard somewhere for AES counter mode (basically anything with a block size less than like 256 bit) that you should change both the IV and the key for each message to prevent IV reuse (with the same key). This is one way to accomplish this.

> More to the point, why isn't all encryption done this way, if it's strictly better?

Doing "IV || MAC || CT" is easier and the encryption key is more than likely changed by other means (i.e. Axolotl Ratchet). If they are doing AES counter mode and not changing the key each message they are probably doing something wrong.

defuse commented 10 years ago

@Sc00bz Thanks!

ircmaxell commented 10 years ago

I never claimed it was "strictly better".

I do know that it is referenced in Cryptography For Developers, though using PKCS5 derivation (PBKDF2) instead:

key_material = PBKDF2(master_key, salt, 16, key_len + auth_len)

Additionally, TLS1.2 does this for deriving keys from the master:

      key_block = PRF(SecurityParameters.master_secret,
                  "key expansion",
                  SecurityParameters.server_random +
                  SecurityParameters.client_random);

Partitioned:

  client_write_MAC_key[SecurityParameters.mac_key_length]
  server_write_MAC_key[SecurityParameters.mac_key_length]
  client_write_key[SecurityParameters.enc_key_length]
  server_write_key[SecurityParameters.enc_key_length]
  client_write_IV[SecurityParameters.fixed_iv_length]
  server_write_IV[SecurityParameters.fixed_iv_length]

The general point being that you only need to manage one master key, and each encryption uses a separate MAC key, Cipher key, and IV which are derived from the master.
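
As a rough illustration of the "derive one block, then partition it" idea, here is a sketch using PHP's built-in `hash_pbkdf2()`. The iteration count of 16 is taken from the PBKDF2 line quoted above and presumably only makes sense because the input is already a high-entropy key rather than a password. The 32-byte lengths, the SHA-256 choice and the function name `deriveKeyBlock` are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Derive one block of key material and split it into a cipher key
 * and an authentication key.
 */
function deriveKeyBlock(string $masterKey, string $salt, int $keyLen = 32, int $authLen = 32): array
{
    // The final `true` requests raw binary output of exactly $keyLen + $authLen bytes.
    $material = hash_pbkdf2('sha256', $masterKey, $salt, 16, $keyLen + $authLen, true);

    return [
        'cipherKey' => substr($material, 0, $keyLen),
        'authKey'   => substr($material, $keyLen, $authLen),
    ];
}
```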

Yet another example: http://crypto.stackexchange.com/questions/17830/deriving-2-keys-using-hkdf (I really hate citing SO, but their SEO is good, and wading through papers is difficult after a day at work)...

defuse commented 10 years ago

Sweet, I learned something new today. Someone has probably proven the IV-from-salt method secure, or if not there's an easy paper. ;)

narfbg commented 10 years ago

Citing my question was a bit awkward. :D

But anyway, we've still got the same problem - having to HMAC data that influences the HMAC itself. Whether that is salt, the IV used as salt, and/or a versioning tag (#7), it must be authenticated.

One could argue that this is simply wrong, and that the encryption and MAC keys should both be stored separately instead of being derived from such properties. However, I don't see how that would be possible with the versioning tag specifically, and Anthony here isn't the only person to ever suggest it. So somebody must have proven that if byte X of the cipherText output decides exactly what type of MAC to use, and the produced MAC then authenticates that same byte X, the scheme is indeed secure and not just basic logic that isn't entirely true in the crypto world.

I haven't ever heard of a requirement to always use a different encryption key for CTR though. The IV - sure, but not the key itself. It definitely adds strength, which is of course what we want, but I don't believe not doing it is a risk.

defuse commented 10 years ago

There's an easy solution: Use CBC mode with a random IV, and HMAC the IV. We know that works. :)

defuse commented 10 years ago

Errr, I guess CBC mode has a 2^64 encryption limit because of IV collisions. Wouldn't it be nice if block ciphers all had 256-bit blocks?

narfbg commented 10 years ago

Well, I/we went for this relatively complex structure before @Sc00bz made that suggestion, because it just adds more strength, regardless of the mode - I'd want to do that anyway. Using CBC wouldn't magically solve the puzzle.

Sc00bz commented 10 years ago

I wasn't suggesting switching to a large block cipher. I was merely saying it isn't necessary to rekey so often when the block size is large. If I was I'd say "you should use Threefish".

Also there are people on both sides of the fence with rekeying after each message. I think rekeying after each message should be done if the block size is less than 256 bits. Same with CTR vs CBC. I like CBC because it fails gracefully.


I had the same problem you did with using unauthenticated data (the salt), but I couldn't find anything really wrong with it. Just that cryptographers might not like it. Anyways you could do this:

Salt = Random(256 bits)
IV = HKDF(K, Salt, 'IV', 128 bits)
Km = HKDF(K, NULL, 'MAC KEY', 256 bits) ****
Kc = HKDF(K, Salt, 'CIPHER KEY', 256 bits)
Ciphertext = AES-256-CTR(Kc, IV, Message)
MAC = HMAC(Km, Version || Salt || Ciphertext) ****
Output: Version || Salt || Ciphertext || MAC ****

I marked the lines that are different with ****. I don't know of any problems with using the same MAC key for all messages. Also I think having the MAC at the front or the end is better because being in the middle is just weird. I know I said "IV || MAC || CT" but I just was copying what was done before. Also note that in a later version you can add another MAC but you're stuck with the original MAC. Unless you are going to look at the unauthenticated data (the version) before checking the MAC.
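
A sketch of this variant, assuming the OpenSSL extension, SHA-256 throughout, and a one-byte version tag. The constant `SKETCH_VERSION` and the function names `seal` and `unseal` are hypothetical. Because the MAC key does not depend on the salt, the whole message can be verified before any of it is parsed.

```php
<?php
declare(strict_types=1);

const SKETCH_VERSION = "\x01"; // made-up one-byte version tag

function seal(string $k, string $message): string
{
    $salt = random_bytes(32);
    $iv   = hash_hkdf('sha256', $k, 16, 'IV', $salt);
    $kc   = hash_hkdf('sha256', $k, 32, 'CIPHER KEY', $salt);
    $km   = hash_hkdf('sha256', $k, 32, 'MAC KEY'); // no salt: same MAC key for every message

    $ciphertext = openssl_encrypt($message, 'aes-256-ctr', $kc, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    $body = SKETCH_VERSION . $salt . $ciphertext;

    return $body . hash_hmac('sha256', $body, $km, true); // ... || MAC
}

function unseal(string $k, string $blob): string
{
    if (strlen($blob) < 65) { // version (1) + salt (32) + MAC (32)
        throw new RuntimeException('Message too short');
    }
    $body = substr($blob, 0, -32);
    $mac  = substr($blob, -32);
    $km   = hash_hkdf('sha256', $k, 32, 'MAC KEY');

    // Authenticate first; only then look at the version and salt.
    if (!hash_equals(hash_hmac('sha256', $body, $km, true), $mac)) {
        throw new RuntimeException('Authentication failed');
    }
    if ($body[0] !== SKETCH_VERSION) {
        throw new RuntimeException('Unsupported version');
    }

    $salt  = substr($body, 1, 32);
    $iv    = hash_hkdf('sha256', $k, 16, 'IV', $salt);
    $kc    = hash_hkdf('sha256', $k, 32, 'CIPHER KEY', $salt);
    $plain = openssl_decrypt(substr($body, 33), 'aes-256-ctr', $kc, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plain;
}
```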

narfbg commented 10 years ago

> I wasn't suggesting switching to a large block cipher. I was merely saying it isn't necessary to rekey so often when the block size is large. If I was I'd say "you should use Threefish".

I didn't say you did, I'm just surprised that the same key shouldn't be used for more than one message.

> Also I think having the MAC at the front or the end is better because being in the middle is just weird.

This is the current behavior and I don't intend to change it. :)

> Also note that in a later version you can add another MAC but you're stuck with the original MAC. Unless you are going to look at the unauthenticated data (the version) before checking the MAC.

That's at the core of what we're trying to solve here really. Having such data (whether it's the salt or version) authenticated, but at the same time basing authentication on it. Basically the same example that you gave, but with Version deciding the underlying HMAC hash function.

Talking about the salt in specific though, I'm now thinking of another approach - why not just derive it from the masterKey (obviously with no salt of its own):

Salt = HKDF(K, NULL, 'Salt', 256 bits)
IV = HKDF(K, Salt, 'IV', 128 bits)
Km = HKDF(K, Salt, 'MAC KEY', 256 bits)
...

Does that make sense?

ircmaxell commented 10 years ago

> Talking about the salt in specific though, I'm now thinking of another approach - why not just derive it from the masterKey (obviously with no salt of its own):

Salts should be random. A random salt provides additional entropy and significant additional security when reusing the master key in different contexts. Give section 3.1 of RFC5869 a read...

As far as what needs to be authenticated, since IV is derived from the salt (along with the keys), then theoretically at least (I don't have a proof), you should only need to authenticate version || salt || ciphertext.

Now yes, the authentication key is dependent upon the contents being authenticated. But if HKDF is a strong PRF, then there should be no way to forge a salt without knowing the master_key such that it generates an authentication key which verifies the forged salt. At least not faster than pure online brute-force.

In fact, TLS 1.1+ does a very similar thing:

MAC(MAC_write_key, seq_num +
   TLSCipherText.type +
   TLSCipherText.version +
   TLSCipherText.length +
   IV +
   ENC(content + padding + padding_length));

Note that the salt is already communicated via the asymmetric handshake (in 2 halves, the server_random and client_random). And the IV is a random value instead of being derived. But the cipher version and type information is MACed alongside the content and IV.

It should be safe to start the process on unauthenticated data (namely salt and version) prior to authenticating, because those processes leak nothing about the plain text. The problem with decrypting unauthenticated data (MAC-Then-Encrypt) is that it opens padding oracle attacks and chosen plaintext attacks (like BREACH). You want to avoid decrypting unauthenticated data, as that can open attack vectors to the plaintext. But simply choosing a protocol version and salt to derive keys with shouldn't open either of those attacks.

The only possible attacks on deriving keys from an unauthenticated salt would require HKDF to be broken (if it's a secure PRF, then there should be no way short of brute force to produce a specific key from a specific salt, with or without knowing the master key).

An unauthenticated version poses more of a problem, since there could be a massive implementation flaw in an earlier version which the attacker can then exploit. But even supporting that version anymore would open the door to those attacks anyway. So if a hole that significant was found, then it would be better to remove support for that version altogether.

Perhaps it would be worthwhile adding the version to the HKDF derivation (as part of the "info block") to tie the derived keys to the version of the protocol. That way even if a massive hole was found in an older version, newer ciphertexts would be invalid on the old version due to the key mismatch. So something like:

key_material = HKDF(master_key, Salt, 'key_material:' || version, iv_size+key_size+mac_key_size)

That way, even if an implementation detail made it so that authentication always succeeded (due to a bug), then a future version which fixed it (but used the same underlying cipher) couldn't be attacked by changing the version bit back (which would work since auth always succeeds)...
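
A minimal sketch of binding the version into the HKDF info string, assuming SHA-256, a one-byte version and the sizes 16/32/32 for IV, cipher key and MAC key. The function name `deriveKeyMaterial` and the partition order are illustrative choices, not something the thread specifies.

```php
<?php
declare(strict_types=1);

/**
 * Derive one block of key material tied to a protocol version,
 * then partition it into IV, cipher key and MAC key.
 */
function deriveKeyMaterial(string $masterKey, string $salt, int $version): array
{
    $ivSize     = 16;
    $keySize    = 32;
    $macKeySize = 32;

    // The version byte is part of the info string, so each version
    // yields unrelated keys for the same master key and salt.
    $info     = 'key_material:' . chr($version);
    $material = hash_hkdf('sha256', $masterKey, $ivSize + $keySize + $macKeySize, $info, $salt);

    return [
        'iv'        => substr($material, 0, $ivSize),
        'cipherKey' => substr($material, $ivSize, $keySize),
        'macKey'    => substr($material, $ivSize + $keySize, $macKeySize),
    ];
}
```

With this, flipping the version byte on a ciphertext changes the MAC key the receiver derives, so the tag produced under the original version no longer verifies.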

narfbg commented 10 years ago

Well it sounds good to me. It'd be either that or screw it all and just follow http://tools.ietf.org/html/draft-mcgrew-aead-aes-cbc-hmac-sha2-04, even if that's still not finalized and I like CTR better.

ircmaxell commented 10 years ago

Well, that just talks about the encryption step, whereas we're talking about the key derivation step which happens before you get to that step.

You still need to supply the 2 keys and IV. Also note that the IV section of the appendix specifically allows for "or a pseudorandom process with a cryptographic strength equivalent to that of the underlying block cipher", which is what we're doing with HMAC above...

In fact, if you look at it, it supports what we're talking about here via the Additional Data parameter to the MAC (which could be our salt).

narfbg commented 10 years ago

At this point we're talking about pretty much the whole process. :)

> In fact, if you look at it, it supports what we're talking about here via the Additional Data parameter to the MAC (which could be our salt).

Exactly, it provides a solid base as a (potential) standard, while at the same time allowing some flexibility. AD could also be salt + version tag.
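
To illustrate how associated data such as salt + version tag could be covered by the MAC, here is a rough sketch. As far as I understand the draft, the tag is computed over the associated data, then the IV and ciphertext, then the bit length of the associated data as a 64-bit big-endian integer. This sketch leaves out the draft's tag truncation, and the function name `tagWithAssociatedData` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * HMAC over AD || (IV || ciphertext) || bitlen(AD).
 * The trailing length field keeps the AD / ciphertext boundary unambiguous.
 */
function tagWithAssociatedData(string $macKey, string $ad, string $ivAndCiphertext): string
{
    // 'J' packs an unsigned 64-bit big-endian integer (needs 64-bit PHP).
    $adBitLength = pack('J', strlen($ad) * 8);

    return hash_hmac('sha256', $ad . $ivAndCiphertext . $adBitLength, $macKey, true);
}
```

Without the length field, moving bytes from the end of the associated data to the start of the ciphertext would produce the same MAC input.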