Optimizing/Aligning the IND-CCA2 construction

fluppe2 commented 1 year ago

It seems that the crypto-refresh merged the following:

https://gitlab.com/openpgp-wg/rfc4880bis/-/merge_requests/278

That is X25519 and X448 now mix the public key and ephemeral share (from encrypting party) into the KDF.

Should we align or still try to find an argument that the hashing step in X15519-/X448-/ECDH-KEM is covered by the KEM combiner?

ahuelsing commented 1 year ago

I think there are two aspects to consider here.

a) Do we want to optimize away the hash, and b) if we do not do so, do we want to follow the crypto-refresh.

I think b) is clearly a yes as it will ease the transition, allowing code-reuse and does not initiate new discussions.

Regarding a) the situation is as follows. From a provable security point of view we could omit the hash in hashed-DH as the output anyway gets fed to a KDF in the key combiner. What we would have to show is that the resulting KEM is still secure whenever just one of the input KEMs is still secure. Let's consider the case where ECDH is still secure. In that case, the key-share and ciphertext (as well as potentially the receiver key-share a.k.a pk in the KEM) get fed to a KDF with some additional, public, but potentially adversary controlled inputs. I.e., the hash is happening and in the ROM it is straight-forward to argue that the additional input does not hurt security. A similar, but simpler argument works in the Kyber case.

However, from a computational point of view, we are not gaining much. OpenPGP is not deployed in settings where a single hash will make a massive difference in terms of cost or energy (in contrast to Kyber in general -- see the pqc-forum discussion about the PRNG-hash in Kyber). At the same time, I think there is a second dimension to be considered. In optimizing away the hash, we build something that breaks down when people reuse parts of it. E.g., if people decide to implement "the DH-KEM as in openPGP" without reading the security considerations in detail, option a) will leave them with an only passively secure (== insecure in general applications) KEM. Similarly, if at a later point someone decides to change the KEM combiner and the new one does not include a ciphertext hash as it works somehow differently... the result is insecure.

Given that we describe the setup in a somewhat modular way, I consider it dangerous to base security on the monolithic construct. Hence, I suggest to not do a) but do b).

falko-strenzke commented 1 year ago

SK: our X25519 / X448 construction is a little bit different from how the C-R is doing it but same security
We could add a reference to the X... drafts saying that we also fulfill their requirements
AH: adding public key only for multi user security
AW: code reuse is not primary concern:
AH: might have to include the public key into every hash function call for multi user security.
Will have to look into it and determine which attack has to countered. Kyber hashes the public key internally.
AW: remove public key from fixed info in key combiner and include in ECC algorithm
AW: current MR https://gitlab.com/openpgp-wg/rfc4880bis/-/merge_requests/278/diffs with secret as last part in KDF.
FSE: secret should come first due to preventing collision vulnerability
AW: since there is no code reuse we can use that order
AH: could mix in own long term public key
AW: but sender might not have a public key
AW: use public key in both schemes and then also in the KDF?
AH: decryption operation might not be the only point of attack
FSE: mixing in once should be sufficient?
AH: depends on the attack
AW hold back PR until clarified by AH: https://github.com/openpgp-pqc/draft-openpgp-pqc/pull/50

ahuelsing commented 1 year ago

Here once more my conclusion regarding multi-user security:

Setup: We have two KEMs which we assume to be multi-user secure. We combine their outputs using a KEM-combiner k = H(k1, c1, k2, c2), c = (c1, c2).

Question 1: Do we have to add pk1 and pk2 to H to preserve multi-user security?

Answer: No. To break security the adversary has to distinguish a set of challenge keys k_u from random (one per user / pk). For each of these k it holds that if the adversary never makes a query H(k1_u, c1_u, k2_u, c2_u) they have a zero advantage over guessing [the result of the query is k_u if it was not random and something different from k_u if it was random -- without querying a random oracle H at a position x an adversary cannot make any statement about H(x)].

The only multi-user advantage that the adversary could gain therefore consists of queries to H that are meaningful for two different users u1 != u2 and their associated public keys. This is only the case if (c1_u1, c2_u1) = (c1_u2, c2_u2) as the ciphertext values decide for which challenge the query is meaningful. I.e., we need a ciphertext collision between challenges. We will argue that this is sufficiently unlikely.

Note, that up to decryption errors, encapsulation is an injective function (otherwise you could not decapsulate). That means that the actual image of the (partial) encapsulation function in the ciphertext space C is at least as large as the space of shared secrets K which in turn has to have sufficient size such that the min-entropy of the challenge k is linear in the security parameter (in practice it will be 256bit and therefore |K| >= 2^256). Now, even if (worst case ever) two users have the same public key, the probability that the challenge ciphertexts are identical is at most 2^-min_entropy (this makes the assumption that the randomness used in the generation of the two challenges is uncorrelated; if it is not independent this is not guaranteed). Hence, the probability that any two users have the same challenge ciphertext is |U^2| 2^-min_entropy.

In consequence, the ciphertexts already work sufficiently well as domain-separator.

Question 2: Can adding pk1 and pk2 to H add multi-user security if the KEMs themselve are not multi-user secure (beyond the trivial non-tight bound)?

Answer: Unlikely, we were unable to construct a counter-example in a short time period, but previous works were only able to show that using a key-ID to domain-separate all hashes preserves multi-user security (see, e.g. https://eprint.iacr.org/2021/1351.pdf).

In consequence, it seems as if it does not make sense to add the pk.

openpgp-pqc / draft-openpgp-pqc

Optimizing/Aligning the IND-CCA2 construction #43