Use of `EVP_PKEY_decrypt()` should be more secure

DDvO commented 2 years ago

This came up in https://github.com/openssl/openssl/pull/15283#discussion_r772433609.

Looks like ERR_clear_error() is not really constant-time due to static ossl_inline void err_clear(...) in crypto/err/err_local.h calling OPENSSL_free(es->err_file[i]) and OPENSSL_free(es->err_func[i]), where es->err_file[i] and es->err_func[i] should be NULL if and only if no error queue entry is present at index i.

davidben commented 2 years ago

For timing-sensitive code, one shouldn't be pushing sensitive information to the error queue in the first place. Trying to make the error queue itself constant-time is a losing proposition.

I'm not familiar enough with CRMF to, at a glance, point out what that code should be doing instead, but if you're relying on a constant-time error queue, that means some layer somewhere isn't calling or exposing the right API. Assuming this is some attempt to work around flaws in RSAES-PKCS1-v1_5 (if it's OAEP, this shouldn't be necessary), merely merging failure and wrong size conditions together doesn't seem sufficient. TLS needs to do a constant-time swap to a random secret.

DDvO commented 2 years ago

Thanks @davidben for your swift reaction. While its does not answer the question I posted here, it addresses the motivation for the question.

For timing-sensitive code, one shouldn't be pushing sensitive information to the error queue in the first place. Trying to make the error queue itself constant-time is a losing proposition.

Good point. Even the fact that an error entry is pushed is a problem and not even close to constant-time (in comparison to not pushing an error). ERR_raise() usually even allocates file and function name strings on the heap, calling OPENSSL_strdup().

BTW, I wonder why the function static ossl_inline void err_set_debug(...) defined in err_local.h that is invoked indirectly from the macro ERR_raise() and has a body of 11 lines is defined as inline, and likewise various other functions defined there.

I'm not familiar enough with CRMF to, at a glance, point out what that code should be doing instead, but if you're relying on a constant-time error queue, that means some layer somewhere isn't calling or exposing the right API. Assuming this is some attempt to work around flaws in RSAES-PKCS1-v1_5 (if it's OAEP, this shouldn't be necessary), merely merging failure and wrong size conditions together doesn't seem sufficient. TLS needs to do a constant-time swap to a random secret.

Well, I am meanwhile pretty sure that the suggestion by the reviewer at https://github.com/openssl/openssl/pull/9774#discussion_r321224326 and https://github.com/openssl/openssl/pull/9774#discussion_r321311665 to use constant-time failure checking was irrelevant, even misleading.

At that point in OSSL_CRMF_ENCRYPTEDVALUE_get1_encCert() an encrypted newly enrolled certificate has been received and is being decrypted. This only happens under the control of the (CMP) client that had requested a certificate, resorting to a very unusual indirect way of doing proof of possession (while usually the key material related to the cert can be used for signing and thus direct signature-based POP can be done instead). Pretty sure this decryption cannot be triggered form outside in a tight loop.

davidben commented 2 years ago

Pretty sure this decryption cannot be triggered form outside in a tight loop.

I don't think that's a good criteria for skipping side channel protections. Moreover, the RSAES-PKCS1-v1_5 flaw doesn't even require timing to exploit. Merely being able to probe query ciphertexts for a private key is enough to leak information. Someone who understands both the CRMF protocol and the relevant leaks and attacks here should probably spend more time on this code. (Alas I'm a bit hosed and probably won't have time to figure out enough of CRMF to sort this out. :-( )

DDvO commented 2 years ago

I don't think that's a good criteria for skipping side channel protections. Moreover, the RSAES-PKCS1-v1_5 flaw doesn't even require timing to exploit.

I do not know whether this flaw was the motivation to suggest a constant-time result check - @bernd-edlinger can you please clarify what the motivation was?

CRMF is not so much a protocol, but stands for Certificate Request Message Format, which is used mostly by the Certificate Management Protocol (CMP). It is a more flexible but much less known alternative to PKCS#10.

bernd-edlinger commented 2 years ago

You decrypt something that was received from an untrusted source, You expect PKCS1 padding and some additional checks which have a certain probability to be satisfied by a randomly chosen ciphertext, right? You do not authenticate the ciphertext before it goes to the EVP_PKEY_decrypt function, right?

bernd-edlinger commented 2 years ago

what is constant time is only this sequence ERR_raise(no-secret-values-here); err_clear_last_constant_time(secret-value); ERR_clear_error();

I think also this would work ERR_set_mark(); ERR_raise(no-secret-values-here); err_clear_last_constant_time(secret-value); ERR_pop_to_mark();

DDvO commented 2 years ago

You decrypt something that was received from an untrusted source, You expect PKCS1 padding and some additional checks which have a certain probability to be satisfied by a randomly chosen ciphertext, right? You do not authenticate the ciphertext before it goes to the EVP_PKEY_decrypt function, right?

Thanks for making clear what the concern is, or could be. Given this information, I am pretty confident that it does not apply, simply because the encrypted cert/stuff comes from a CMP server in a response message that has been authenticated. (Well, if the client has been misconfigured such that not proper authentication is done then it is not the fault of the CMP+CRMF lib).

bernd-edlinger commented 2 years ago

Yes, when the message containing the cipher text is authenticated with a signature from a trusted CMP server you don't need that. Which signature algorithm is it?

DDvO commented 2 years ago

In CMP the message (+ sender) authentication is flexible. It may be MAC-based with a shared secret (typically using the PBM algorithm specified in https://datatracker.ietf.org/doc/html/rfc4210#section-5.1.3.1), otherwise (which is the normal situation) it is signature-based with the algorithm given in the signer cert, which of course is accepted only if it can be traced to a trust anchor, which indirectly determines via the CA policies the (minimal) strength of the signature.

DDvO commented 2 years ago

So given that the CMP client does not offer an interface for chosen ciphertext attacks with unauthenticated input from an unknown party, can we conclude that we do not need to take special care in terms of constant-time behavior when checking the results of decrypting parts of an authenticated message?

bernd-edlinger commented 2 years ago

that's right, and there is no need to clear the error stack in that case either.

DDvO commented 2 years ago

Thanks @bernd-edlinger for this clarification. So we can safely remove from crmf_lib.c the constant-time way of checking the decryption result because at that point it is just misleading, and we are free to handle the error queue in the way it fits best from other perspectives.

DDvO commented 2 years ago

Now since this is clarified, we may return in this thread to the original question, namely whether ERR_clear_error() really is constant-time - or has it turned out that the alleged property is not relevant for this function?

bernd-edlinger commented 2 years ago

namely whether ERR_clear_error() really is constant-time

in general not.

It is only constant time, immediately after the function err_clear_last_constant_time and only with respect of whether this last entry was cleared or not.

DDvO commented 2 years ago

Interesting to be pointed by your responses to err_clear_last_constant_time(), which I was not aware of. BTW, there is no internal documentation for any of the (internal) *constant_time* functions (besides code comments for part of them).

Looking at the implementation of err_clear_last_constant_time() I'd say it is constant-time modulo any still due initialization actions by ossl_err_get_state_int(), which likely do not depend on the error state.

Yet I do not believe that ERR_clear_error() is constant-time, regardless whether err_clear_last_constant_time() was called before, simply because (as mentioned above) it frees a couple of strings per error queue entry.

bernd-edlinger commented 2 years ago

The term constant-time means something different in cryptography. The point is there is no correlation between the time that ERR_clear_error() takes and the secret data value that has been passed to err_clear_last_constant_time(). The secret is only revealed if you call ERR_get_error() or similar.

DDvO commented 2 years ago

Well, the only data explicitly passed to err_clear_last_constant_time() is a flag whether to clear the last entry, and agreed it is constant-time w.r.t. this input. Surprising that here the input flag is considered secret, but this makes sense to me because it may express whether a decryption was successful or not.

In the sense I understand your explanation, every function that has no parameter, in particular ERR_clear_error() is trivially constant-time, right?

And this would be the case regardless in which situation it is called, in particular, independent of whether err_clear_last_constant_time() was called immediately before or not, while you wrote earlier that this matters here, which seems contradictory.

So, the original question seems to boil down to this: Can implicit parameters (i.e., statically accessible values) defeat the constant-time property?

davidben commented 2 years ago

In the sense I understand your explanation, every function that has no parameter, in particular ERR_clear_error() is trivially constant-time, right?

No, secrets may be in globals. Attackers, of course, do not care whether a leakage happened to come from a global variable or parameter. Indeed the error system is a pile of globals, and part of the issue here seems to be whether there was a secret in the error queue state.

Personally, I think having secret error queue state is too fragile and we should instead not put them in there in the first place. But for RSAES-PKCS1-v1_5, that means different APIs and a change to CRMF's code to adapt to whatever that API is. Indeed, without having looked carefully at CRMF, I worry you may be missing the forest for the trees. The reason for all this timing-sensitive stuff around error handling is because the encryption primitive used by CRMF, RSAES-PKCS1-v1_5, is plain broken. See the Bleichenbacher attack. All the timing sensitive stuff we do around errors is workarounds to mitigate it for certain protocols. https://datatracker.ietf.org/doc/html/rfc5246#section-7.4.7.1 describes what TLS does.

This works in TLS because it effectively combines the RSA padding check with the Finished hash that comes right after it. But it's ultimately a workaround and makes some assumptions around how the key is used. The real answer is to retire the scheme altogether, as TLS 1.3 did when it removed RSA-encryption-based cipher suites altogether.

How all that translates to CRMF, I'm not sure. I'm not familiar with it and don't have the time to dig into it carefully right now.

DDvO commented 2 years ago

I do not understand the details of RSAES-PKCS1-v1_5 and honestly do not want to know about them. As mentioned before, for CMP and CRMF its use is very rare and depends on the type of key material being used. Nevertheless, for the case it is chosen, the implementation should of course be on the safe side.

Comparing the code in crmf_lib.c with at other calls of EVP_PKEY_decrypt() in crypto/, I found three things:

The usage patters are very similar, see pkcs7/pk7_doit.c, evp/p_open.c, and cms/cms_env.c, e.g.:

if (EVP_PKEY_decrypt_init(pctx) <= 0)
    goto err;

if (EVP_PKEY_decrypt(pctx, NULL, &eklen,
                     ri->enc_key->data, ri->enc_key->length) <= 0)
    goto err;

ek = OPENSSL_malloc(eklen);

if (ek == NULL) {
    ERR_raise(ERR_LIB_PKCS7, ERR_R_MALLOC_FAILURE);
    goto err;
}

if (EVP_PKEY_decrypt(pctx, ek, &eklen,
                     ri->enc_key->data, ri->enc_key->length) <= 0
        || eklen == 0
        || (fixlen != 0 && eklen != fixlen)) {
    ret = 0;
    ERR_raise(ERR_LIB_PKCS7, ERR_R_EVP_LIB);
    goto err;
}

None of these care to use any constant-time error checking
None of them bother with potentially critical error queue entries.

My conclusions form these observations are:

The common use of EVP_PKEY_decrypt{,_init}() should be factored out in a helper function to be used in all such places.
Either those non-CRMF uses are unsafe w.r.t. RSAES-PKCS1-v1_5 or all the concerns raised about the CRMF implementation were misleading.

bernd-edlinger commented 2 years ago

evp/p_open.c just assumes that the caller does mitigation against a possible Bleichenbacher Attack. There is not much that can be done here, One would need to look at callers if they do things like signature checking before calling this function.

The other two places are different. There are already some counter measures against Bleichenbacher Attack, or MMA Attack, (million message attack) which you find in the comments here: https://github.com/openssl/openssl/blob/2080da84a49b0c52fc8c6e6caef5d373235bd3e4/crypto/pkcs7/pk7_doit.c#L593-L601

It's called MMA in some comments, because the attacker needs the padding oracle to decrypt around a million messages in order to forge one signature, or decrypt one cipher text without ever learning the private key. So this is even more than a (compromised) CA could do, a CA could easily create a fake certificate and sign something with this certificate, but it could not decrypt any cipher text.

So yes the other places pkcs7/pk7_doit.c and cms/cms_env.c are unsafe, and they could be improved. They are in it's current form since 5840ed0cd1e6487d247efbc1a04136a41d7b3a37 where I improved the countermeasures to require 100 times more messages than previously by additionally checking the expected length of the plain text. Well I have not yet made the function EVP_PKEY_decrypt constant-time at that time. So today they could be formally constant time, but that should be done for the whole process. And if the condition fails, a deterministic default- key should be used instead of a random one. That should be based on something that the attacker can't possibly know, for instance SHA256(CIPHER_TEXT|PRIVATE_KEY). so it is not possible that the attacker can see a different behaviour when she sends the same CIPHER_TEXT again, yet she can't predict what exactly will be decoded. The cms part shoud also use a default key at least when the RSAES-PKCS1-v1_5 padding is used.

DDvO commented 2 years ago

Thanks a lot @bernd-edlinger for your clarifying further response. According to the evolution of this discussion, I changed the title of this issue and promoted it to a (minor?) security issue/bug.

DDvO commented 2 years ago

How about factoring out from evp/p_open.c, crmf/crmf_lib.c, cms/cms_env.c, and pkcs7/pk7_doit.c the code sections calling EVP_PKEY_decrypt{,_init}() (and possibly also the EVP_PKEY_CTX handling) into a common helper function?

DDvO commented 2 years ago

How to proceed here?

DDvO commented 2 years ago

@bernd-edlinger, can you provide a concrete solution here, ideally a PR?

DDvO commented 1 year ago

This has been open since 1.5 years. @bernd-edlinger, how do you suggest to proceed?

DDvO commented 1 year ago

How about factoring out from evp/p_open.c, crmf/crmf_lib.c, cms/cms_env.c, and pkcs7/pk7_doit.c the code sections calling EVP_PKEY_decrypt{,_init}() (and possibly also the EVP_PKEY_CTX handling) into a common helper function?

This is meanwhile done via #17354. So any improvements in this topic can now be done at one place, namely in evp_pkey_decrypt_alloc() in crypto/evp/asymcipher.c.

openssl / openssl

Use of `EVP_PKEY_decrypt()` should be more secure #17319