BIGNUM code is not constant-time due to bn_correct_top

This is a write-up of some work we did recently in BoringSSL. It addresses a long-standing timing leak in BIGNUM. Unfortunately, the issue is in a core BIGNUM invariant, so it won't be a small fix. But I thought it'd be good to have a write-up available for when folks are ready to tackle it.

Most of OpenSSL's asymmetric crypto code is not constant-time because, among other things, bn_correct_top chops off leading zeros. This is to satisfy an internal invariant where all BIGNUMs are minimal representation.

This invariant makes sense for calculators, but it is incompatible with cryptography. While p and q in RSA have public bit widths (half of the bit width of n), d has a secret bit width. It is only publicly bounded by n. Field elements and scalars in EC are only publicly bounded by the field size and group order, respectively. Zero as a field element in P-256 and P-384 should have different representations. Leaking information about the minimal bit width of RSA private key operation result, for example, can give a private key oracle, per Manger's attack. (Although, for RSA keys that are a whole number of words, particularly on 64-bit, the probability of leaking anything is low enough that this is probably mostly theoretical.)

Unfortunately, many BIGNUM functions internally rely on this invariant. E.g. BN_abs_is_word and BN_is_zero here: https://github.com/openssl/openssl/blob/dfee8626a8f6c1e23ab270a6fc20b4d1ba145392/crypto/bn/bn_lib.c#L840-L848

Or this logic from BN_cmp, which assumes a wider number always has larger magnitude. https://github.com/openssl/openssl/blob/dfee8626a8f6c1e23ab270a6fc20b4d1ba145392/crypto/bn/bn_lib.c#L574-L577

Approach

We finally found a successful approach for BoringSSL. We've been running with it for most of the year now and it seems fine.

First off, the invariant needs to go. That's unavoidable, which means that logic like the above needs to be rewritten. We relaxed the invariant slightly so that bn->top, which we renamed to bn->width (see below for why) refers not to the minimal word width, but the public width. This is a public upper bound on the value and also the size of bn->d. If bn->width is 4, the value is less than 2^(4*BN_BITS2). But it may fit in fewer words.

This means there are many different representations for a given value. E.g. any number of zero words means zero.

The calling convention for BIGNUM functions is then:

On input, you must tolerate all representations of each parameter and produce the same numerical result for each. 1 + 1 = 2 no matter what value of 1 you've got. This is for backwards compatibility when existing code sees a non-minimal BIGNUM.
bn->width is always assumed public. (Memory access patterns are public, so this is unavoidable.) Whether the value or minimal width are public depends on the function.
We chose to consider bn->neg also public as negative BIGNUMs never occur if you're already trying to make things constant-time. (Also see the note on negative zeros below.)
On output, you are free to emit whatever representation is appropriate, given your function's API contract. "Calculator" functions usually emit minimal-width values (see BN_mul below) while constant-time functions usually pessimally pick output widths based on input widths.

To expand on on output widths: modular arithmetic like BN_mod_add_quick or BN_mod_exp_mont can size the output by the modulus and satisfy but cryptography and calculators. (Though helpfully doing a reduction or special-casing zero is not great.)

However, calculator and cryptography needs may be incompatible. Consider a BN_mul that treats minimal widths as secret (used in RSA CRT). It must set r->width = a->width + b->width. That's the minimal public bound on the output given the public bound on the input. This is fine for RSA CRT, but consider a "calculator" consumer. Repeatedly multiplying by one would grow memory usage! Making BN_mul suitable for RSA CRT is therefore a breaking change.

Indeed most existing functions must be considered variable-time. Instead, we added an internal bn_mul_consttime that behaves above. BN_mul is then bn_mul_consttime + bn_correct_top. (Separate constant-time and variable-time functions also matches the discussion in #6078.)

There's a second subtlety around negative zero. bn_correct_top today clears bn->neg if the value is zero. But constant-time functions can't call bn_correct_top. Thus our constant-time functions often fail on negative numbers altogether. So BN_mul isn't quite bn_mul_consttime + bn_correct_top, but almost.

The input rule also has an interesting subtlety. BN_mod_add_quick requires that a reduced mod m. But it is still possible for a to be wider than m. m may be minimal-width while a has a ton of leading zeros. This is somewhat a nuisance (EC_FELEM and EC_SCALAR below avoid this). We made many of the primitive constant-time bits "words"-based (like bn_add_words) and made the BIGNUM functions be small wrappers that dealt with the widths mess with helper functions (below).

This is still removing a core BIGNUM invariant, and decisions need to be made on functions to be made constant-time, as well as work to fix all the crypto bits. But it gives an overall convention to things.

Implementation strategy

Should OpenSSL agree with this approach, this is what we did in BoringSSL, which might be applicable.

Add some helpers for dealing with different BIGNUM widths. See bn_fits_in_words, bn_copy_words, bn_minimal_width, and bn_resize_words in BoringSSL.
Go through every existing access of bn->top and fix the function to allow non-minimal BIGNUMs. Write tests for everything. This step may be done incrementally because non-test code will not actually produce non-minimal BIGNUMs yet.
Rename bn->top to bn->width. This is a no-op change. Rather it is an assertion that everything has been fixed. To compile, every access of bn->top will need to change, which means the reviewer can audit each diff site
With step 3 asserting the invariant is no longer necessary, start fixing things. E.g. BN_bin2bn should size by the byte length, not the value. BN_bn2binpad should not call BN_num_bytes. The intermediate BIGNUM operations in RSA should not call bn_correct_top and instead size by modulus.

For the BoringSSL version of some of this work, the following links may be helpful: https://bugs.chromium.org/p/boringssl/issues/detail?id=232 https://bugs.chromium.org/p/boringssl/issues/detail?id=233 https://bugs.chromium.org/p/boringssl/issues/detail?id=234 https://bugs.chromium.org/p/boringssl/issues/detail?id=236 https://boringssl.googlesource.com/boringssl/+/38c20fe8d514a5328f5db404e116af8e9136c7f8

Alternatives considered

As a minor variation, the public BIGNUM upper bound could be measured in bits rather than words, for something finer-grained. But this adds some conversions to get the bounds on bn->d. The only case where it may help is non-standard RSA key sizes, but this isn't worth it. (See the note in 39eeb64f59ff838f976ad305de7d15747d47a41c, which is analogous.)

One could imagine avoiding BIGNUM altogether. Our work on the EC code swapped most of the internals for stack-allocatable EC_FELEM and EC_SCALAR types. This has a number of advantages:

The values are stack-allocatable. EC_FELEM, in particular, gave a considerable perf win for generic curves.
The type system encodes the width and bounds (to a point; it doesn't cover group mismatches, but we're not writing in a dependently-typed language...), giving a bit more safety and avoiding having to worry about the width mismatch issues above.
We don't have the pointless sign bit in there.

However, we ultimately considered this an improvement in addition to the BIGNUM work, rather than a replacement for it. We also retained BIGNUM in the RSA code, at least for now:

RSA values are larger, so rounding up to the largest curve and then stack-allocating is less plausible.
EC only has two "kinds" of values to worry about, with many operations on those values. RSA has several kinds (numbers mod N, P, and Q, and exponents mod N, P, and Q), with fewer operations. The type-checking value is less clear.
Even if internals avoid BIGNUM, enough of the public API transits keys as BIGNUMs that we still need some story. For instance EC_KEY_get0_private_key demands that EC_KEY own a copy of the private key in BIGNUM representation.

Another possibility is to add a "fixed-width" flag to BIGNUMs. However this does not appear to solve things. First, the flags pattern is error-prone and undesirable for speculative execution attacks. See discussion in #6078. Second, consider if that were a fixed-width BIGNUM. What would calling BN_exp on EC_KEY_get0_private_key mean? Or BN_add with overflow? If the semantics change, this is backwards-incompatible. If all of the operations now do something else, there's little point in sharing a type with the original BIGNUMs.

Instead, we consider timing behavior to be a property of the operations one runs. If you call a function that treats the value and minimal width as secret, it won't leak either. If it treats only the value as secret, the minimal width may be leaked. If it treats it all as public, consider the value leaked. It's the function's responsibility to honor its specification w.r.t. secrecy and the caller's responsibility to call functions consistent with their needs.

openssl / openssl