python / cpython

The Python programming language
https://www.python.org
Other
62.37k stars 29.95k forks source link

Replace built-in hashlib with verified implementations from HACL* #99108

Open msprotz opened 1 year ago

msprotz commented 1 year ago

Feature or enhancement

We propose to replace the non-OpenSSL cryptographic primitives in hashlib with high-assurance, verified versions from the HACL* project.

Pitch

As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.

The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:

See https://hacl-star.github.io/Overview.html#what-is-verified-software for a longer description of how formal methods can help write high-assurance software and rule out entire classes of bugs.

The performance of HACL is competitive with, and sometimes exceeds, that of OpenSSL. HACL is distributed as pure C, and therefore is portable. Parts of HACL* have been adopted in Mozilla, Linux, the Tezos blockchain, and many more, thereby demonstrating that formally verified code is ready for production-time.

Previous discussion

Tagging @alex with whom I've informally discussed this.

gpshead commented 1 year ago

(missing context: there was a private email thread cc'ing ~7 of us including myself, @tiran, @alex, @msprotz & others that led to this issue and PR) in response to https://github.com/python/cpython/issues/98517 CVE happening with our old XKCP sha3 vendored code.

One reason I might support this work across all of our required always supported hashes is that if these perform well, I'd be happy to see the _hashlib module wrapping OpenSSL deprecated entirely. (Others probably disagree with that, so we may never get there). One less thing depending on OpenSSL sounds nice - but the OpenSSL EVP interface and these algorithms are far less likely to be where OpenSSL CVE bugs lurk (famous last words). BUT... if the vendored third party code we use as a fallback for hash algorithms can be replaced with something nearly guaranteed not to be the next CVE as the hack-star project appears to aim to be (I cannot actually be the judge of that), and be high performance... Why should we bother involving OpenSSL for these at all?

If this sounds like the opposite of my statements over buried in the blake3 rejection thread - it's because I did not seek an alternative to the variety of vendored fallback hash algorithm implementations we carry there. So the goal in those statements was to lean towards simplicity instead of complexity. (which has already paid off via our XKCP removal in favor of tiny_sha3 by not having Python 3.11+ suffer from the sha3 XKCP CVE that the larger complex third party code in 3.6-3.10 did)

We provide and ship native implementations for the hash algorithms anyways as we support builds without OpenSSL present, and support building on systems with libssl versions or variants that do not provide everything.

Q: Would adopting this code increase our maintenance burden?

A: Probably, yes. We'd presumably want to update it with more recent versions from hacl-star from time to time. Though the theory is that there should never be a security need to do so? In the past we've been asked to update vendored hash algorithm code from time to time but have hesitated due to the complexity and perceived risk, and assumed lack of reward given people who care about performance would presumably have a modern OpenSSL to get that from. Though if we did get rid of the OpenSSL wrapper code that'd also be a reduced burden.

Q: What about performance?

A: Good question. That needs continual measuring on our Tiered supported platforms. Don't focus solely on thruput. OpenSSL's EVP APIs are known to have silly-slow setup time. So something hashing a pile of 234 byte objects may be slow no matter how good OpenSSL's bulk data thruput is. Benchmark small message thruput as well as bulk data thruput.

Parting thought: There is a broader desire to wean us off of our OpenSSL dependency. I realize that's a hard thing so long as the _ssl module exists, but it is a long term desire of many. This kind of change would be one necessary step, regardless of when it is taken.

msprotz commented 1 year ago

Thanks for your comment, Greg. Some related thoughts.

None of this is infeasible -- it's just a matter of deciding what is good for a Python-in-the-future where hashlib no longer depends on OpenSSL. For instance, you might say "let's maintain a portable C version, and one single ASM version that uses SHAEXT when available (fastest)", and this is something we could make happen within HACL*.

Just to give you a sense of what we have right now, allow me to go into greater detail:

Hope this helps. As you said, this kind of change is a first step in the right direction.

gpshead commented 1 year ago

Alright, the first PR is in. Checkbox time:

msprotz commented 1 year ago

Great! sha384/sha512 is a no-brainer and I'll send a followup PR very soon. We have SHA3 as well, so I can send that in, too, with the caveat that we have a few performance optimizations for sha3 in the works, so we can either wait for those to land in HACL then submit the Python PR, or first land SHA3 in Python, then send a followup PR to refresh the HACL code once the performance tweaks have landed. Happy to hear your thoughts on this.

gpshead commented 1 year ago

Given that the SHA3 implementation we currently ship is a tiny poor performing one, no need to wait. I expect we'll pull in updates to the HACL* implementations as needed in the future when there is a motivating reason.

msprotz commented 1 year ago

@gpshead I'll let you check MD5/SHA1 in the checklist above... quick question: what is the timeline for landing sha3? it requires a little bit of work on my side and I'll be traveling all March, so ideally I would submit the SHA3 PR in April. But if it's a dealbreaker because of e.g. the Python release schedule, I can scramble to try to submit it before then. Let me know.

gpshead commented 1 year ago

So long as we can get it committed before 3.12beta1 at the beginning of May (see https://peps.python.org/pep-0693/) we're good. So April for that is fine by me. No need to scramble.

msprotz commented 1 year ago

With #103597, there is only one algorithm left in hashlib that is not verified: Blake2.

I took a quick glance at the code, and I have a few high-level points to discuss. Would love to have feedback on these, below. Thanks!

Performance

HACL_blake2s_32_streaming                  3758 ns         3755 ns       186711
HACL_blake2s_vec128_streaming              3265 ns         3251 ns       205682
OpenSSL_hash_streaming/blake2s256          4558 ns         4550 ns       136367

This is a benchmark of HACL's "streaming" API (the one that we use here, with internal buffering) vs. the same flavor of API from OpenSSL. Lower is better. I don't have yet a comparison against official libb2 / reference implementation but will update this once I have it. In any case, it looks like the performance is good enough to alleviate any concerns. Let me know if this isn't the case.

system libb2

It looks like there are three possible implementations for blake2. Vendored reference implementation, libb2 implementation if found on the system, and openssl if hashlib is configured to use openssl. Is that correct?

Why bother with a system libb2? Is there a strong reason? If the HACL implementation is competitive with OpenSSL (per benchmark above), then is it ok to remove support for libb2? (Looking to simplify things here!)

CPU feature detection

Blake2 has regular (portable) versions, and optimized versions, which require support for a vectorized instruction set (SSSE3, SSE4.1, AVX, or XOP). The test that determines whether to use these versions is:

#if defined(__SSSE3__) || defined(__SSE4_1__) || defined(__AVX__) || defined(__XOP__)

I can see a few potential issues, and I would like to know how much the Python team cares about those.

Since this doesn't appear to have been a problem so far, my hunch is that the optimized version is almost never selected, because most compilers are conservative in their choice of minimum target architecture (e.g. i686 and x86_64 are the respective baselines, but nothing more than that). This means that unless you are compiling with Apple clang, or use custom CFLAGS, the version you're picking up is almost certainly the baseline version with no vectorized optimizations.

HACL has baseline and* vectorized implementations. But if we wanted to keep it simple, I think it would be ok to ship just the regular version as part of hashlib. Let me know what you think.

gpshead commented 1 year ago

blake2:

3.11 shipped with the ability to link against libb2 via https://github.com/python/cpython/issues/91251, some OS distro packagers presumably like that and libb2 should be considered better than our old vendored copy of that. I believe it might use proper runtime feature detection?

OpenSSL doesn't support enough blake2 to provide all of the APIs hashlib supports for blake2, so libb2 or our older vendored copy of that is always used for hashlib.blake2b & blake2s functions regardless. So OpenSSL isn't really in the picture here. So we're down to two implementations, not three.

I agree that the optimized version selection logic at compile time is the wrong approach. Useful on distros like Gentoo (which I believe is where the PR author was coming from) as people tend to build the entire system with recent-arch compiler flags there. But you are right that it probably doesn't do what was intended for a lot of users and just uses the C reference version. That was https://github.com/python/cpython/pull/4066. It'll be good to see that go away. :)

If we do a HACL implementation of blake2, it should replace our vendored blake2 (C and optimized version) code.

Performance comparison numbers of that vs linking against libb2 as shipped on a common debian or fedora-like distro can show if there is sufficient reason to keep the libb2 option at all. 100% agreed that getting rid of libb2 support in the presence of a HACL vendored version may be simpler code to maintain (desirable). We'd be down to a single version at that point!

msprotz commented 1 year ago

Thanks, that's useful context!

My suggestion for Blake2 is to do it in two phases.

I'll come back here shortly with some benchmarks.

gpshead commented 1 year ago

Regarding https://github.com/python/cpython/blob/main/Modules/_hashopenssl.c#L133 the _hashlib blake2b512 and blakes2s256 OpenSSL functions exist and can be called via the hashlib.new() public API using those as string algorithm names, but they aren't documented and should be considered versions not guaranteed to be present in a build. I'd be surprised if anyone actually ever used them - but we wouldn't bother to remove them unless we were at the point of getting rid of the OpenSSL backed hashlib entirely.

msprotz commented 1 year ago

The benchmarks for HACL vs Blake2 reference implementation are still on a branch, but I managed to run them. Here are the relevant lines:

BLAKE2_blake2b_ref_streaming               2450 ns         2449 ns       276237
BLAKE2_blake2s_ref_streaming               2990 ns         2989 ns       226532
HACL_blake2b_32_streaming                  2746 ns         2744 ns       249755
HACL_blake2s_32_streaming                  3557 ns         3555 ns       191316

This is assuming that Blake2 reference implementations are not running optimized (which, per our discussion above, is likely the reality for most users).

The short version is that HACL is about 10% slower than the reference implementation. But if most users run with libb2 anyhow, and the goal is to streamline the crypto in hashlib, then I'm happy to submit a subsequent PR for Blake2. Let me know.

(In the meanwhile we'll try to take a look at the performance gap.)

gpshead commented 1 year ago

Sounds good, go ahead and work up a PR for blake2 HACL when you've got a chance.

Don't fret about the 3.12beta1 feature freeze cutoff for blake2 - while our release manager has technically delayed that to two weeks from today, it isn't a big deal if a blake2 built-in update misses 3.12.

msprotz commented 1 year ago

Quick update: I took a look at our blake2 code and it doesn't expose enough of the parameterized API which Python currently offers for tree hashing. It's not a big deal, but means there's enough work on our end that Blake2 won't make it into 3.12 but rather into the next release... I'll post updates here once we make enough progress upstream.

encukou commented 1 month ago

The PR broke several stable x86_64-unknown-linux-gnu RHEL7 buildbots with:

../Modules/_hacl/Lib_Memzero0.c:43:5: error: implicit declaration of function ‘explicit_bzero’ [-Werror=implicit-function-declaration]
     explicit_bzero(dst, len_);
     ^~~~~~~~~~~~~~

By one reading of PEP-11, this needs to be fixed or reverted in 24 hours. Is that possible?

Note that while RHEL7 itself is past its end-of-maintenance phase, and breaking it doesn't really matter, it serves as a “canary” for platforms that don't support the non-standard extension -- Android being most prominent in this case, but also smaller systems without the resources to maintain buildbots. IMO, CPython should stick to C & POSIX standards, with anything extra behind a feature check.

mhsmith commented 1 month ago

This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.

This probably won't affect any other platform, since it's already guarded by a __linux__ check, and explicit_bzero is apparently available in current versions of both glibc and musl. But Android has a Linux kernel with a BSD-based libc.

As for RHEL7, it looks like the logical thing to do is to disable its buildbot on the main branch, but keep it running on 3.13 and older.

gpshead commented 1 month ago

Thanks! Also, sorry, I forgot about the RHEL7 canary side of things when merging. (Too many things being chased at once). The fix should just be an autoconf dance.

On Wed, Aug 14, 2024, 5:55 AM Malcolm Smith @.***> wrote:

This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.

This probably won't affect any other platform, since it's already guarded by a linux check, and explicit_bzero is apparently available in current versions of both glibc and musl. But Android has a Linux kernel with a BSD-based libc.

As for RHEL7, it looks like the logical thing to do is to disable its buildbot on the main branch, but keep it running on 3.13 and older.

— Reply to this email directly, view it on GitHub https://github.com/python/cpython/issues/99108#issuecomment-2288666002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAQXCZ3BFDIGDHUMVLW5CDZRNHVZAVCNFSM6AAAAAARXSQKPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBYGY3DMMBQGI . You are receiving this because you were mentioned.Message ID: @.***>

msprotz commented 1 month ago

This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.

Do I understand correctly that you'll try to submit an autoconf fix? In that case, I would add an autoconf check here: https://github.com/python/cpython/blob/main/configure.ac#L7777-L7784 to determine whether explicit_bzero is unavailable; if it is unavailable, define LIBHACL_MEMZERO_FLAGS=-DLINUX_NO_EXPLICIT_BZERO, then AC_SUBST the flag, then add the flag here: https://github.com/python/cpython/blob/main/Makefile.pre.in#L1393

Hope this helps!

mhsmith commented 1 month ago

Thanks, yes, that's what I'm planning to do.