Open msprotz opened 1 year ago
(missing context: there was a private email thread cc'ing ~7 of us including myself, @tiran, @alex, @msprotz & others that led to this issue and PR) in response to https://github.com/python/cpython/issues/98517 CVE happening with our old XKCP sha3 vendored code.
One reason I might support this work across all of our required always supported hashes is that if these perform well, I'd be happy to see the _hashlib
module wrapping OpenSSL deprecated entirely. (Others probably disagree with that, so we may never get there). One less thing depending on OpenSSL sounds nice - but the OpenSSL EVP interface and these algorithms are far less likely to be where OpenSSL CVE bugs lurk (famous last words). BUT... if the vendored third party code we use as a fallback for hash algorithms can be replaced with something nearly guaranteed not to be the next CVE as the hack-star project appears to aim to be (I cannot actually be the judge of that), and be high performance... Why should we bother involving OpenSSL for these at all?
If this sounds like the opposite of my statements over buried in the blake3 rejection thread - it's because I did not seek an alternative to the variety of vendored fallback hash algorithm implementations we carry there. So the goal in those statements was to lean towards simplicity instead of complexity. (which has already paid off via our XKCP removal in favor of tiny_sha3 by not having Python 3.11+ suffer from the sha3 XKCP CVE that the larger complex third party code in 3.6-3.10 did)
We provide and ship native implementations for the hash algorithms anyways as we support builds without OpenSSL present, and support building on systems with libssl versions or variants that do not provide everything.
Q: Would adopting this code increase our maintenance burden?
A: Probably, yes. We'd presumably want to update it with more recent versions from hacl-star from time to time. Though the theory is that there should never be a security need to do so? In the past we've been asked to update vendored hash algorithm code from time to time but have hesitated due to the complexity and perceived risk, and assumed lack of reward given people who care about performance would presumably have a modern OpenSSL to get that from. Though if we did get rid of the OpenSSL wrapper code that'd also be a reduced burden.
Q: What about performance?
A: Good question. That needs continual measuring on our Tiered supported platforms. Don't focus solely on thruput. OpenSSL's EVP APIs are known to have silly-slow setup time. So something hashing a pile of 234 byte objects may be slow no matter how good OpenSSL's bulk data thruput is. Benchmark small message thruput as well as bulk data thruput.
Parting thought: There is a broader desire to wean us off of our OpenSSL dependency. I realize that's a hard thing so long as the _ssl
module exists, but it is a long term desire of many. This kind of change would be one necessary step, regardless of when it is taken.
Thanks for your comment, Greg. Some related thoughts.
None of this is infeasible -- it's just a matter of deciding what is good for a Python-in-the-future where hashlib no longer depends on OpenSSL. For instance, you might say "let's maintain a portable C version, and one single ASM version that uses SHAEXT when available (fastest)", and this is something we could make happen within HACL*.
Just to give you a sense of what we have right now, allow me to go into greater detail:
Hope this helps. As you said, this kind of change is a first step in the right direction.
Alright, the first PR is in. Checkbox time:
Great! sha384/sha512 is a no-brainer and I'll send a followup PR very soon. We have SHA3 as well, so I can send that in, too, with the caveat that we have a few performance optimizations for sha3 in the works, so we can either wait for those to land in HACL then submit the Python PR, or first land SHA3 in Python, then send a followup PR to refresh the HACL code once the performance tweaks have landed. Happy to hear your thoughts on this.
Given that the SHA3 implementation we currently ship is a tiny poor performing one, no need to wait. I expect we'll pull in updates to the HACL* implementations as needed in the future when there is a motivating reason.
@gpshead I'll let you check MD5/SHA1 in the checklist above... quick question: what is the timeline for landing sha3? it requires a little bit of work on my side and I'll be traveling all March, so ideally I would submit the SHA3 PR in April. But if it's a dealbreaker because of e.g. the Python release schedule, I can scramble to try to submit it before then. Let me know.
So long as we can get it committed before 3.12beta1 at the beginning of May (see https://peps.python.org/pep-0693/) we're good. So April for that is fine by me. No need to scramble.
With #103597, there is only one algorithm left in hashlib that is not verified: Blake2.
I took a quick glance at the code, and I have a few high-level points to discuss. Would love to have feedback on these, below. Thanks!
HACL_blake2s_32_streaming 3758 ns 3755 ns 186711
HACL_blake2s_vec128_streaming 3265 ns 3251 ns 205682
OpenSSL_hash_streaming/blake2s256 4558 ns 4550 ns 136367
This is a benchmark of HACL's "streaming" API (the one that we use here, with internal buffering) vs. the same flavor of API from OpenSSL. Lower is better. I don't have yet a comparison against official libb2 / reference implementation but will update this once I have it. In any case, it looks like the performance is good enough to alleviate any concerns. Let me know if this isn't the case.
It looks like there are three possible implementations for blake2. Vendored reference implementation, libb2 implementation if found on the system, and openssl if hashlib is configured to use openssl. Is that correct?
Why bother with a system libb2? Is there a strong reason? If the HACL implementation is competitive with OpenSSL (per benchmark above), then is it ok to remove support for libb2? (Looking to simplify things here!)
Blake2 has regular (portable) versions, and optimized versions, which require support for a vectorized instruction set (SSSE3, SSE4.1, AVX, or XOP). The test that determines whether to use these versions is:
#if defined(__SSSE3__) || defined(__SSE4_1__) || defined(__AVX__) || defined(__XOP__)
I can see a few potential issues, and I would like to know how much the Python team cares about those.
-march
, determined at compiler configure-time). On my OSX machine, the test above is true with Apple clang, but false with gcc-12. (GCC was configured to assume x86_64 as its baseline target, which has no support for vector instructions. Clang probably was configured by Apple to assume the first two flags, since I presume no Apple device exists without SSSE3 nor SSE4.1.) Since (as far as I can tell) the Python build system doesn't use any particular -march flag, the choice of optimization level for Blake2 depends on the choice of compiler. (It shouldn't.)-march=native
and enables the optimized version, but produces a binary that might not execute successfully on all systems.Since this doesn't appear to have been a problem so far, my hunch is that the optimized version is almost never selected, because most compilers are conservative in their choice of minimum target architecture (e.g. i686 and x86_64 are the respective baselines, but nothing more than that). This means that unless you are compiling with Apple clang, or use custom CFLAGS, the version you're picking up is almost certainly the baseline version with no vectorized optimizations.
HACL has baseline and* vectorized implementations. But if we wanted to keep it simple, I think it would be ok to ship just the regular version as part of hashlib. Let me know what you think.
blake2:
3.11 shipped with the ability to link against libb2 via https://github.com/python/cpython/issues/91251, some OS distro packagers presumably like that and libb2 should be considered better than our old vendored copy of that. I believe it might use proper runtime feature detection?
OpenSSL doesn't support enough blake2 to provide all of the APIs hashlib supports for blake2, so libb2 or our older vendored copy of that is always used for hashlib.blake2b
& blake2s
functions regardless. So OpenSSL isn't really in the picture here. So we're down to two implementations, not three.
I agree that the optimized version selection logic at compile time is the wrong approach. Useful on distros like Gentoo (which I believe is where the PR author was coming from) as people tend to build the entire system with recent-arch compiler flags there. But you are right that it probably doesn't do what was intended for a lot of users and just uses the C reference version. That was https://github.com/python/cpython/pull/4066. It'll be good to see that go away. :)
If we do a HACL implementation of blake2, it should replace our vendored blake2 (C and optimized version) code.
Performance comparison numbers of that vs linking against libb2 as shipped on a common debian or fedora-like distro can show if there is sufficient reason to keep the libb2 option at all. 100% agreed that getting rid of libb2 support in the presence of a HACL vendored version may be simpler code to maintain (desirable). We'd be down to a single version at that point!
Thanks, that's useful context!
My suggestion for Blake2 is to do it in two phases.
I'll come back here shortly with some benchmarks.
Regarding https://github.com/python/cpython/blob/main/Modules/_hashopenssl.c#L133 the _hashlib
blake2b512
and blakes2s256
OpenSSL functions exist and can be called via the hashlib.new()
public API using those as string algorithm names, but they aren't documented and should be considered versions not guaranteed to be present in a build. I'd be surprised if anyone actually ever used them - but we wouldn't bother to remove them unless we were at the point of getting rid of the OpenSSL backed hashlib entirely.
The benchmarks for HACL vs Blake2 reference implementation are still on a branch, but I managed to run them. Here are the relevant lines:
BLAKE2_blake2b_ref_streaming 2450 ns 2449 ns 276237
BLAKE2_blake2s_ref_streaming 2990 ns 2989 ns 226532
HACL_blake2b_32_streaming 2746 ns 2744 ns 249755
HACL_blake2s_32_streaming 3557 ns 3555 ns 191316
This is assuming that Blake2 reference implementations are not running optimized (which, per our discussion above, is likely the reality for most users).
The short version is that HACL is about 10% slower than the reference implementation. But if most users run with libb2 anyhow, and the goal is to streamline the crypto in hashlib, then I'm happy to submit a subsequent PR for Blake2. Let me know.
(In the meanwhile we'll try to take a look at the performance gap.)
Sounds good, go ahead and work up a PR for blake2 HACL when you've got a chance.
Don't fret about the 3.12beta1 feature freeze cutoff for blake2 - while our release manager has technically delayed that to two weeks from today, it isn't a big deal if a blake2 built-in update misses 3.12.
Quick update: I took a look at our blake2 code and it doesn't expose enough of the parameterized API which Python currently offers for tree hashing. It's not a big deal, but means there's enough work on our end that Blake2 won't make it into 3.12 but rather into the next release... I'll post updates here once we make enough progress upstream.
The PR broke several stable x86_64-unknown-linux-gnu
RHEL7 buildbots with:
../Modules/_hacl/Lib_Memzero0.c:43:5: error: implicit declaration of function ‘explicit_bzero’ [-Werror=implicit-function-declaration]
explicit_bzero(dst, len_);
^~~~~~~~~~~~~~
By one reading of PEP-11, this needs to be fixed or reverted in 24 hours. Is that possible?
Note that while RHEL7 itself is past its end-of-maintenance phase, and breaking it doesn't really matter, it serves as a “canary” for platforms that don't support the non-standard extension -- Android being most prominent in this case, but also smaller systems without the resources to maintain buildbots. IMO, CPython should stick to C & POSIX standards, with anything extra behind a feature check.
This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.
This probably won't affect any other platform, since it's already guarded by a __linux__
check, and explicit_bzero
is apparently available in current versions of both glibc and musl. But Android has a Linux kernel with a BSD-based libc.
As for RHEL7, it looks like the logical thing to do is to disable its buildbot on the main branch, but keep it running on 3.13 and older.
Thanks! Also, sorry, I forgot about the RHEL7 canary side of things when merging. (Too many things being chased at once). The fix should just be an autoconf dance.
On Wed, Aug 14, 2024, 5:55 AM Malcolm Smith @.***> wrote:
This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.
This probably won't affect any other platform, since it's already guarded by a linux check, and explicit_bzero is apparently available in current versions of both glibc and musl. But Android has a Linux kernel with a BSD-based libc.
As for RHEL7, it looks like the logical thing to do is to disable its buildbot on the main branch, but keep it running on 3.13 and older.
— Reply to this email directly, view it on GitHub https://github.com/python/cpython/issues/99108#issuecomment-2288666002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAQXCZ3BFDIGDHUMVLW5CDZRNHVZAVCNFSM6AAAAAARXSQKPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBYGY3DMMBQGI . You are receiving this because you were mentioned.Message ID: @.***>
This is partly my fault for not setting up the Android buildbot soon enough, so I'll submit a fix today.
Do I understand correctly that you'll try to submit an autoconf fix? In that case, I would add an autoconf check here: https://github.com/python/cpython/blob/main/configure.ac#L7777-L7784 to determine whether explicit_bzero is unavailable; if it is unavailable, define LIBHACL_MEMZERO_FLAGS=-DLINUX_NO_EXPLICIT_BZERO, then AC_SUBST the flag, then add the flag here: https://github.com/python/cpython/blob/main/Makefile.pre.in#L1393
Hope this helps!
Thanks, yes, that's what I'm planning to do.
Feature or enhancement
We propose to replace the non-OpenSSL cryptographic primitives in hashlib with high-assurance, verified versions from the HACL* project.
Pitch
As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.
The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:
See https://hacl-star.github.io/Overview.html#what-is-verified-software for a longer description of how formal methods can help write high-assurance software and rule out entire classes of bugs.
The performance of HACL is competitive with, and sometimes exceeds, that of OpenSSL. HACL is distributed as pure C, and therefore is portable. Parts of HACL* have been adopted in Mozilla, Linux, the Tezos blockchain, and many more, thereby demonstrating that formally verified code is ready for production-time.
Previous discussion
Tagging @alex with whom I've informally discussed this.