zachmullen / hashstate

C extension fork of cpython's hashlib implementation that also supports serialization of intermediate values
MIT License

Determine which version of OpenSSL to use #2

Open brianhelba opened 3 years ago

brianhelba commented 3 years ago

@zachmullen Perhaps I misunderstood your complaint about the "API changing". I thought you were referring to the Python C API, but I see that the real problem in hashstate is the constant changing of the OpenSSL API. OpenSSL doesn't seem to have consistent release versions, since it seems like almost everyone is pulling or packaging a slightly different version of the code. Their own release notes encourage this, and just list the changes that occurred between various release intervals (which seem to pick version increment levels almost arbitrarily). The very disorganization which facilitated Heartbleed existing in the first place actually seems to have caused the situation to get worse post-Heartbleed, as the frantic uptick in development activity to fix things just made the release process and API surface more irregular.

I'm enjoying this comment by a pretty experienced developer, looking inside the OpenSSL codebase for the first time:

I have come to the conclusion that OpenSSL is equivalent to monkeys throwing feces at the wall. It is, bar none, the worst library I have ever worked with. I can not believe that the internet is running on such a ridiculous complex and gratuitously stupid piece of code.

Python also recognizes the problem of determining which version to use. From PEP 644, which is still in a draft state:

Over time OpenSSL's public API has evolved and changed. Version 1.0.2 introduced new APIs to verify and match hostnames. OpenSSL 1.1.0 made internal structs opaque and introduced new APIs that replace direct access of struct members. Version 3.0.0 will deprecate more APIs due to internal reorganization that moves cryptographic algorithms out of the core and into providers. Forks like LibreSSL and BoringSSL have diverged in different directions.

Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1. For the most part Python also works with LibreSSL >= 2.7.1 with some missing features and broken tests.

Due to limited resources and time it becomes increasingly hard to support multiple versions and forks as well as test and verify correctness. Besides multiple incompatible APIs there are build time flags, distribution-specific patches, and local crypto-policy settings that add to plethora of combinations.

PEP 644 proposes the following, which would take effect if the PEP is accepted:

This PEP proposes for CPython’s standard library to support only OpenSSL 1.1.1 LTS or newer. Support for OpenSSL versions past end-of-lifetime, incompatible forks, and other TLS libraries are dropped.


Since hashstate only pulls in a very small subset of OpenSSL for computing hashes, I assume (and this is a big assumption) that it's safe to link to an older EOL version of OpenSSL which has a better-understood API. I see no reason that hashstate has to use the same version of OpenSSL as the Python that's running it, since we can just statically link all of the necessary OpenSSL code into the built hashstate library.
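For comparison, here is a rough sketch of how to see which OpenSSL the running interpreter's `ssl` module was built against, using only the standard library. The helper names `python_openssl_info` and `meets_pep644_minimum` are made up for illustration. The check is only an approximation of the PEP 644 requirement (1.1.1 or newer, no forks), and it says nothing about whatever hashstate itself ends up statically linking.

```python
import ssl


def python_openssl_info():
    """Return (version string, version tuple) for the library linked into ssl."""
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO


def meets_pep644_minimum(version_info, name):
    """Approximate the PEP 644 floor: OpenSSL 1.1.1 or newer, and not LibreSSL.

    version_info is a 5-tuple like ssl.OPENSSL_VERSION_INFO:
    (major, minor, fix, patch, status). Only the first three are compared.
    """
    if name.startswith("LibreSSL"):
        # PEP 644 proposes dropping support for incompatible forks.
        return False
    return tuple(version_info[:3]) >= (1, 1, 1)


if __name__ == "__main__":
    name, info = python_openssl_info()
    print(name, info, meets_pep644_minimum(info, name))
```

Running this in CI alongside the hashstate build would at least show when the interpreter and the extension are on different OpenSSL versions.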

Currently, hashstate appears to expect some version of OpenSSL < 1.0.2g, as building on Ubuntu 16.04 with libssl-dev 1.0.2g-1ubuntu4.19 results in errors saying that EVP_MD_CTX_cleanup is missing from the OpenSSL API. According to OpenSSL's changelog, EVP_MD_CTX_cleanup was removed sometime "between 1.0.2h and 1.1.0 [25 Aug 2016]". I would assume that "g" comes before "h", but this seems par for the course with OpenSSL.
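On the letter ordering: as far as I can tell from the OpenSSL documentation, releases before 3.0 pack the version into `OPENSSL_VERSION_NUMBER` as `0xMNNFFPPS` (major, minor, fix, patch, status), with the letter suffix stored as the patch number (a = 1, b = 2, ...). A minimal sketch of a decoder follows. The function name is hypothetical, and it deliberately rejects 3.x numbers and multi-letter patch levels, which use a different encoding.

```python
def decode_openssl_version(number: int) -> str:
    """Decode a pre-3.0 OPENSSL_VERSION_NUMBER (layout 0xMNNFFPPS)."""
    major = (number >> 28) & 0xF
    minor = (number >> 20) & 0xFF
    fix = (number >> 12) & 0xFF
    patch = (number >> 4) & 0xFF  # 0 = no letter, 1 = 'a', 2 = 'b', ...
    if major >= 3:
        raise ValueError("OpenSSL 3.x uses a different version layout")
    if patch > 26:
        raise ValueError("multi-letter patch levels are not handled in this sketch")
    letter = chr(ord("a") + patch - 1) if patch else ""
    return f"{major}.{minor}.{fix}{letter}"


if __name__ == "__main__":
    print(decode_openssl_version(0x1000207F))  # 1.0.2g
    print(decode_openssl_version(0x1000208F))  # 1.0.2h
    print(0x1000207F < 0x1000208F)             # True: "g" sorts before "h"
```

So at the level of the numeric macro, 1.0.2g does sort before 1.0.2h, which makes the build error above all the more puzzling.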

If we try building against an even older OpenSSL, I expect we can find one that supports the current code base. To do this, I think the simplest approach might be to vendor OpenSSL as a submodule. Of course, it then becomes our responsibility to build an old version of OpenSSL from scratch, which could end up being too difficult.


An alternative would be to switch hashstate to use OpenSSL 1.1.1. This is the version used by the GitHub Actions ubuntu-latest (Ubuntu 18.04, which uses libssl-dev 1.1.1-1ubuntu2.1~18.04.8) and macos-latest (see https://github.com/actions/virtual-environments/issues/2089 ; I assume they mean "1.1.1" exactly, not something like "1.1.1g"), so we could build against the system-provided versions (and then statically link them into the built hashstate).

zachmullen commented 3 years ago

I'm fine with either option. Or even completely other ones -- if there's some other, less ridiculous library we want to use for hashing that's fine too. I chose this way solely as a result of copying what cpython was doing.