indygreg / python-build-standalone

Produce redistributable builds of Python
Mozilla Public License 2.0
1.97k stars 125 forks source link

Issues calling `ssl.create_default_context()` from within threads while using Fedora. #207

Open xlevus opened 8 months ago

xlevus commented 8 months ago

Calling ssl.create_default_context() in version 20240107 and 20231002 from within a threading.Thread() will error with pythonX.Y: unknown error (_ssl.c:ZZZZ)

Observances:

The issue was found from the Pants project (which runs Pex within a PBS runtime) https://github.com/pantsbuild/pants/issues/20467

I've created a repository with a Dockerfile and a script that can reproduce the issue here: https://github.com/xlevus/pants-issue-20467/tree/main (see simple/repro.sh)

I have no idea where the issue actually sits. The OpenSSL documentation on SSL_CTX is somewhat ambiguous too:

An SSL_CTX object should not be changed after it is used to create any SSL objects or from multiple threads concurrently, since the implementation does not provide serialization of access for these cases.

jsirois commented 7 months ago

I'll add more details later, but, in short, the issue seems to be a RedHat issue and it does not appear to be any problem with PBS. Fedora has a custom OpenSSL config option "rh-allow-sha1-signatures" as seen here: https://gitlab.com/redhat/centos-stream/rpms/openssl/-/blob/c9s/0049-Selectively-disallow-SHA1-signatures.patch And use of it: https://gitlab.com/redhat-crypto/fedora-crypto-policies/-/merge_requests/110/diffs When a vanilla OpenSSL like PBS uses tries to read that config option - it dies, leading to this error. Comment out the custom config and the error goes away. As to why the unknown config key leads to an error only on a non-main-thread path, I don't have a clue yet. The summary, though, is the hermetic PBS OpenSSL is subject to openssl.cnf et. al. from the ambient environment and issues with that configuration can, of course, affect PBS.

jsirois commented 7 months ago

Ok, I think the comment in https://github.com/pantsbuild/pex/pull/2358 here explains what is going on: https://github.com/pantsbuild/pex/pull/2358/files#diff-a8ab8ae64ae5d677d8f616a6934108c4436479a9d560002b6f49402e8f862d72R71

@xlevus and @indygreg as far as I can tell, there is really nothing PBS can do about machines that use custom OpenSSL config requiring a patched OpenSSL to read and this issue can be closed.

The linked explanation above, snipped:

These shenanigans deserve some explanation, since, in OpenSSL 3.0 anyhow, it is perfectly fine to create an SSL Context (SSL_CTX_new) in any thread:

It turns out that, in typical use, the CPython ssl module hides OpenSSL configuration issues through no real fault of its own. This is due to the fact that an import of the ssl module, which generally happens in the main thread, triggers, through instantiation of the ssl.Purpose enum type ^1 a call to OpenSSL's OBJ_obj2nid ^2 which loads OpenSSL config but throws away the return value; thus hiding errors in config. Since the OpenSSL config scheme is to load it at most once per thread, this means subsequent OpenSSL call paths in the same thread that imported ssl (like the one generated via ssl.create_default_context) that do check the return value of config loading ^4, will not have to load config (since it's been done once already in the thread) and thus will not get the chance to check the config load function return value and will thus not error. This default behavior is almost certainly bad, since it allows invalid OpenSSL configs to go partially read at best. That said, this is the default Python single-threaded behavior and, right or wrong, we preserve that here by forcing our SSL initialization to happen in the main thread, keeping any OpenSSL misconfiguration silently ignored.

The only solace here is that the use cases where OpenSSL config can be bad on a machine and the machine still function are narrow. The case we know of, that triggered creation of this machinery, involves the combination of a modern PBS Python ^6 (which has a vanilla OpenSSL statically linked into the Python binary) running on a RedHat OS that expresses custom RedHat configuration keys ^7 in its OpenSSL config. These custom keys are only supported by RedHat patches to OpenSSL and cause vanilla versions of OpenSSL to error when loading config due to unknown configuration options.

indygreg commented 7 months ago

Great investigation!

Although I'm still a little confused how RedHat's OpenSSL config settings are influencing our statically linked OpenSSL. Is there some kind of file-based config that PBS is picking up from /etc (or similar) that is throwing off code in OpenSSL / CPython?

Do you think an issue should be filed against CPython so they are at least aware of the potential for this bug? Perhaps they could do something upstream like employ guards around OpenSSL initialization to try to prevent this?

jsirois commented 7 months ago

Although I'm still a little confused how RedHat's OpenSSL config settings are influencing our statically linked OpenSSL.

As I read it, OpenSSL, ~no matter how built, always looks for its runtime config in well known locations (/etc/ssl/openssl.cnf).

Do you think an issue should be filed against CPython so they are at least aware of the potential for this bug? Perhaps they could do something upstream like employ guards around OpenSSL initialization to try to prevent this?

Yeah. I'll do this during my next work stint here in a few days. They already do guard in some code paths and do not guard in others. I'm afraid if they guard in the main passive import path (they do not today), they'll run into a world of backwards compatibility hurt, but that's their problem to solve. I should at least make sure they're aware of the problem in the first place.