pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

Generating lockfiles fails with: unknown error (_ssl.c:3161) #20467

Closed mjimlittle closed 7 months ago

mjimlittle commented 7 months ago

Describe the bug: When trying to generate lockfiles, the command fails with the following error: Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)

pants --print-stacktrace -ldebug generate-lockfiles ::
18:15:17.57 [INFO] Initialization options changed: reinitializing scheduler...
18:15:22.39 [INFO] Scheduler initialized.
18:15:23.84 [INFO] Completed: Generate lockfile for python-default
18:15:23.84 [ERROR] 1 Exception encountered:

Engine traceback:
  in select
    ..
  in pants.core.goals.generate_lockfiles.generate_lockfiles_goal
    `generate-lockfiles` goal

Traceback (most recent call last):
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/core/goals/generate_lockfiles.py", line 557, in generate_lockfiles_goal
    results = await MultiGet(
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 361, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/backend/python/goals/lockfile.py", line 110, in generate_lockfile
    result = await Get(
  File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 118, in __await__
    result = yield self
pants.engine.process.ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)

Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.

Pants version Tested with versions:

(same result for all tested versions)

OS Tested with

(same result for all tested versions)

Additional info: I think this issue started happening after a Fedora kernel update. Has anyone else run into this issue before? Any suggestions on how to resolve this would be much appreciated!

jsirois commented 7 months ago

The source is free to read. My reading says the underlying SSL lib CPython is linking against is not supported (you're probably on the right track): https://github.com/python/cpython/blob/8fc8c45b6717be58ad927def1bf3ea05c83cab8c/Modules/_ssl.c#L3161

I'd ldd /home/manos/Workspace/pants-repo/.conda/bin/python3.9 to see the linkage and work from there. This is a much lower level issue than Pants and it would be good to cut Pants out of the debugging.

xlevus commented 7 months ago

I'm experiencing the same issue, also on Fedora. I'm only ever able to replicate the issue when using what's in the sandbox.

I can't:

Using distrobox to try and run in older versions of fedora seems to have the same issue, but it doesn't seem to be entirely isolating things from the rest of the system.

jsirois commented 7 months ago

The sandbox blanks out env vars and that can be important. @xlevus can you ldd and compare your env against the sandbox env to help isolate whether LD_LIBRARY_PATH or some other required env var is being blocked by Pants? The whole scie-pants thing is almost certainly way off track. If Pants launches at all, scie-pants is long out of the picture entirely.

xlevus commented 7 months ago

📦[xlevus@pants-debug2 gymkhana]$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:

Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)

📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/bin/python3.10
    linux-vdso.so.1 (0x00007ffeaefee000)
    libpython3.10.so.1.0 => /lib64/libpython3.10.so.1.0 (0x00007fc5a9b64000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fc5a9987000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fc5a98a7000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc5a9ebe000)

📦[xlevus@pants-debug2 gymkhana]$ python3.10 
Python 3.10.13 (main, Aug 28 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _ssl
>>> _ssl.__file__
'/usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.9 30 May 2023'

📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffec9df8000)
    libssl.so.3 => /lib64/libssl.so.3 (0x00007ffbd04b3000)
    libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007ffbd0088000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ffbcfeab000)
    libz.so.1 => /lib64/libz.so.1 (0x00007ffbcfe91000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ffbd0592000)
📦[xlevus@pants-debug2 gymkhana]$ 
📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ ldd /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
    linux-vdso.so.1 (0x00007ffc1a626000)
    /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/../lib/libpython3.9.so.1.0 => not found
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f91208e7000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f91208e2000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007f91208dd000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f91207fd000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f91207f6000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f9120619000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f91208f8000)

📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
Python 3.9.18 (main, Jan  8 2024, 05:40:12) 
[Clang 17.0.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
Cannot read termcap database;
using dumb terminal settings.
>>> import _ssl
>>> _ssl.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module '_ssl' has no attribute '__file__'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.12 24 Oct 2023'

The original __run.sh contents:

env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 ./pex lock create --tmpdir .tmp --python-path $'/home/xlevus/.pyenv/versions/3.10.13/bin:/home/xlevus/.pyenv/versions/3.12.1/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' $'--output=lock.json' --no-emit-warnings $'--style=universal' --pip-version 23.1.2 --resolver-version pip-2020-resolver --target-system linux --target-system mac $'--indent=2' --no-pypi $'--index=https://pypi.org/simple/' --manylinux manylinux2014 --interpreter-constraint $'CPython==3.10.*' django

When changing PEX_PYTHON to /usr/bin/python3.10 or the system-installed python3.9, __run.sh runs OK and generates a lockfile.
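The interpreter comparison above can be repeated for any candidate PEX_PYTHON with a small script. This is only a sketch: the function name `ssl_build_info` is made up for illustration, and it reports nothing beyond what the REPL sessions above already showed by hand.

```python
import ssl
import sys
import _ssl


def ssl_build_info():
    """Collect the interpreter facts that were compared by hand above."""
    return {
        "executable": sys.executable,
        "openssl_version": ssl.OPENSSL_VERSION,
        # A module without __file__ is typically compiled into the
        # interpreter itself, which matches the AttributeError seen above.
        "ssl_extension": getattr(_ssl, "__file__", None) or "<built in>",
    }


if __name__ == "__main__":
    for key, value in ssl_build_info().items():
        print(f"{key}: {value}")
```

Running it once with each interpreter gives a side-by-side view of which OpenSSL each one uses and whether `_ssl` is a separate shared object that `ldd` can inspect.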

xlevus commented 7 months ago

Further:

Unpacking pex and changing __run.sh to invoke __main__.py instead, I can trace the error to: https://github.com/pantsbuild/pex/blob/v2.1.137/pex/fetcher.py#L48

(Pdb) sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'

  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(738)create_default_context()
-> context = SSLContext(PROTOCOL_TLS)
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(484)__new__()
-> self = _SSLContext.__new__(cls, protocol)

The protocol version being passed in is: <_SSLMethod.PROTOCOL_TLS: 2>

Buuuuut, changing the __run.sh script to the following (i.e. calling that same function with the same environment & interpreter) works fine ???:

env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.local/bin:/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 -c "import ssl; ssl.create_default_context()"
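The same experiment can be scripted so it is easy to repeat against several interpreters. This is a minimal sketch, not part of Pants or Pex: `check_ssl_context` is a hypothetical helper, and the empty environment only approximates what `env -i` does in __run.sh. The call still runs in the child interpreter's main thread, exactly as in the one-liner above.

```python
import subprocess
import sys


def check_ssl_context(python, env=None):
    """Run ssl.create_default_context() under `python` with a scrubbed env.

    Returns (returncode, stderr). An empty env approximates `env -i`.
    """
    result = subprocess.run(
        [python, "-c", "import ssl; ssl.create_default_context()"],
        env=env if env is not None else {},
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Pass interpreter paths as arguments; defaults to the current one.
    for python in sys.argv[1:] or [sys.executable]:
        code, err = check_ssl_context(python)
        print(python, "->", "ok" if code == 0 else err.strip())
```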

jsirois commented 7 months ago

@xlevus it looks like you have everything at hand you need to dig. If you can come up with a docker-based repro, perhaps someone can help out, but as it stands you have the files, paths, etc.

xlevus commented 7 months ago

I've created a docker-based reproduction here: https://github.com/xlevus/pants-issue-20467 and it seems to work in a Fedora VM, trying to get an ubuntu VM up to confirm it works in one of those too.

Will poke around more tonight. But I'm a little stumped tbqh.

It appears to only be an issue when specifically combining the distributed python3.9 venv for pants, and pex. The venv's ssl-context code works fine, and the pex code works fine, but combine the two and ???

jsirois commented 7 months ago

Great - thanks. I'll try to poke around with the repro case. That said, this is crazy-making "It appears to only be an issue when specifically combining the distributed python3.9" since your OP is this:

$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:

Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)

That is definitely not python3.9 let alone the scie-pants hermetic python3.9.

xlevus commented 7 months ago

"That is definitely not python3.9 let alone the scie-pants hermetic python3.9." (quoting @jsirois above)

The error message is misleading. The 'failed to spawn a job' is from Pex's Job runner.

I'm 89% (not 100%) confident the error comes from within a python3.9 executable.

Here's my hacked up sandbox with a pdb.breakpoint stuck right before the failing ssl call:

📦[xlevus@pants-debug2 pants-sandbox-WRfUMY]$ ./__run.sh 
Cannot read termcap database;
using dumb terminal settings.
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) w
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
  /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
  /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) !import sys
(Pdb) pp sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'
(Pdb) n
ssl.SSLError: unknown error (_ssl.c:3161)
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb)

jsirois commented 7 months ago

Ok, thanks for the repro case @xlevus - super helpful.

I have not figured out why PBS Python 3.9 is different here, and apparently only different in a Fedora context to boot, but the issue is related to threading. If you use a PBS 3.9 REPL to run import ssl; ssl.create_default_context(), there is no issue, as you found out. The relevant difference in the Pex case is that this function is called not in the main application thread, but in a job spawn thread used for spawning parallel (subprocess) jobs. If I create an SSL context early in the main thread, all is well and the lock succeeds:

[root@3d2dd3ceaa5c pants-sandbox-qAIWax]# diff -u .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py pex-venv/lib/python3.9/site-packages/pex/fetcher.py
--- .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py       1980-01-01 00:00:00.000000000 +0000
+++ pex-venv/lib/python3.9/site-packages/pex/fetcher.py 2024-01-28 21:18:01.662789434 +0000
@@ -4,6 +4,8 @@
 from __future__ import absolute_import

 import ssl
+ssl.create_default_context()
+
 import time
 from contextlib import closing, contextmanager

[root@3d2dd3ceaa5c pants-sandbox-qAIWax]#

There the diff represents some sandbox mucking about, but the upshot is trying to grab the context on import of pex/fetcher.py is enough to ensure this happens in the main thread and all is well.
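The thread dependence described above can be probed outside of Pex with a few lines of plain Python. This is a sketch under assumptions: the helper names are invented here, and whether the worker-thread call actually fails depends on the interpreter build and the host (in this thread, a PBS Python 3.9 on Fedora). The worker thread runs first, since creating a context in the main thread beforehand is what makes the failure go away.

```python
import ssl
import threading


def try_create_context():
    """Return None on success, or the SSLError raised while creating a context."""
    try:
        ssl.create_default_context()
        return None
    except ssl.SSLError as e:
        return e


def try_in_worker_thread():
    """Run try_create_context() in a non-main thread and return its result."""
    result = {}

    def target():
        result["error"] = try_create_context()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["error"]


if __name__ == "__main__":
    # Order matters: probe the worker thread before the main thread.
    print("worker thread:", try_in_worker_thread() or "ok")
    print("main thread:  ", try_create_context() or "ok")
```

On an unaffected interpreter both lines print "ok"; on an affected one the worker-thread line would be expected to show the `unknown error (_ssl.c:3161)` SSLError.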

The remaining work is to see what is buggy here. Is the PBS Python build buggy somehow? Is it a bug in Pex code, i.e. should an SSLContext only ever be created in the application main thread? Is it a mismatch between a modern Fedora glibc (which includes libpthread) and the libpthread.so.0 that PBS links to (unlike the system Python 3.9)? I have no clue at the moment.

jsirois commented 7 months ago

I'll note that I'm dropping work for the evening and I'm AFK likely until the 1st.

xlevus commented 7 months ago

Further Investigation:

Possible key change between the two is:

OpenSSL 1.1 -> 3.0 on supported platforms. Linux and macOS now use OpenSSL 3.0.x. Windows uses OpenSSL 3.0.x on CPython 3.11+.

jsirois commented 7 months ago

@xlevus I'm working on a short-term fix in https://github.com/pantsbuild/pex/issues/2355. I'd still love to know what's really going on here, but 1st to stop the bleeding.

jsirois commented 7 months ago

I've flipped this back to a bug - apologies @mjimlittle, you ended up being right there. With @xlevus's help debugging, a fix for this issue in Pex is now released in 2.1.163: https://github.com/pantsbuild/pex/releases/tag/v2.1.163

A Pants maintainer will take it from here and upgrade Pants / instruct you how to do so for your Pants version.

mjimlittle commented 7 months ago

Hey @jsirois thanks for the update. Also, I am sorry I could not help out in tracing the source of the issue. I'm relatively new to the python/pants ecosystem so I could not keep up with @xlevus :D

As a workaround, I have Dockerized pants using an Ubuntu base image and I can successfully generate needed lock files.

cburroughs commented 7 months ago

To use the new version of Pex without waiting on a Pants release, add this to pants.toml:

[pex-cli]
version = "v2.1.163"
known_versions = [
  "v2.1.163|macos_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|macos_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|linux_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
  "v2.1.163|linux_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
]

(That's the sha256 and size of the pex artifact, which you can calculate yourself by downloading it from the release page.)
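For anyone who wants to compute those two values rather than copy them, here is a minimal sketch. The helper name `known_version_line` is hypothetical; it only assumes the `version|platform|sha256|size` layout visible in the entries above and that the artifact has already been downloaded to a local path.

```python
import hashlib
import os


def known_version_line(version, platform, artifact_path):
    """Build a known_versions entry: version|platform|sha256|size-in-bytes."""
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        # Hash in chunks so the whole artifact is never held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    size = os.path.getsize(artifact_path)
    return f"{version}|{platform}|{digest.hexdigest()}|{size}"
```

For example, `known_version_line("v2.1.163", "linux_x86_64", "./pex")` would produce one line for the list, assuming the downloaded file was saved as `./pex`. The same file is listed for every platform in the config above, so only the platform field changes between lines.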

mjimlittle commented 7 months ago

Thanks @cburroughs, works fine now!

jsirois commented 7 months ago

@xlevus it turns out the issue is the custom RedHat OpenSSL option "rh-allow-sha1-signatures", seen here for example: https://gitlab.com/redhat/centos-stream/rpms/openssl/-/blob/c9s/0049-Selectively-disallow-SHA1-signatures.patch

If I do this on a fedora:37 image:

[root@d13f087cea45 /]# diff -u /etc/crypto-policies/back-ends/opensslcnf.config.orig /etc/crypto-policies/back-ends/opensslcnf.config
--- /etc/crypto-policies/back-ends/opensslcnf.config.orig       2024-02-09 00:54:33.569271689 +0000
+++ /etc/crypto-policies/back-ends/opensslcnf.config    2024-02-09 00:54:54.309267497 +0000
@@ -6,8 +6,3 @@
 DTLS.MaxProtocol = DTLSv1.2
 SignatureAlgorithms = ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:rsa_pss_rsae_sha256:rsa_pss_rsae_sha384:rsa_pss_rsae_sha512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224

-[openssl_init]
-alg_section = evp_properties
-
-[evp_properties]
-rh-allow-sha1-signatures = yes

Then a test rig works without main thread vs non-main thread shenanigans. As to why the thread makes a difference I have no clue yet, but a custom PBS build that enables openssl debug symbols and many gdb sessions later, I was able to narrow in on reading rh-allow-sha1-signatures, which is not a standard openssl config option, as the action leading to an error return path eventually bubbling out to _ssl.c:3161.
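To see whether a given machine carries that option, the config file from the diff above can be scanned with a short script. This is a sketch only: `sets_rh_sha1_option` is a made-up helper, the parsing is a plain `key = value` line scan rather than a full OpenSSL config parser, and the path is the one from the fedora:37 image above, which may differ elsewhere.

```python
def sets_rh_sha1_option(config_text):
    """Return True if any non-comment line sets rh-allow-sha1-signatures."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key == "rh-allow-sha1-signatures":
            return True
    return False


if __name__ == "__main__":
    # Path taken from the diff above; adjust if your system differs.
    path = "/etc/crypto-policies/back-ends/opensslcnf.config"
    try:
        with open(path) as f:
            found = sets_rh_sha1_option(f.read())
        print(f"{path} sets rh-allow-sha1-signatures: {found}")
    except FileNotFoundError:
        print(f"{path} not found")
```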

I'll update https://github.com/indygreg/python-build-standalone/issues/207 with all the details of the debug session later tonight. This is not Gregory's problem, but others may bump into RedHat shenanigans and need the ~FAQ on what goes on when vanilla openssl in PBS tries to read RedHat custom config.

jsirois commented 7 months ago

The explanation is contained in a comment in https://github.com/pantsbuild/pex/pull/2358 which I've pinged folks in this thread on.