indygreg / python-build-standalone

Produce redistributable builds of Python
BSD 3-Clause "New" or "Revised" License

PyQt segfaults on modern Linux distros when run with standalone python #95

Closed dae closed 8 months ago

dae commented 2 years ago

This is a curious one. Something about the way these packages are built is causing a segfault when PyQt tries to initialize a connection to the X server. The following can be seen on distros like Ubuntu 21.04, Debian 11 or Fedora 35:

% tar -axf cpython-3.9.6-x86_64-unknown-linux-gnu-pgo-20210724T1424.tar.zst
% ./python/install/bin/python3 -m venv{,-test} && . ./venv-test/bin/activate && pip install pyqt5 && DISPLAY=:0 python -c 'import sys; from PyQt5.QtGui import QGuiApplication as Q; Q(sys.argv)'; deactivate
Collecting pyqt5
  Using cached PyQt5-5.15.4-cp36.cp37.cp38.cp39-abi3-manylinux2014_x86_64.whl (8.3 MB)
Collecting PyQt5-Qt5>=5.15
  Using cached PyQt5_Qt5-5.15.2-py3-none-manylinux2014_x86_64.whl (59.9 MB)
Collecting PyQt5-sip<13,>=12.8
  Using cached PyQt5_sip-12.9.0-cp39-cp39-manylinux1_x86_64.whl (328 kB)
Installing collected packages: PyQt5-Qt5, PyQt5-sip, pyqt5
Successfully installed PyQt5-Qt5-5.15.2 PyQt5-sip-12.9.0 pyqt5-5.15.4
WARNING: You are using pip version 20.2.3; however, version 21.3 is available.
You should consider upgrading via the '/home/dae/venv-test/bin/python3 -m pip install --upgrade pip' command.
Segmentation fault (core dumped)

If we run it with the system-installed Python 3.9 or one built from standard Python sources, it works fine:

% python3.9 -m venv{,-test} && . ./venv-test/bin/activate && pip install pyqt5 && DISPLAY=:0 python -c 'import sys; from PyQt5.QtGui import QGuiApplication as Q; Q(sys.argv)'; deactivate
Collecting pyqt5
  Using cached PyQt5-5.15.4-cp36.cp37.cp38.cp39-abi3-manylinux2014_x86_64.whl (8.3 MB)
Collecting PyQt5-Qt5>=5.15
  Using cached PyQt5_Qt5-5.15.2-py3-none-manylinux2014_x86_64.whl (59.9 MB)
Collecting PyQt5-sip<13,>=12.8
  Using cached PyQt5_sip-12.9.0-cp39-cp39-manylinux1_x86_64.whl (328 kB)
Installing collected packages: PyQt5-sip, PyQt5-Qt5, pyqt5
Successfully installed PyQt5-Qt5-5.15.2 PyQt5-sip-12.9.0 pyqt5-5.15.4
%

The crash is happening in X11's libraries:

(gdb) bt
#0  0x00007ffff2e51ba0 in ?? () from /lib/x86_64-linux-gnu/libX11.so.6
#1  0x0000000000db796e in XCreateGC ()
#2  0x0000000000dc80ec in XOpenDisplay ()
#3  0x00007ffff2440ed5 in QXcbBasicConnection::QXcbBasicConnection(char const*) ()

I wonder if the glibc version used to build these standalone builds might be interacting in a bad way with the Qt/PyQt libraries?

(For background, I'm trying to use PyOxidizer to bundle up a PyQt application. Things appear to be working well on Windows and Mac. Thank you very much for improving the Python packaging situation!)

dae commented 2 years ago

It appears to be tkinter support that's the culprit - if I comment the tkinter line out of the enabled modules, the segfault goes away. PyQt will be dynamically linking with the system-installed X11 libraries, so I presume having an older version statically linked in is causing some sort of conflict. Would it be practical to provide an x11-free build in the future?

I initially tried building with the Docker image set to Debian 10 (which was easy apart from needing to add python3-distutils to prevent the clang build from breaking), but unfortunately that did not help. I was a bit daunted by having to compile all those dependencies, but the build scripts worked great - nicely done!

indygreg commented 2 years ago

Interesting bug report!

The _tkinter extension module on Linux needs to link against lib{tcl8.6, tk8.6, X11, xcb, Xau}. We statically link these libraries into _tkinter. And since all extension modules are statically linked into the Python executable, _tkinter and all these other libraries are statically linked into python.
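
A quick way to see whether a given interpreter has _tkinter compiled in, rather than shipped as a separate shared object, is to look at `sys.builtin_module_names`. This is only a minimal sketch; the helper name `is_builtin` is made up for illustration.

```python
import sys


def is_builtin(module_name: str) -> bool:
    """Return True if the module is compiled into the interpreter binary itself."""
    return module_name in sys.builtin_module_names


if __name__ == "__main__":
    # On a build where extensions are statically linked this is expected to be
    # True; on a typical distro Python, _tkinter is usually a separate .so file.
    print("_tkinter built in:", is_builtin("_tkinter"))
```

Run it with the interpreter you want to inspect, for example `./python/install/bin/python3 check_builtin.py` (the script name is arbitrary).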

When you load a PyQt extension module and call into some compiled C code, we get a crash. From the stack trace, it looks like multiple versions of libX11 are getting loaded (the statically linked one and your /lib/x86_64-linux-gnu/libX11.so.6). This is almost certainly causing the crash.
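
One way to look for this situation from inside a running process is sketched below. It is a diagnostic idea, not something taken from the project: `loaded_objects` lists the shared objects in `/proc/self/maps` text whose path contains a given string, and `main_program_resolves` asks `dlopen(NULL)` (via `ctypes.CDLL(None)`) whether a symbol such as `XOpenDisplay` is already visible before any GUI library has been imported. Both helper names are hypothetical, and the approach assumes Linux.

```python
import ctypes
from typing import List


def loaded_objects(maps_text: str, needle: str) -> List[str]:
    """Return distinct mapped file paths in /proc/<pid>/maps text containing `needle`."""
    paths: List[str] = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no path
        path = fields[5].strip()
        if needle in path and path not in paths:
            paths.append(path)
    return paths


def main_program_resolves(symbol: str) -> bool:
    """Return True if `symbol` can be looked up through dlopen(NULL).

    That handle covers the executable, the libraries loaded at startup, and
    anything loaded with RTLD_GLOBAL, so a hit is a hint rather than proof
    that the symbol comes from the python binary itself.
    """
    try:
        getattr(ctypes.CDLL(None), symbol)
        return True
    except (AttributeError, OSError, TypeError):
        return False


if __name__ == "__main__":
    print("XOpenDisplay visible at startup:", main_program_resolves("XOpenDisplay"))
    with open("/proc/self/maps") as fh:
        print("libX11 shared objects mapped:", loaded_objects(fh.read(), "libX11"))
```

If `XOpenDisplay` is visible at startup and a libX11.so also shows up in the maps after a GUI library is imported, two copies of the library are present in the process, which is the scenario described above.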

Welcome to Linux dependency hell.

The general workaround to this is to not statically link libX11 into our python.

If we keep _tkinter a builtin and don't link libX11, we'll incur a runtime dependency on libX11.so, which would make the distributions less portable. So that's off the table.

We could potentially drop _tkinter from the builtin extension module list, making it a standalone shared library file (like Python extension modules typically are). If it statically linked against libX11, we wouldn't be able to import it into the same process using PyQt, however. Making it a standalone extension module also has implications for PyOxidizer and producing single file binaries.

We could also provide 2 extension module variants for _tkinter: 1 with static libX11 linking and another with dynamic linking. That way downstream consumers could have it both ways.

For the portable installation, the installation is already multiple files. So assuming we provide a fully statically linked _tkinter in the build artifacts, we could potentially make _tkinter a standalone extension module with a libX11.so dependency so the Python distribution were maximally compatible with PyQt.

I'll have to think about this.

dae commented 2 years ago

Making it a standalone extension module also has implications for PyOxidizer and producing single file binaries.

Perhaps this is a naive question (I have never used tk), but doesn't https://pyoxidizer.readthedocs.io/en/v0.10.3/packaging_tkinter.html#tcl-files-prevent-self-contained-executables mean that it's already not possible to distribute a single file binary that uses tkinter?

We could also provide 2 extension module variants for _tkinter: 1 with static libX11 linking and another with dynamic linking.

That sounds great, but if you didn't want the added complexity in the build scripts, it sounds like either of those options would work around the problem for us. Presumably most users invoking an X11 app on a system with X11 installed will already have the X11 libraries, so I guess the main benefit of static linking would be for when an app is installed on a headless server that lacks X11, and then run with DISPLAY set to a remote machine?

For the portable installation, the installation is already multiple files. So assuming we provide a fully statically linked _tkinter in the build artifacts, we could potentially make _tkinter a standalone extension module with a libX11.so dependency so the Python distribution were maximally compatible with PyQt.

On a semi-related note, thank you for #79 too - I learnt about this project from PyOxidizer, but also happen to be using Bazel, and the portable install opens up the possibility of replacing our current reliance on a system-installed Python with a pinned version that gets automatically fetched as part of the build.

silverjam commented 2 years ago

We could potentially drop _tkinter from the builtin extension module list, making it a standalone shared library file (like Python extension modules typically are). If it statically linked against libX11, we wouldn't be able to import it into the same process using PyQt, however. Making it a standalone extension module also has implications for PyOxidizer and producing single file binaries.

Hi @indygreg -- could you provide a few pointers on how we might go about doing this? We don't need _tkinter for our use of this project, so building as a standalone shared module sounds like the way to go for us.

We could also provide 2 extension module variants for _tkinter: 1 with static libX11 linking and another with dynamic linking. That way downstream consumers could have it both ways.

For the portable installation, the installation is already multiple files. So assuming we provide a fully statically linked _tkinter in the build artifacts, we could potentially make _tkinter a standalone extension module with a libX11.so dependency so the Python distribution were maximally compatible with PyQt.

Would the solution to this be another build variant? I would think that most apps that use PyQt don't need Tcl/Tk support, so dropping one in favor would be fine for PyQt apps.

indygreg commented 2 years ago

For pre-built, ready-to-run distributions, we would need to change the default or make a separate build variant without libX11 statically linked.

Since I think people do want a working _tkinter by default (people complained when it didn't work in the early days of this project and of PyOxidizer), that means we need a separate build variant / distribution - one without _tkinter completely or one that isn't statically linked into python.

The easiest way to do this is probably to add a config key to the targets.yml file defining extensions to ignore or leave as dynamically linked. This would require refactoring that file so build configurations are defined by name and not by their target triple (right now we assume one build configuration per triple, and that assumption would no longer hold). That refactor is probably the bulk of the work to implement this.
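
To make the shape of that refactor concrete, here is a rough sketch of build configurations keyed by name, with per-extension linkage overrides. Everything in it is hypothetical: the `BuildConfig` class, the configuration names, and the `builtin`/`shared`/`disabled` values do not come from targets.yml or the project's build scripts.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical linkage choices for an extension module.
BUILTIN = "builtin"    # statically linked into the python executable
SHARED = "shared"      # built as a standalone shared-library extension
DISABLED = "disabled"  # not built at all


@dataclass
class BuildConfig:
    name: str
    target_triple: str
    # Per-extension overrides; anything not listed keeps the default linkage.
    extension_overrides: Dict[str, str] = field(default_factory=dict)

    def linkage(self, extension: str) -> str:
        """Return how `extension` should be linked under this configuration."""
        return self.extension_overrides.get(extension, BUILTIN)


# Keyed by configuration name, so several configs can share one target triple.
CONFIGS = {
    cfg.name: cfg
    for cfg in (
        BuildConfig("linux64", "x86_64-unknown-linux-gnu"),
        BuildConfig("linux64-no-tk", "x86_64-unknown-linux-gnu", {"_tkinter": DISABLED}),
        BuildConfig("linux64-shared-tk", "x86_64-unknown-linux-gnu", {"_tkinter": SHARED}),
    )
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.target_triple, cfg.linkage("_tkinter"))
```

The point of keying by name is that all three entries can target the same triple, which the current one-configuration-per-triple assumption does not allow.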

silverjam commented 2 years ago

We'd be happy to take a stab at this. Do you have any tips on how to get the project building within the time limits of GitHub Actions? Previously I was able to build this project on our fork, but recently the toolchain stage has been timing out. My current plan is to hook this up to an internal S3 bucket, but other options would be welcome.

On another note, do you accept sponsorship/bounties/contracts for specific bugs/features like this one?

silverjam commented 2 years ago

FYI, I was able to get things building by hooking up an internal S3 bucket to populate sccache.

indygreg commented 2 years ago

Regarding the time limits, there's no easy way to work around that today without running your own S3 bucket for sccache. I've been chipping away at moving the GCC/Clang build to an external project. My goal is to then have that project publish the Clang toolchain archive as a GitHub release artifact and for this project to consume the prebuilt Clang toolchain. That would eliminate the long pole in this project's build and hopefully allow forks to have working CI.

silverjam commented 2 years ago

Wonderful, looking forward to this.

indygreg commented 2 years ago

The latest release for this project no longer exports symbols from 3rd party libraries. This might fix the issue you reported. But I haven't verified yet.
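
For anyone wanting to check a particular build, one possible approach is to list the dynamic symbols the python binary defines and look for X11 names. The sketch below shells out to GNU `nm -D --defined-only` and filters by prefix. The helper names are made up, and it assumes that "exports" here refers to the binary's dynamic symbol table.

```python
import subprocess
from typing import List


def exported_symbols(nm_output: str, prefix: str) -> List[str]:
    """Return defined symbol names in `nm` output that start with `prefix`."""
    names: List[str] = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols are printed as "<address> <type> <name>".
        if len(fields) == 3 and fields[2].startswith(prefix):
            names.append(fields[2])
    return names


def dynamic_symbols(binary_path: str) -> str:
    """Run GNU nm on `binary_path` and return its defined dynamic symbols as text."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", binary_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python check_exports.py ./python/install/bin/python3
    for name in exported_symbols(dynamic_symbols(sys.argv[1]), "XOpen"):
        print(name)
```

An empty result for a prefix like `XOpen` would be consistent with the X11 symbols no longer being exported, though it does not by itself confirm that the PyQt crash is gone.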

I have also switched the Linux and macOS builds to use pre-built Clang toolchains. So CI should no longer time out.

dae commented 2 years ago

That sounds great. I will try to test this out in the coming weeks and confirm whether the issue has been solved.

dae commented 1 year ago

Sorry it took so long to get back to this. The issue seems to be fixed - I'm no longer seeing crashes when importing PyQt on a Debian 11 machine when using the latest python-build-standalone release.