pygobject / pycairo

Python bindings for cairo
https://pycairo.readthedocs.io
Other
622 stars 85 forks source link

Windows: fix DLL not found error in Python 3.8+ #305

Closed danyeaw closed 4 months ago

danyeaw commented 1 year ago

This was also attempted in #248 and #277.

In Python 3.8, Python changed the behavior to no longer automatically import DLLs from the Path. Unfortunately, Pycairo is effectively broken in Windows when building from source.

As a user, I should be able to follow any Pycairo example in Windows, and it should just work.

Right now, that is not possible for users to run import cairo without calling os.add_dll_directory first.

It also isn't possible to package an app using PyInstaller in Windows with Pycairo built from source because the hooks perform their analysis in an isolated subprocess, and there's no way to propagate the DLL search paths. Reference issue: https://github.com/pyinstaller/pyinstaller/issues/7333

Using the wheels built for Pycairo isn't possible because they are statically linked, and I need a dynamic linking to use GTK and Cairo together. This was a clever idea though for a casual user just trying to make a quick app using pycairo standalone.

In #277, I thought this change wasn't needed, but it was only because I was using a patched version of PyGObject that was already adding the DLL directory.

lazka commented 1 year ago

I'm fully aware that this is frustrating and packag devs and users have to deal with this somehow without any proper guidance from CPython upstream, so the following is a bit of a rant, not directed at you or this PR...

In Python 3.8, Python changed the behavior to no longer automatically import DLLs from the Path. Unfortunately, Pycairo is effectively broken in Windows when building from source.

Like every other package linking against external software

As a user, I should be able to follow any Pycairo example in Windows, and it should just work.

Right now, that is not possible for users to run import cairo without calling os.add_dll_directory first.

As it is intended and recommended by CPython developers, for security reasons.

It also isn't possible to package an app using PyInstaller in Windows with Pycairo built from source because the hooks perform their analysis in an isolated subprocess, and there's no way to propagate the DLL search paths. Reference issue: pyinstaller/pyinstaller#7333

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

Using the wheels built for Pycairo isn't possible because they are statically linked, and I need a dynamic linking to use GTK and Cairo together. This was a clever idea though for a casual user just trying to make a quick app using pycairo standalone.

Yeah, that was one of the reasons why I didn't want to provide wheels initially, but it helped more users that it hurt in the end.

In #277, I thought this change wasn't needed, but it was only because I was using a patched version of PyGObject that was already adding the DLL directory.

danyeaw commented 1 year ago

I'm fully aware that this is frustrating and packag devs and users have to deal with this somehow without any proper guidance from CPython upstream, so the following is a bit of a rant, not directed at you or this PR...

No offense taken 😉. I want pycairo to be awesome as well, so I'm here to help us find a good solution.

Like every other package linking against external software

We don't expect Linux and macOS users to explicitly say where their .so files are located before they run pycairo. I feel like Windows users should have a similar experience.

As it is intended and recommended by CPython developers, for security reasons.

Correct, although I think the threat is pretty minimal if we are only adding very specific DLLs and not all DLLs found on the Path.

They were definitely also protecting for the use case that libraries bundle their own libs instead of only relying on end users to explicitly declare a search path. In the Enable better DLL resolution issue, the solution they picked which zooba called number 1 was specifically to allow for libraries to package their own libs. This is a little different from that I agree though, we aren't talking about packaging our own libs but linking to them.

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

It does allow setting the DLL path if a dependency library uses add_dll_directory, for example PyInstaller tracks calls to add_dll_directory during package initialization.

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

This might be a good balance to do something like this. Wouldn't we need it set at runtime to find the DLL though? I can build pycairo fine, it is running pycairo where I get errors.

lazka commented 1 year ago

Like every other package linking against external software

We don't expect Linux and macOS users to explicitly say where their .so files are located before they run pycairo. I feel like Windows users should have a similar experience.

I should have said "Like every other package linking against external software, on Windows"

As it is intended and recommended by CPython developers, for security reasons.

Correct, although I think the threat is pretty minimal if we are only adding very specific DLLs and not all DLLs found on the Path.

Our extension only looks for cairo.dll, or libcairo-2.dll on the PATH with older CPython, just like your code. So there is no difference from what I see.

They were definitely also protecting for the use case that libraries bundle their own libs instead of only relying on end users to explicitly declare a search path. In the Enable better DLL resolution issue, the solution they picked which zooba called number 1 was specifically to allow for libraries to package their own libs. This is a little different from that I agree though, we aren't talking about packaging our own libs but linking to them.

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

It does allow setting the DLL path if a dependency library uses add_dll_directory, for example PyInstaller tracks calls to add_dll_directory during package initialization.

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

This might be a good balance to do something like this. Wouldn't we need it set at runtime to find the DLL though? I can build pycairo fine, it is running pycairo where I get errors.

By build time I meant the "PyInstaller" part. What does PyInstaller do with the found DLLs via add_dll_directory? Copy them somewhere?

danyeaw commented 1 year ago

Our extension only looks for cairo.dll, or libcairo-2.dll on the PATH with older CPython, just like your code. So there is no difference from what I see.

I didn't understand this, are you saying that Pycairo is already trying to automatically load the cairo DLLs in Windows with the latest Python?

By build time I meant the "PyInstaller" part. What does PyInstaller do with the found DLLs via add_dll_directory? Copy them somewhere?

Yes, it packages them with the app.

danyeaw commented 1 year ago

I made some improvements today to the PR:

  1. Reduced the import of libraries to just os try to help ensure that this minimally impacts load up speed of pycairo
  2. Enabled a PY_DLL_DIR environmental variable for users to explicitly specify where to load DLLs
  3. Only search for DLLs in the path if there is an ImportError
  4. Switched from using filter to a generator to make things more pythonic
  5. Removed the extra load code from the test initialization
  6. I offloaded as much as I could in to a separate module, while keeping module level imports

I also did a bit of research to make sure that this approach is consistent with other similar libraries. PySide2 has a similar DLL search routine in its __init__.py file.

danyeaw commented 1 year ago

I did some more research about this today, and another option to solve this may be to bundle the DLL. Other projects like numpy are using delvewheel to mangle the DLL names and copy them in to the wheel itself. With this approach, then there is no need to add a DLL search path.

I ran delvewheel repair .\pycairo-1.24.1-cp311-cp311-win_amd64.whl and it added the following files to a pycairo.libs folder in the wheel:

-----          2023-08-19    21:45        1314816 cairo-2-911d9408ba893bb7dfb6fc000f437c51.dll
-----          2023-08-19    21:45         325632 fontconfig-1-5a3e967bac0b518a42681fe96da1254a.dll
-----          2023-08-19    21:45         764416 freetype-6-4d157d0af160bf79091a3941d344146b.dll
-----          2023-08-18    23:04         205312 libexpat-32a51a377aab8070e54717fafbb53c7a.dll
-----          2023-08-19    21:45         261632 libpng16-8a1769d45baac4bc8e07d0fc48ae8b9d.dll
-----          2023-08-18    23:05         715264 pixman-1-0-f14bd97a45972824ebde5a8f4de5b44b.dll
-----          2023-08-18    22:52          90112 zlib1-ea5571e53626d4c436f912f06a19b918.dll

I noticed in the setup.py, we already have the cairo.dll as part of the package data, but I am not sure if that is still being used still. Also, maybe this approach overlaps with the cairo static build approach that the current wheels are using?

I also saw some projects using the cairo/_distributor_init.py file that @naveen521kk suggested above, and then in the main __init__.py, doing a star import.

Any thoughts on either of these two approaches?

naveen521kk commented 1 year ago

I noticed in the setup.py, we already have the cairo.dll as part of the package data, but I am not sure if that is still being used still.

Yeah previously cairo.dll was included in the wheel, but now it's not being used.

Also, maybe this approach overlaps with the cairo static build approach that the current wheels are using?

Could be. If we were to let the build-system/packager add cairo/_distributor_init.py and leave it empty otherwise then this approach should work.

danyeaw commented 1 year ago

@naveen521kk maybe we could define who this build-system/packager is since the packaging ecosystem in Python is really complex, and I'm not sure I understand. I'm also not in any way blaming Pycairo for this, I think this change in Python 3.8 left lots of libraries on the hook to individually find solutions which isn't great.

  1. Packaging Python libraries and tools - this is where pycairo fits, it is a library aimed at Python developers in order to build Python applications on top of what Cairo provides. Packaging in this category consists of Modules, Python Source Distributions, and Python Binary Distributions, and pycairo publishes a sdist and wheels to PyPI for this use case. For the current binaries provided, they are dynamically linked for Linux and macOS, and statically linked for Windows. Some libraries build on top of other libraries, like mathplotlib's mplcairo makes use of pycairo.

  2. Packaging Python applications - applications that make use of pycairo fit here like Gaphor as an example. There are various ways to package applications including PEX, anaconda, freezers, images, containers, VMs, and even hardware with software loaded. Developers of these applications make use of the library packaging above in order to do development of their applications.

  3. Operating System packages - this is a whole other category for when operating systems have their own software distribution system. Linux distribution packages and MSYS2 fits in to this category. It does have some overlap with the packaging above because some developers set up their development environments using these packages and some users install applications using them as well. However, this is only about a 1/3 of Python developers, and many developers use Python.org downloaded versions, anaconda, pyenv, or other ways to install their development environment (source).

Right now, the workflow to build pycairo (1) is broken on Windows because I have to patch it manually, and I am not able to pip install the package from PyPI. This impacts the ability for me to package an application (2) using a freezer because it doesn't know where to find the DLLs.

So when you say build-system/packager, do you mean operating system packagers (3) should be adding the cairo/_distributor_init.py? If so, that doesn't really fix the use cases for 1 and 2. What are the concerns about including this logic in Pycairo?

naveen521kk commented 1 year ago

Sorry for not being clear. By "build-system/packager" I meant anyone who builds wheels or packages the application, not just OS packagers. Only they would know the exact path where to find the libcairo* DLL file and add the necessary code for it. Sometimes the build-system may know the path to find them and could add the necessary code.

I know that this doesn't actually fix the issue, where, one can build pycairo from the source and expect it to run on Windows. That's the reason we provide wheels on that Platform. For application developers, I think they could build their own wheels and patch it using delvewheel or something similar and use cairo/_distributor_init.py as necessary. Or we could teach pyinstaller (or similar) to work around that as necessary.

I'll elaborate on a use case for cairo/_distributor_init.py. Let's say that we are building a wheel and using delvewheel for patching the wheel, then they would as well add the necessary code to cairo/_distributor_init.py such that it adds the directory pycairo.libs to the search path. Or it would be even better if delvewheel directly added those DLLs in cairo/ adjacent to the _cairo module. In that case, it would just work as it's adjacent to the module requiring it.

What are the concerns about including this logic in Pycairo?

Which logic do you mean? Is it about using cairo/_distributor_init.py for searching the PATH? In which case I don't like the idea of searching the PATH for finding the DLL and adding it to add_dll_directory being the default way (as it is against what is recommended by CPython developers).

danyeaw commented 1 year ago

Hi @naveen521kk, thanks for the response!

Sorry for not being clear. By "build-system/packager" I meant anyone who builds wheels or packages the application, not just OS packagers. Only they would know the exact path where to find the libcairo* DLL file and add the necessary code for it. Sometimes the build-system may know the path to find them and could add the necessary code.

In the Python ecosystem, the project that develops Python libraries and tools are the same people who distribute the sdist and wheel on PyPI. This is the 1st use case.

I know that this doesn't actually fix the issue, where, one can build pycairo from the source and expect it to run on Windows. That's the reason we provide wheels on that Platform.

But because these wheels are statically built, they only work for a very small set of use cases that pycairo is used for. They won't work for PyGObject development.

For application developers, I think they could build their own wheels and patch it using delvewheel or something similar and use cairo/_distributor_init.py as necessary.

I don't know of any other Python libraries that require developers to patch the library and rebuild it in order to do a pip install. This isn't a good developer experience.

Which logic do you mean? Is it about using cairo/_distributor_init.py for searching the PATH?

Yup, the logic in this PR.

In which case I don't like the idea of searching the PATH for finding the DLL and adding it to add_dll_directory being the default way (as it is against what is recommended by CPython developers).

What do you see as the recommendation by the CPython developers? The release notes for Python 3.8 specifically talk about adding this add dll function for this purpose. Would it be acceptable if we didn't search on the PATH at all and only looked for another environmental variable?