pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

1.23.1 build for distribution packaging broken #2628

Closed dvzrv closed 10 months ago

dvzrv commented 1 year ago

Describe the bug (mandatory)

Hi! I package this project for Arch Linux. With >= 1.23.0 the build is broken for distribution packaging, as we build against a system-provided mupdf and setup.py now sets a required build directory variable to None.

To Reproduce (mandatory)

export PYMUPDF_SETUP_MUPDF_BUILD=''
python -m build --wheel --no-isolation
* Getting build dependencies for wheel...
PyMuPDF-1.23.1/setup.py: ### Starting.
PyMuPDF-1.23.1/setup.py: __name__: 'setup'
PyMuPDF-1.23.1/setup.py: platform.platform(): 'Linux-6.4.12-arch1-1-x86_64-with-glibc2.38'
PyMuPDF-1.23.1/setup.py: platform.python_version(): '3.11.3'
PyMuPDF-1.23.1/setup.py: sys.executable: '/usr/bin/python'
PyMuPDF-1.23.1/setup.py: CPU bits: 64 sys.maxsize=9223372036854775807
PyMuPDF-1.23.1/setup.py: __file__: '/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py'
PyMuPDF-1.23.1/setup.py: os.getcwd(): '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py: sys.argv (3):
PyMuPDF-1.23.1/setup.py:     0: '/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py'
PyMuPDF-1.23.1/setup.py:     1: 'get_requires_for_build_wheel'
PyMuPDF-1.23.1/setup.py:     2: '/tmp/tmp5lmaec20'
PyMuPDF-1.23.1/setup.py: os.environ (33):
PyMuPDF-1.23.1/setup.py:     BUILDTOOL: 'devtools'
PyMuPDF-1.23.1/setup.py:     BUILDTOOLVER: '1:1.0.3-2-any'
PyMuPDF-1.23.1/setup.py:     CFLAGS: '-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf -flto=auto'
PyMuPDF-1.23.1/setup.py:     CHOST: 'x86_64-pc-linux-gnu'
PyMuPDF-1.23.1/setup.py:     COMMAND_MODE: 'legacy'
PyMuPDF-1.23.1/setup.py:     CXXFLAGS: '-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf -flto=auto'
PyMuPDF-1.23.1/setup.py:     DEBUGINFOD_URLS: 'https://debuginfod.archlinux.org '
PyMuPDF-1.23.1/setup.py:     HOME: '/build'
PyMuPDF-1.23.1/setup.py:     LANG: 'C.UTF-8'
PyMuPDF-1.23.1/setup.py:     LDFLAGS: '-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto'
PyMuPDF-1.23.1/setup.py:     LOGNAME: 'builduser'
PyMuPDF-1.23.1/setup.py:     MAIL: '/var/mail/builduser'
PyMuPDF-1.23.1/setup.py:     MAKEFLAGS: '-j24'
PyMuPDF-1.23.1/setup.py:     OLDPWD: '/build/python-pymupdf/src'
PyMuPDF-1.23.1/setup.py:     PATH: '/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl'
PyMuPDF-1.23.1/setup.py:     PEP517_BACKEND_PATH: '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py:     PEP517_BUILD_BACKEND: 'setup'
PyMuPDF-1.23.1/setup.py:     PWD: '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py:     PYMUPDF_SETUP_MUPDF_BUILD: ''
PyMuPDF-1.23.1/setup.py:     PYTHONHASHSEED: '0'
PyMuPDF-1.23.1/setup.py:     RUSTFLAGS: ' -C debuginfo=2 --remap-path-prefix=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf'
PyMuPDF-1.23.1/setup.py:     SHELL: '/bin/bash'
PyMuPDF-1.23.1/setup.py:     SHLVL: '1'
PyMuPDF-1.23.1/setup.py:     SOURCE_DATE_EPOCH: '1693223543'
PyMuPDF-1.23.1/setup.py:     SUDO_COMMAND: '/bin/bash -c bash -c cd\\ /startdir;\\ makepkg\\ "$@" -bash --syncdeps --noconfirm --log --holdver --skipinteg --install'
PyMuPDF-1.23.1/setup.py:     SUDO_GID: '0'
PyMuPDF-1.23.1/setup.py:     SUDO_UID: '0'
PyMuPDF-1.23.1/setup.py:     SUDO_USER: 'root'
PyMuPDF-1.23.1/setup.py:     TERM: 'xterm-256color'
PyMuPDF-1.23.1/setup.py:     TEXTDOMAIN: 'pacman-scripts'
PyMuPDF-1.23.1/setup.py:     TEXTDOMAINDIR: '/usr/share/locale'
PyMuPDF-1.23.1/setup.py:     USER: 'builduser'
PyMuPDF-1.23.1/setup.py:     _: '/usr/bin/python'
* Building wheel...
PyMuPDF-1.23.1/setup.py: ### Starting.
PyMuPDF-1.23.1/setup.py: __name__: 'setup'
PyMuPDF-1.23.1/setup.py: platform.platform(): 'Linux-6.4.12-arch1-1-x86_64-with-glibc2.38'
PyMuPDF-1.23.1/setup.py: platform.python_version(): '3.11.3'
PyMuPDF-1.23.1/setup.py: sys.executable: '/usr/bin/python'
PyMuPDF-1.23.1/setup.py: CPU bits: 64 sys.maxsize=9223372036854775807
PyMuPDF-1.23.1/setup.py: __file__: '/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py'
PyMuPDF-1.23.1/setup.py: os.getcwd(): '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py: sys.argv (3):
PyMuPDF-1.23.1/setup.py:     0: '/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py'
PyMuPDF-1.23.1/setup.py:     1: 'build_wheel'
PyMuPDF-1.23.1/setup.py:     2: '/tmp/tmpgeb4yx6g'
PyMuPDF-1.23.1/setup.py: os.environ (33):
PyMuPDF-1.23.1/setup.py:     BUILDTOOL: 'devtools'
PyMuPDF-1.23.1/setup.py:     BUILDTOOLVER: '1:1.0.3-2-any'
PyMuPDF-1.23.1/setup.py:     CFLAGS: '-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf -flto=auto'
PyMuPDF-1.23.1/setup.py:     CHOST: 'x86_64-pc-linux-gnu'
PyMuPDF-1.23.1/setup.py:     COMMAND_MODE: 'legacy'
PyMuPDF-1.23.1/setup.py:     CXXFLAGS: '-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf -flto=auto'
PyMuPDF-1.23.1/setup.py:     DEBUGINFOD_URLS: 'https://debuginfod.archlinux.org '
PyMuPDF-1.23.1/setup.py:     HOME: '/build'
PyMuPDF-1.23.1/setup.py:     LANG: 'C.UTF-8'
PyMuPDF-1.23.1/setup.py:     LDFLAGS: '-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto'
PyMuPDF-1.23.1/setup.py:     LOGNAME: 'builduser'
PyMuPDF-1.23.1/setup.py:     MAIL: '/var/mail/builduser'
PyMuPDF-1.23.1/setup.py:     MAKEFLAGS: '-j24'
PyMuPDF-1.23.1/setup.py:     OLDPWD: '/build/python-pymupdf/src'
PyMuPDF-1.23.1/setup.py:     PATH: '/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl'
PyMuPDF-1.23.1/setup.py:     PEP517_BACKEND_PATH: '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py:     PEP517_BUILD_BACKEND: 'setup'
PyMuPDF-1.23.1/setup.py:     PWD: '/build/python-pymupdf/src/PyMuPDF-1.23.1'
PyMuPDF-1.23.1/setup.py:     PYMUPDF_SETUP_MUPDF_BUILD: ''
PyMuPDF-1.23.1/setup.py:     PYTHONHASHSEED: '0'
PyMuPDF-1.23.1/setup.py:     RUSTFLAGS: ' -C debuginfo=2 --remap-path-prefix=/build/python-pymupdf/src=/usr/src/debug/python-pymupdf'
PyMuPDF-1.23.1/setup.py:     SHELL: '/bin/bash'
PyMuPDF-1.23.1/setup.py:     SHLVL: '1'
PyMuPDF-1.23.1/setup.py:     SOURCE_DATE_EPOCH: '1693223543'
PyMuPDF-1.23.1/setup.py:     SUDO_COMMAND: '/bin/bash -c bash -c cd\\ /startdir;\\ makepkg\\ "$@" -bash --syncdeps --noconfirm --log --holdver --skipinteg --install'
PyMuPDF-1.23.1/setup.py:     SUDO_GID: '0'
PyMuPDF-1.23.1/setup.py:     SUDO_UID: '0'
PyMuPDF-1.23.1/setup.py:     SUDO_USER: 'root'
PyMuPDF-1.23.1/setup.py:     TERM: 'xterm-256color'
PyMuPDF-1.23.1/setup.py:     TEXTDOMAIN: 'pacman-scripts'
PyMuPDF-1.23.1/setup.py:     TEXTDOMAINDIR: '/usr/share/locale'
PyMuPDF-1.23.1/setup.py:     USER: 'builduser'
PyMuPDF-1.23.1/setup.py:     _: '/usr/bin/python'
pipcl.py: build_wheel():  wheel_directory='/build/python-pymupdf/src/PyMuPDF-1.23.1/dist' config_settings={} metadata_directory=None
pipcl.py: _call_fn_build(): calling self.fn_build=<function build at 0x7f19477885e0>
PyMuPDF-1.23.1/setup.py: skeleton=None
PyMuPDF-1.23.1/setup.py: PYMUPDF_SETUP_MUPDF_BUILD=''
PyMuPDF-1.23.1/setup.py: PYMUPDF_SETUP_MUPDF_BUILD="", using system mupdf
PyMuPDF-1.23.1/setup.py: Using system mupdf.
PyMuPDF-1.23.1/setup.py: build(): mupdf_build_dir=None
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 579, in build_wheel
    items = self._call_fn_build(config_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 731, in _call_fn_build
    ret = self.fn_build()
          ^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 694, in build
    path_so_leaf_a, path_so_leaf_b = _build_extensions(
                                     ^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 953, in _build_extensions
    mupdf_build_dir_flags = os.path.basename( mupdf_build_dir).split( '-')
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 142, in basename
TypeError: expected str, bytes or os.PathLike object, not NoneType

ERROR Backend subprocess exited when trying to invoke build_wheel

Build script for Arch Linux: https://gitlab.archlinux.org/archlinux/packaging/packages/python-pymupdf/-/blob/main/PKGBUILD

Expected behavior (optional)

PyMuPDF can be built as before.

Screenshots (optional)

n/a

Your configuration (mandatory)

Additional context (optional)

The original setup.py has been renamed to setup-old.py which makes it very hard to track the changes as a new file has been put in place (which makes me wonder why changes have not just been added gradually).

julian-smith-artifex-com commented 1 year ago

Apologies, we don't have a good way to test building with a system mupdf so this was missed.

For me, this patch fixes the immediate problem:

diff --git a/setup.py b/setup.py
index acf27d3..9af7c29 100755
--- a/setup.py
+++ b/setup.py
@@ -950,7 +950,10 @@ def _build_extensions( mupdf_local, mupdf_build_dir, build_type):
     compiler_extra = ''
     if build_type == 'memento':
         compiler_extra += ' -DMEMENTO'
-    mupdf_build_dir_flags = os.path.basename( mupdf_build_dir).split( '-')
+    if mupdf_build_dir:
+        mupdf_build_dir_flags = os.path.basename( mupdf_build_dir).split( '-')
+    else:
+        mupdf_build_dir_flags = ''
     optimise = 'release' in mupdf_build_dir_flags
     debug = 'debug' in mupdf_build_dir_flags
     if windows:

Could you try this, and let us know if anything else is required to make things work for you?
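
The guard in the patch can be exercised on its own. The following is a minimal sketch, not the code in setup.py: the helper name build_dir_flags and the example directory name are assumptions made for illustration. Unlike the patch, it falls back to an empty list rather than an empty string, so the later 'release' in ... and 'debug' in ... checks always test list membership.

```python
import os


def build_dir_flags(mupdf_build_dir):
    """Split a MuPDF build directory name such as 'build/shared-release' into flags.

    Hypothetical helper: returns an empty list when no build directory is set,
    which is the case when building against a system MuPDF.
    """
    if not mupdf_build_dir:
        return []
    return os.path.basename(mupdf_build_dir).split('-')


# With a system MuPDF there is no build directory, so no flags are set.
flags = build_dir_flags(None)
optimise = 'release' in flags
debug = 'debug' in flags
```

With a local build directory whose leaf name is shared-release, the same helper would yield the two flags shared and release.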

BTW we are in the process of migrating to a new implementation of PyMuPDF called "rebased" which will use the MuPDF C++ and Python bindings, instead of the C bindings. So it will require a system MuPDF to have the C++ and Python bindings available. Building them is straightforward and reliable, but requires SWIG and the Python package libclang.
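
A packager could check for the two prerequisites named above before attempting to build the bindings. This is only a sketch: the function name is hypothetical, and it assumes that the libclang package is importable as the module clang, which may not hold for every distribution's packaging of it.

```python
import importlib.util
import shutil


def missing_binding_prerequisites():
    """Return the names of prerequisites that cannot be found (hypothetical helper)."""
    missing = []
    # SWIG is a command-line tool, so look for it on PATH.
    if shutil.which('swig') is None:
        missing.append('swig')
    # Assumption: the Python package libclang provides the module 'clang'.
    if importlib.util.find_spec('clang') is None:
        missing.append('libclang')
    return missing


print(missing_binding_prerequisites())
```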

dvzrv commented 1 year ago

Thanks! That gets me across the first bridge, but then I hit:

pipcl.py: _doit(): out='/build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i.c'
pipcl.py: _doit(): in='/build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i'
pipcl.py: run(): Running: swig
pipcl.py: run():     -Wall
pipcl.py: run():     -python
pipcl.py: run():     -module fitz
pipcl.py: run():     -outdir /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz
pipcl.py: run():     -o /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i.c
pipcl.py: run():     -MD -MF /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i.c.d
pipcl.py: run():     /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i
/build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i:66: Error: Unable to find 'mupdf/fitz/version.h'
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 579, in build_wheel
    items = self._call_fn_build(config_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 731, in _call_fn_build
    ret = self.fn_build()
          ^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 694, in build
    path_so_leaf_a, path_so_leaf_b = _build_extensions(
                                     ^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 1048, in _build_extensions
    path_so_leaf_a = pipcl.build_extension(
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 1335, in build_extension
    run( f'''
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 1720, in run
    subprocess.run( command2, shell=True, check=check)
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'swig\
    -Wall\
    -python\
    -module fitz\
    -outdir /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz\
    -o /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i.c\
    -MD -MF /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i.c.d\
    /build/python-pymupdf/src/PyMuPDF-1.23.1/fitz/fitz.i' returned non-zero exit status 1.

ERROR Backend subprocess exited when trying to invoke build_wheel

dvzrv commented 1 year ago

So I guess here I'd need to prefix with /usr/include/ as the header file in question is provided by our libmupdf package. Not super sure how to do that with swig in this context.
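
SWIG accepts -I<dir> include search paths in the same way a C compiler does, so a header installed as /usr/include/mupdf/fitz/version.h can be found by adding -I/usr/include. The sketch below only assembles such a command line. The helper name and its parameters are assumptions for illustration, and this is not how pipcl.py builds its command.

```python
import shlex


def swig_command(interface, output, module, include_dirs=()):
    """Assemble a swig command line with extra include directories (hypothetical helper)."""
    args = ['swig', '-Wall', '-python', '-module', module]
    # Each include directory becomes one -I<dir> argument.
    args += [f'-I{directory}' for directory in include_dirs]
    args += ['-o', output, interface]
    return args


cmd = swig_command('fitz/fitz.i', 'fitz/fitz.i.c', 'fitz', ['/usr/include'])
print(shlex.join(cmd))
```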

dvzrv commented 1 year ago

Hmm, after overriding/setting the includes I'm running into:

PyMuPDF-1.23.1/setup.py: Building PyMuPDF rebased.
pipcl.py: _doit(): out='/build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp'
pipcl.py: _doit(): in='/build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i'
pipcl.py: run(): Running: swig
pipcl.py: run():     -Wall
pipcl.py: run():     -c++
pipcl.py: run():     -python
pipcl.py: run():     -module extra
pipcl.py: run():     -outdir /build/python-pymupdf/src/PyMuPDF-1.23.1/src
pipcl.py: run():     -o /build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp
pipcl.py: run():     -MD -MF /build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp.d
pipcl.py: run():     /build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i
pipcl.py: run(): Running: /usr/bin/python3.11-config --includes
pipcl.py: run(): Running: /usr/bin/python3.11-config --ldflags
pipcl.py: __init__(): self.includes='-I/usr/include/python3.11 -I/usr/include/python3.11'
pipcl.py: __init__(): self.ldflags='-L/usr/lib  -ldl  -lm'
pipcl.py: _doit(): out='/build/python-pymupdf/src/PyMuPDF-1.23.1/src/_extra.so'
pipcl.py: _doit(): in='/build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp'
pipcl.py: run(): Running: c++
pipcl.py: run():     -fPIC
pipcl.py: run():     -shared
pipcl.py: run():     -I/usr/include/python3.11 -I/usr/include/python3.11
pipcl.py: run():     /build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp
pipcl.py: run():     -MD -MF /build/python-pymupdf/src/PyMuPDF-1.23.1/src/_extra.so.d
pipcl.py: run():     -o /build/python-pymupdf/src/PyMuPDF-1.23.1/src/_extra.so
pipcl.py: run():      -Wall -Wno-deprecated-declarations -Wno-unused-const-variable
pipcl.py: run():     -LNone
pipcl.py: run():     -lmupdf -lmupdfcpp
pipcl.py: run():     -Wl,-rpath,'$ORIGIN',-z,origin
pipcl.py: run():     -L/usr/lib  -ldl  -lm
/build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp:3165:10: fatal error: mupdf/classes2.h: No such file or directory
 3165 | #include "mupdf/classes2.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 579, in build_wheel
    items = self._call_fn_build(config_settings)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 731, in _call_fn_build
    ret = self.fn_build()
          ^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 694, in build
    path_so_leaf_a, path_so_leaf_b = _build_extensions(
                                     ^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/setup.py", line 1097, in _build_extensions
    path_so_leaf_b = pipcl.build_extension(
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 1533, in build_extension
    run(command)
  File "/build/python-pymupdf/src/PyMuPDF-1.23.1/pipcl.py", line 1720, in run
    subprocess.run( command2, shell=True, check=check)
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'c++\
    -fPIC\
    -shared\
    -I/usr/include/python3.11 -I/usr/include/python3.11\
    /build/python-pymupdf/src/PyMuPDF-1.23.1/src/extra.i.cpp\
    -MD -MF /build/python-pymupdf/src/PyMuPDF-1.23.1/src/_extra.so.d\
    -o /build/python-pymupdf/src/PyMuPDF-1.23.1/src/_extra.so\
     -Wall -Wno-deprecated-declarations -Wno-unused-const-variable\
    -LNone\
    -lmupdf -lmupdfcpp\
    -Wl,-rpath,'$ORIGIN',-z,origin\
    -L/usr/lib  -ldl  -lm' returned non-zero exit status 1.

ERROR Backend subprocess exited when trying to invoke build_wheel

I guess this means libmupdf is missing something...
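
Separately from the missing header, the link command in this log contains -LNone. That looks like None being formatted straight into the command string, presumably because mupdf_build_dir is None for a system MuPDF; this is an inference from the log, not something confirmed in the thread. A minimal sketch of the effect and of one possible guard (the helper name is hypothetical):

```python
def library_dir_flags(*directories):
    """Return -L flags only for directories that are actually set (hypothetical helper)."""
    return [f'-L{directory}' for directory in directories if directory]


mupdf_build_dir = None  # system MuPDF: no local build directory

# Formatting None directly reproduces the odd flag seen in the log.
naive = f'-L{mupdf_build_dir}'

# Skipping unset directories avoids emitting it.
guarded = library_dir_flags(mupdf_build_dir, '/usr/lib')
```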

julian-smith-artifex-com commented 1 year ago

I think what's happening here is that by default we build the original "classic" implementation of PyMuPDF, which only uses the MuPDF C API, and also the new "rebased" implementation, which requires the MuPDF C++ and Python bindings.

In the medium/long term, only the rebased implementation will be available, so you'll need to extend your MuPDF installation to include the MuPDF C++ and Python bindings.

For now though, if you set PYMUPDF_SETUP_IMPLEMENTATIONS=a, it will only build the classic implementation, which will avoid it trying to use things from the C++ API such as mupdf/classes2.h.
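
Put together with the reproduction steps at the top of the thread, a classic-only build against a system MuPDF could be driven as sketched below. The two environment variables and the build command are taken from this thread; the helper name is hypothetical, and the actual invocation is left commented out because it must be run from the unpacked PyMuPDF source tree.

```python
import os
import sys


def packaging_env(base=None):
    """Return a copy of the environment set up for a classic-only system-MuPDF build."""
    env = dict(os.environ if base is None else base)
    env['PYMUPDF_SETUP_MUPDF_BUILD'] = ''       # empty string: use the system MuPDF
    env['PYMUPDF_SETUP_IMPLEMENTATIONS'] = 'a'  # build only the classic implementation
    return env


command = [sys.executable, '-m', 'build', '--wheel', '--no-isolation']

# From the source tree:
# import subprocess
# subprocess.run(command, env=packaging_env(), check=True)
```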

julian-smith-artifex-com commented 1 year ago

One other thing, when building with a system MuPDF, it would presumably help you if setup.py allowed extra include paths and library paths to be specified with environment variables.

This could be as simple as adding $CFLAGS or $PYMUPDF_CFLAGS to all swig, compile and link commands, or perhaps $PYMUPDF_SWIG_FLAGS, $PYMUPDF_COMPILE_FLAGS and $PYMUPDF_LINK_FLAGS. Or maybe $PYMUPDF_INCLUDES and $PYMUPDF_LINK_FLAGS.

Do you have any suggestions or preferences here?

dvzrv commented 1 year ago

I think what's happening here is that by default we build the original "classic" implementation of PyMuPDF, which only uses the MuPDF C API, and also the new "rebased" implementation, which requires the MuPDF C++ and Python bindings.

Please do note that mupdf upstream offers zero integration for either of the last two (I looked into updating the build system but failed to understand which files should actually be installed, and where). There is not even an install target for those files... Are you sure you want to rely on that?

This could be as simple as adding $CFLAGS or $PYMUPDF_CFLAGS to all swig, compile and link commands, or perhaps $PYMUPDF_SWIG_FLAGS, $PYMUPDF_COMPILE_FLAGS and $PYMUPDF_LINK_FLAGS. Or maybe $PYMUPDF_INCLUDES and $PYMUPDF_LINK_FLAGS.

Do you have any suggestions or preferences here?

This is usually covered by exposing $CFLAGS, $CXXFLAGS and $LDFLAGS (those environment variables are set by default in our packaging environment).
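
A rough sketch of what honouring those variables could look like in a Python build script. The function names and the example flag values are assumptions for illustration; this is not the implementation in setup.py or pipcl.py.

```python
import os
import shlex


def env_flags(name, environ=None):
    """Split a flags variable such as CXXFLAGS into a list of arguments."""
    environ = os.environ if environ is None else environ
    return shlex.split(environ.get(name, ''))


def compile_and_link_command(source, output, environ=None):
    """Build a one-step c++ command that picks up CXXFLAGS and LDFLAGS."""
    return [
        'c++', '-fPIC', '-shared',
        *env_flags('CXXFLAGS', environ),
        source,
        '-o', output,
        *env_flags('LDFLAGS', environ),
    ]


# Illustrative values only; a packaging environment would export its own.
example_env = {'CXXFLAGS': '-O2 -I/usr/include/mupdf', 'LDFLAGS': '-Wl,--as-needed'}
command = compile_and_link_command('extra.i.cpp', '_extra.so', example_env)
print(shlex.join(command))
```

Using shlex.split rather than str.split keeps quoted flag values intact, which matters for flags that contain spaces.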

mjg commented 1 year ago

What happened to good old setup.py build by the way? The new glory setup.py seems to provide build() but pipcl.py does not know it (Unrecognised arg: build). I have to say that replacing the old build script with a new non-standard one does quite help poor packagers' lives ...

dvzrv commented 1 year ago

Frankly, after looking into the new setup.py (and having a really hard time understanding what is even supposed to go on in there), I am really starting to wonder why this project does not rely on meson for building this (e.g. https://mesonbuild.com/Cython.html)?

dvzrv commented 1 year ago

I have arrived at this patch for 1.23.3:

diff --git i/setup.py w/setup.py
index c65df8d..7088e57 100755
--- i/setup.py
+++ w/setup.py
@@ -1003,8 +1003,22 @@ def _build_extensions( mupdf_local, mupdf_build_dir, build_type):
                     f'{mupdf_local}/thirdparty/freetype/include',
                     )
         else:
-            includes = None
-        
+            includes = (
+                '/usr/include',
+                '/usr/include/freetype2',
+                '/usr/include/mupdf',
+            )
+            libraries = [
+                "jbig2dec",
+                "openjp2",
+                "jpeg",
+                "mujs",
+                "freetype",
+                "gumbo",
+                "harfbuzz",
+            ]
+            linker_extra = '-ljbig2dec -lopenjp2 -lmujs -ljpeg -lfreetype -lgumbo -lharfbuzz -lm'
+
         # Update helper-git-versions.i.
         f = io.StringIO()
         f.write('%pythoncode %{\n')

However, tests now fail because of missing symbols, and I have no clue which library they belong to. For example:

____________________ ERROR collecting tests/test_widgets.py ____________________
ImportError while importing test module '/build/python-pymupdf/src/PyMuPDF-1.23.3/tests/test_widgets.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/usr/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
/usr/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/test_widgets.py:5: in <module>
    import fitz
test_dir/usr/lib/python3.11/site-packages/fitz/__init__.py:22: in <module>
    from fitz.fitz import *
test_dir/usr/lib/python3.11/site-packages/fitz/fitz.py:14: in <module>
    from . import _fitz
E   ImportError: /build/python-pymupdf/src/PyMuPDF-1.23.3/test_dir/usr/lib/python3.11/site-packages/fitz/_fitz.cpython-311-x86_64-linux-gnu.so: undefined symbol: extract_stroke_end
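
One way to find out which library defines a missing symbol is to scan candidate libraries with nm from binutils. The sketch below assumes GNU nm is installed. The helper names are hypothetical, and the set of libraries to scan is left to the caller.

```python
import subprocess


def defines_symbol(nm_output, symbol):
    """Return True if nm output lists `symbol` as a defined symbol."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols look like '<address> <type> <name>'; undefined
        # ones have no address and are marked with type 'U'.
        if len(fields) == 3 and fields[2] == symbol and fields[1] != 'U':
            return True
    return False


def libraries_defining(symbol, library_paths):
    """Return the libraries among `library_paths` that define `symbol`."""
    found = []
    for path in library_paths:
        result = subprocess.run(
            ['nm', '-g', '--defined-only', path],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0 and defines_symbol(result.stdout, symbol):
            found.append(path)
    return found


# Example call (paths are an assumption; adjust to the distribution's layout):
# import glob
# print(libraries_defining('extract_stroke_end', glob.glob('/usr/lib/lib*.a')))
```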

IMHO, this type of breakage could and should have been prevented. The current state is a huge step back for anyone trying to package this downstream, and it takes a considerable amount of time to even try to get this to work.

julian-smith-artifex-com commented 1 year ago

I think what's happening here is that by default we build the original "classic" implementation of PyMuPDF, which only uses the MuPDF C API, and also the new "rebased" implementation, which requires the MuPDF C++ and Python bindings.

Please do note that mupdf upstream offers zero integration for either of the last two (I looked into updating the build system but failed to understand which files should actually be installed, and where). There is not even an install target for those files... Are you sure you want to rely on that?

Yes, there's currently no top-level makefile install target for the C++/Python bindings, but we'll be adding one soon, e.g. make install-libs-all. [This will only be required when we finally switch to the rebased implementation.]

This could be as simple as adding $CFLAGS or $PYMUPDF_CFLAGS to all swig, compile and link commands, or perhaps $PYMUPDF_SWIG_FLAGS, $PYMUPDF_COMPILE_FLAGS and $PYMUPDF_LINK_FLAGS. Or maybe $PYMUPDF_INCLUDES and $PYMUPDF_LINK_FLAGS. Do you have any suggestions or preferences here?

This is usually covered by exposing $CFLAGS, $CXXFLAGS and $LDFLAGS (those environment variables are set by default in our packaging environment).

Ok, thanks for this, I'll have a look at using these, and let you know when I have something you can test.

dvzrv commented 1 year ago

Ok, thanks for this, I'll have a look at using these, and let you know when I have something you can test.

Thank you!

julian-smith-artifex-com commented 1 year ago

What happened to good old setup.py build by the way?

Unfortunately the old setup.py's use of setuptools made it incapable of doing what we require. And setuptools is deprecated these days. Obviously we would have preferred to use an existing build tool, but there does not seem to be anything capable enough - Python's support for building complex extensions seems very poor.

pipcl.py tries hard to follow all the available standards in the various PEPs and https://packaging.python.org documentation, and works well on our three main platforms - Linux, Windows and macOS.

The new glory setup.py seems to provide build() but pipcl.py does not know it (Unrecognised arg: build). I have to say that replacing the old build script with a new non-standard one does quite help poor packagers' lives ...

setup.py is not non-standard: it follows the official packaging standards, e.g. PEP 517. Running setup.py directly is deprecated these days; one is supposed to drive things with a frontend such as pip.
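
For readers unfamiliar with PEP 517: a build backend is just a Python module exposing a few hook functions, and the log at the top of this thread shows the frontend calling get_requires_for_build_wheel and build_wheel on the backend named setup. The sketch below is a toy pure-Python backend written for illustration only. The project name, version and module contents are made up, and it is in no way the PyMuPDF or pipcl.py implementation.

```python
import base64
import hashlib
import os
import tempfile
import zipfile

NAME, VERSION = 'demo_pkg', '0.1'  # hypothetical project


def get_requires_for_build_wheel(config_settings=None):
    """PEP 517 hook: extra requirements needed to build the wheel."""
    return []


def _record_line(path, data):
    """Format one RECORD entry: path, urlsafe base64 sha256 without padding, size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b'=')
    return f'{path},sha256={digest.decode()},{len(data)}'


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """PEP 517 hook: write a wheel into wheel_directory and return its file name."""
    dist_info = f'{NAME}-{VERSION}.dist-info'
    files = {
        f'{NAME}.py': b'ANSWER = 42\n',
        f'{dist_info}/METADATA':
            f'Metadata-Version: 2.1\nName: {NAME}\nVersion: {VERSION}\n'.encode(),
        f'{dist_info}/WHEEL':
            b'Wheel-Version: 1.0\nGenerator: demo\nRoot-Is-Purelib: true\nTag: py3-none-any\n',
    }
    record = [_record_line(path, data) for path, data in files.items()]
    record.append(f'{dist_info}/RECORD,,')  # RECORD lists itself without a hash
    files[f'{dist_info}/RECORD'] = ('\n'.join(record) + '\n').encode()

    leaf = f'{NAME}-{VERSION}-py3-none-any.whl'
    os.makedirs(wheel_directory, exist_ok=True)
    with zipfile.ZipFile(os.path.join(wheel_directory, leaf), 'w') as wheel:
        for path, data in files.items():
            wheel.writestr(path, data)
    return leaf


# A frontend such as pip or build would call the hook roughly like this.
with tempfile.TemporaryDirectory() as tmp:
    print(build_wheel(tmp))
```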

Having said that, we do try to support basic command-line usage, using https://pip.pypa.io/en/stable/reference/build-system/setup-py/ "setup.py (legacy)" as a reference. So ./setup.py bdist_wheel builds a wheel, ./setup.py install builds and installs, etc.

I think setuptools' build builds and puts the generated files into directory build/. So it looks equivalent to running setup.py install --root build. I'll have a look at adding a build command to do this; do let me know if you think the behaviour should be different.

mjg commented 1 year ago

Right now I'm not sure what should be different - there simply are too many changes already (setup.py/pipcl here, plus the new rebasing, plus the new version) and in Python (distutils/setuptools, py3.12) in a relatively short timeframe. I'll try to work with what's there - the info that you're trying to follow the PEPs was new to me, for example. This may help to switch to the newer Fedora packaging macros.

julian-smith-artifex-com commented 1 year ago

Frankly, after looking into the new setup.py (and having a really hard time understanding what is even supposed to go on in there), I am really starting to wonder why this project does not rely on meson for building this (e.g. https://mesonbuild.com/Cython.html)?

I'm not familiar with meson. I guess that there are many different build systems around, with various pros and cons, and we all have our preferred ways of doing things.

One of the good things about Python's use of a setup.py script is that it's written in a powerful and generic language, so we can directly download MuPDF and run commands to build it, run SWIG to generate the PyMuPDF extensions, and package things up into Python wheels and/or copy into an installation directory.

Most of that is not really a build system, and I'm not sure that something like meson or cmake would be better than the pure Python code in setup.py and pipcl.py.

The intention is that setup.py and pipcl.py can be understood by reading the code and documentation. It looks like this might not be the case though. Can you describe what's lacking for you? I'd be happy to add something, e.g. to the top of setup.py, or in a separate document, that's more aligned with what you're primarily looking for.

julian-smith-artifex-com commented 1 year ago

Right now I'm not sure what should be different - there simply are too many changes already (setup.py/pipcl here, plus the new rebasing, plus the new version) and in Python (distutils/setuptools, py3.12) in a relatively short timeframe. I'll try to work with what's there - the info that you're trying to follow the PEPs was new to me, for example. This may help to switch to the newer Fedora packaging macros.

Understood.

For now I'd suggest you ignore the new rebased implementation: set PYMUPDF_SETUP_IMPLEMENTATIONS=a when building.

mjg commented 1 year ago

Progress due to remarks by both of you :)

@dvzrv Note that libs is basically what we used to patch: tesseract in there gets turned into -ltesseract by pipcl. We don't need libraries.

@julian-smith-artifex-com Different distros package mupdf differently: static vs. dynamic (maybe; are there any .so builds?), bundling or unbundling some of the libraries which mupdf bundles by default. That is why there was that enormous special casing for libraries in different distros in the old setup.py. I guess we'd be happy to pass pkgconfig output or a list of libs into pipcl, or to set an env variable for that.

Re .so: I seem to recall that there used to be issues with soname versioning in mupdf at some point, which is why we build statically. So, this is one more thing to change before we can ship the new mupdf bindings and use them for the rebased PyMuPDF.
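
The idea of passing pkg-config output into the build could look roughly like the sketch below. It relies only on the standard pkg-config options --cflags and --libs. The helper names are hypothetical, and whether a given distribution ships .pc files for mupdf or its bundled libraries is not something this thread establishes.

```python
import shlex
import subprocess


def split_flags(flags_text):
    """Sort pkg-config style flags into include dirs, library dirs and libraries."""
    parsed = {'include_dirs': [], 'library_dirs': [], 'libraries': [], 'other': []}
    for flag in shlex.split(flags_text):
        if flag.startswith('-I'):
            parsed['include_dirs'].append(flag[2:])
        elif flag.startswith('-L'):
            parsed['library_dirs'].append(flag[2:])
        elif flag.startswith('-l'):
            parsed['libraries'].append(flag[2:])
        else:
            parsed['other'].append(flag)
    return parsed


def pkg_config(*packages):
    """Query pkg-config for the given packages and return the parsed flags."""
    result = subprocess.run(
        ['pkg-config', '--cflags', '--libs', *packages],
        check=True,
        capture_output=True,
        text=True,
    )
    return split_flags(result.stdout)


# Example call (package names depend on what the system provides):
# print(pkg_config('freetype2', 'harfbuzz'))
```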

dvzrv commented 1 year ago

One of the good things about Python's use of a setup.py script is that it's written in a powerful and generic language, so we can directly download MuPDF and run commands to build it, run SWIG to generate the PyMuPDF extensions, and package things up into Python wheels and/or copy into an installation directory.

Similar to swig, meson offers a PEP 517 build backend. Using it has the upside that you are using a tool that is actually made for building C/C++ software. Scripting all of this in setup.py (and pipcl.py) leads to a huge script that is brittle and (currently) lacks support for common use cases (e.g. passing in build flags).

Most of that is not really a build system, and I'm not sure that something like meson or cmake would be better than the pure Python code in setup.py and pipcl.py.

All of that is part of a good build system! Please have a look at the generic integration for subprojects and the wrap dependency system.

Meson's declarative syntax is easy to read, has standardized functionality, powerful integration for dependencies, is tested across many platforms and therefore by all means better than custom Python code written for a single project.

The intention is that setup.py and pipcl.py can be understood by reading the code and documentation. It looks like this might not be the case though. Can you describe what's lacking for you? I'd be happy to add something, e.g. to the top of setup.py, or in a separate document, that's more aligned with what you're primarily looking for.

Sorry, but having to write thousands of lines of custom Python code to build a project (and forcing others to read it) sounds a lot like NIH to me. :( People know how to read a CMakeLists.txt or a meson.build file (all keywords are documented, there is no ambiguity). With a custom setup.py and accompanying scripts they have to read the entire thing to even figure out what is overridden where, etc. (because nothing in it will adhere to any standard whatsoever). As a downstream packager with hundreds of packages to maintain, this is a huge concern to me. I simply do not have the (unpaid) time to read through all of this to figure out which code path my use-case probably falls into.

To comment on what @mjg has been writing earlier: The new way of building (even the 'a' variant) seems to depend on different things than before. As noted in https://github.com/pymupdf/PyMuPDF/issues/2628#issuecomment-1705626195 I have to provide other/more libraries to the build and package those too. In this case extract, which appears to be bundled with mupdf and is not something we currently debundle because it is not used by anything else. Additionally, our libmupdf package currently only provides staticlibs (probably for the same reason as mentioned by @mjg).

It would be very helpful to first clear this up (and even add standardized integration, such as pkgconfig and/or cmake files to e.g. extract and mupdf) so that detection can happen generically across consumers and nothing has to be hardcoded by thousands of lines of custom Python code.

mjg commented 1 year ago

So, just in case this helps someone: Here is the Fedora dist-git tree which builds and passes the test suite. It contains the ugly version of patching in the required libraries and requesting a debug build for non-local builds. I'm still a bit worried about not picking up default build flags from the environment - we usually require that in Fedora land.

The other two patches (to the test suite) are not new.

Test builds can be found in my copr repo.

julian-smith-artifex-com commented 1 year ago

So, just in case this helps someone: Here is the Fedora dist-git tree which builds and passes the test suite. It contains the ugly version of patching in the required libraries and requesting a debug build for non-local builds. I'm still a bit worried about not picking up default build flags from the environment - we usually require that in Fedora land.

Thanks for this.

I've pushed some experimental MuPDF and PyMuPDF changes to:

https://github.com/ArtifexSoftware/PyMuPDF-julian
https://github.com/ArtifexSoftware/mupdf-julian

These allow system build and install of MuPDF and PyMuPDF.

As a test, one can install MuPDF and PyMuPDF into ~/fake-root with:

# Get code.
git clone https://github.com/ArtifexSoftware/mupdf-julian mupdf
git clone https://github.com/ArtifexSoftware/PyMuPDF-julian PyMuPDF

# Build and install MuPDF.
cd mupdf && DESTDIR=~/fake-root make install-shared-python && cd ..

# Build and install PyMuPDF with either of these:

PYMUPDF_MUPDF_INCLUDE=~/fake-root/usr/local/include PYMUPDF_MUPDF_LIB=~/fake-root/usr/local/lib PYMUPDF_SETUP_MUPDF_BUILD='' pip install --root ~/fake-root ./PyMuPDF

PYMUPDF_MUPDF_INCLUDE=~/fake-root/usr/local/include PYMUPDF_MUPDF_LIB=~/fake-root/usr/local/lib PYMUPDF_SETUP_MUPDF_BUILD='' setup.py install --root ~/fake-root

Then test with:

LD_LIBRARY_PATH=~/fake-root/usr/local/lib PYTHONPATH=~/fake-root/usr/local/lib/python3.11/dist-packages/ python3

and check that import fitz and import fitz_new work.
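
The import check can also be scripted. The following is a minimal sketch with a hypothetical helper name; it assumes it is run with the same LD_LIBRARY_PATH and PYTHONPATH settings as the python3 command above.

```python
import importlib


def try_imports(names):
    """Try to import each module; map name -> None on success, error text on failure."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Covers both a missing module and a shared library that fails to load.
            results[name] = str(exc)
    return results


if __name__ == '__main__':
    for name, error in try_imports(['fitz', 'fitz_new']).items():
        print(name, 'OK' if error is None else f'FAILED: {error}')
```

Doing a real import, rather than only locating the module, also catches the case where the extension is found but libmupdf cannot be loaded.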

PyMuPDF's setup.py supports new environment variables CFLAGS, CXXFLAGS, LDFLAGS, PYMUPDF_INCLUDES, PYMUPDF_MUPDF_INCLUDE, PYMUPDF_MUPDF_LIB. See docs near top of setup.py.
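
To give an idea of how such variables could be combined, here is a minimal sketch. It is not the code in setup.py: the function name is hypothetical, and it assumes that PYMUPDF_MUPDF_INCLUDE and PYMUPDF_MUPDF_LIB each hold a single directory that ends up as an -I or -L flag.

```python
import os
import shlex


def collect_flags(environ=None):
    """Combine generic and PyMuPDF-specific variables into compile/link flag lists."""
    environ = os.environ if environ is None else environ
    cflags = shlex.split(environ.get('CFLAGS', ''))
    cxxflags = shlex.split(environ.get('CXXFLAGS', ''))
    ldflags = shlex.split(environ.get('LDFLAGS', ''))

    # Assumption: each variable names one directory.
    include_dir = environ.get('PYMUPDF_MUPDF_INCLUDE')
    lib_dir = environ.get('PYMUPDF_MUPDF_LIB')
    if include_dir:
        cflags.append(f'-I{include_dir}')
        cxxflags.append(f'-I{include_dir}')
    if lib_dir:
        ldflags.append(f'-L{lib_dir}')
    return {'cflags': cflags, 'cxxflags': cxxflags, 'ldflags': ldflags}


if __name__ == '__main__':
    print(collect_flags())
```

The docs near the top of setup.py remain the reference for what each variable actually does.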

I've only tested PYMUPDF_MUPDF_INCLUDE and PYMUPDF_MUPDF_LIB so far, and i haven't tried making MuPDF use system-installed things such as freetype.

julian-smith-artifex-com commented 1 year ago

@dvzrv wrote:

Similar to swig, meson offers a PEP517 build backend. Using it has the upside that you are using a tool that is actually made for building C/C++ software. Scripting all of this in setup.py (and pipcl.py) leads to a huge script that is brittle and (currently) lacks all the common use-cases (e.g. passing in of build flags).

This is getting a little off-topic for this issue, so i've replied in a new discussion: #2649

julian-smith-artifex-com commented 1 year ago

I've pushed changes to MuPDF and PyMuPDF that will hopefully improve system installs.

Here are example commands i've used to install and test using ~/fake-root.

(cd mupdf && DESTDIR=~/fake-root make USE_SYSTEM_LIBS=yes HAVE_LEPTONICA=yes HAVE_TESSERACT=yes verbose=yes install-shared-c)

PYMUPDF_SETUP_MUPDF_BUILD= PYMUPDF_SETUP_IMPLEMENTATIONS=a CFLAGS='-I ~/fake-root/usr/local/include -I/usr/include/freetype2 -I/usr/include/libpng16' CXXFLAGS='-I ~/fake-root/usr/local/include -I/usr/include/freetype2 -I/usr/include/libpng16' LDFLAGS='-L ~/fake-root/usr/local/lib' pip install --root ~/fake-root ./PyMuPDF

LD_LIBRARY_PATH=~/fake-root/usr/local/lib PYTHONPATH=~/fake-root/usr/local/lib/python3.11/dist-packages pytest -k "not test_color_count" PyMuPDF

I haven't tested installing into the real system root, but the intention is that one can do:

(cd mupdf && DESTDIR=/ make USE_SYSTEM_LIBS=yes HAVE_LEPTONICA=yes HAVE_TESSERACT=yes verbose=yes install-shared-c)

PYMUPDF_SETUP_MUPDF_BUILD= PYMUPDF_SETUP_IMPLEMENTATIONS=a pip install ./PyMuPDF

pytest -k "not test_color_count" PyMuPDF

julian-smith-artifex-com commented 1 year ago

Today's release, PyMuPDF-1.23.4, includes the changes described above.

Let me know how you get on with building and installing system packages.

mjg commented 1 year ago

Thanks. I understand this is intended to make distro life smoother in the future and I do appreciate that.

That being said: 1.23.4 does not build with the same spec file with which 1.23.3 built (and which I had recrafted for the changes in 1.23.3), nor does it build if I remove an additional patch to setup.py that 1.23.3 still needed. Maybe I just need to cull lib related changes from that patch (and set the new env variables) but keep the mupdf_build_dir_flags change (so that setup.py does not clear it without local mupdf sources), or something else. I won't be able to invest the necessary time in the upcoming weeks, so please don't be disappointed if feedback does not arrive soon. It's too late for this to make it into the Fedora 39 release anyways, and - consequently - for 1.23.x at this point because it changed again rather than stabilizing.

What is the intended incantation to install PyMuPDF with a system mupdf? In your example above, you still have a mupdf src tree alongside the PyMuPDF src tree (which prevents the clearing of mupdf_build_dir_flags in setup.py that I'm patching out).

julian-smith-artifex-com commented 1 year ago

I totally understand about not being able to update PyMuPDF for Fedora 39, and thanks for the warning about no feedback for a while. I'll jump on this issue whenever you're able to look at it some more.

[I guess your 0001-unbreak-build-against-system-libraries.patch will not work with 1.23.4's setup.py, as that code has changed quite a bit. Apologies for breaking things here.]

The intention with 1.23.4 is that you can build without needing to do any patching of setup.py. And i'll make any required changes to ensure this is the case.

What is the intended incantation to install PyMuPDF with a system mupdf? In your example above, you still have a mupdf src tree alongside the PyMuPDF src tree (which prevents the clearing of mupdf_build_dir_flags in setup.py that I'm patching out).

The example i gave is intended to do this.

Just for clarity, here's my suggested (and untested) commands for a system install, with a couple of extra annotations:

# Install MuPDF.
(cd mupdf && DESTDIR=/ make USE_SYSTEM_LIBS=yes HAVE_LEPTONICA=yes HAVE_TESSERACT=yes verbose=yes install-shared-c)

# At this point we could do `rm -r mupdf/` because it is not required when building+installing PyMuPDF.

# Build+install PyMuPDF using above system install of MuPDF.
# Set CFLAGS, CXXFLAGS and LDFLAGS as required, e.g. with `pkg-config --cflags freetype2`.
PYMUPDF_SETUP_MUPDF_BUILD= PYMUPDF_SETUP_IMPLEMENTATIONS=a pip install ./PyMuPDF

# Test PyMuPDF:
pytest -k "not test_color_count" PyMuPDF

I hope that all makes sense.
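
For packagers who drive the build from a script, the "set CFLAGS, CXXFLAGS and LDFLAGS as required" step could look roughly like the sketch below. The helper names are hypothetical, and it assumes pkg-config is installed and that the listed packages ship .pc files.

```python
import os
import subprocess


def pkgconfig(option, packages, run=subprocess.run):
    """Return the output of `pkg-config <option> <packages...>` as one string."""
    result = run(['pkg-config', option, *packages],
                 check=True, capture_output=True, text=True)
    return result.stdout.strip()


def system_build_env(packages, base_env=None, run=subprocess.run):
    """Build an environment for a PyMuPDF build against a system MuPDF."""
    env = dict(os.environ if base_env is None else base_env)
    include_flags = pkgconfig('--cflags', packages, run=run)
    for name in ('CFLAGS', 'CXXFLAGS'):
        # Append to whatever the distro already set rather than replacing it.
        env[name] = f"{env.get(name, '')} {include_flags}".strip()
    env['PYMUPDF_SETUP_MUPDF_BUILD'] = ''       # use the system MuPDF
    env['PYMUPDF_SETUP_IMPLEMENTATIONS'] = 'a'  # as in the commands above
    return env
```

The resulting dictionary can then be passed as env= to subprocess.run(['pip', 'install', './PyMuPDF'], check=True). The run parameter exists only so the pkg-config call can be replaced in tests.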

mjg commented 1 year ago

Yes, but _extension_flags() in setup.py checks for the presence of a source tree and resets mupdf_build_dir_flags to '' rather than build_type (around line 1085 now). The mere presence of the tree makes a difference here, and this is what I probably still need to patch, in addition to setting the flags differently. (Try removing the mupdf src tree before building PyMuPDF in the fakeroot.)

julian-smith-artifex-com commented 1 year ago

You're right that _extension_flags() (incorrectly) sets mupdf_build_dir_flags to ''. This means that we currently build PyMuPDF's extension module without optimisation when doing a system install.

But i don't think we look for a mupdf/ source tree - both mupdf_local and mupdf_build_dir are None when doing a system install, so any attempt to use them would raise an exception. I've verified that moving mupdf/ out of the way before building PyMuPDF still works.

I've fixed the setting of mupdf_build_dir_flags in my tree with this patch, which i'll push to the main PyMuPDF git repository in the next few minutes:

diff --git a/setup.py b/setup.py
index 4563dc1..c814341 100755
--- a/setup.py
+++ b/setup.py
@@ -1086,3 +1086,3 @@ def _extension_flags( mupdf_local, mupdf_build_dir, build_type):
     else:
-        mupdf_build_dir_flags = ''
+        mupdf_build_dir_flags = [build_type]
     optimise = 'release' in mupdf_build_dir_flags
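
To see why the old value disabled optimisation, here is a simplified model of the two patched lines. The function names are hypothetical; only the 'release' in mupdf_build_dir_flags test is taken from the diff above.

```python
def flags_before_patch(build_type):
    """Old behaviour for a system install: the build type was discarded."""
    return ''


def flags_after_patch(build_type):
    """Patched behaviour: the build type is kept as a one-element list."""
    return [build_type]


def optimise(mupdf_build_dir_flags):
    """Same membership test as in the diff."""
    return 'release' in mupdf_build_dir_flags


if __name__ == '__main__':
    print(optimise(flags_before_patch('release')))  # False: '' never contains 'release'
    print(optimise(flags_after_patch('release')))   # True
    print(optimise(flags_after_patch('debug')))     # False
```

With the old code, a system install was built without optimisation regardless of the requested build type.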

mjg commented 1 year ago

OK, once I understood that /usr/include is not even in the standard include search path (and didn't misunderstand that error message any more) things became easier. Building again :) I do think that specifying extra includes and libs via flags is easier in the long run since it avoids patching. Thanks!

dvzrv commented 10 months ago

I was able to fix the build (also by changing the way we build mupdf) - see https://gitlab.archlinux.org/archlinux/packaging/packages/python-pymupdf/-/blob/8333ee1f0a9ad99d5cb1368118e6a7027b93d055/PKGBUILD. Thanks for the help @mjg @julian-smith-artifex-com

julian-smith-artifex-com commented 10 months ago

Thanks for this, i'm glad the build is working for you.

Please post here or create a new GitHub issue if you have any problems or suggestions in future, and i'll do my best to help.

The switch to the rebased implementation will probably require some further changes for system installs because the MuPDF installation will need to contain the MuPDF Python bindings. As mentioned it all works fine in a test case, but i'll get in touch anyway.

julian-smith-artifex-com commented 9 months ago

Please see new discussion about draft documentation for Linux system installs: #2977