pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0
4.55k stars 448 forks

MSYS2/MINGW64 package only has fitz_old #3229

Open sdbbs opened 4 months ago

sdbbs commented 4 months ago

Description of the bug

Just wanted to report this; I don't think I need further support on it for now. I last used import fitz in a Python 3 script under MINGW64 (MSYS2) on Windows 10 in January 2024, and it worked without problems.

I've tried running it today, and it fails with:

> import fitz # pymupdf
> ^^^^^^^^^^^
> ModuleNotFoundError: No module named 'fitz'

After a bit of searching, I found where the library is apparently located:

$ ls C:/msys64/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/
COPYING  entry_points.txt  METADATA  README.md  RECORD  WHEEL

So, no Python files here; apparently those are here:

$ ls C:/msys64/mingw64/lib/python3.11/site-packages/fitz_old/
__pycache__/  __init__.py  __main__.py  _fitz_old.cp311-mingw_x86_64.pyd*  fitz_old.py  table.py  utils.py

So apparently the old fitz was just renamed fitz_old - but there is no fitz_new.

This seems to be confirmed at https://packages.msys2.org/package/mingw-w64-x86_64-python-pymupdf?repo=mingw64 (unfortunately that page is not versioned, but it refers to https://mirror.msys2.org/mingw/mingw64/mingw-w64-x86_64-python-pymupdf-1.23.26-1-any.pkg.tar.zst). The listed files are:

/mingw64/bin/pymupdf.exe
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/COPYING
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/METADATA
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/README.md
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/RECORD
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/WHEEL
/mingw64/lib/python3.11/site-packages/PyMuPDF-1.23.26.dist-info/entry_points.txt
/mingw64/lib/python3.11/site-packages/fitz_old/__init__.py
/mingw64/lib/python3.11/site-packages/fitz_old/__main__.py
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/__init__.cpython-311.opt-1.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/__init__.cpython-311.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/__main__.cpython-311.opt-1.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/__main__.cpython-311.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/fitz_old.cpython-311.opt-1.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/fitz_old.cpython-311.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/table.cpython-311.opt-1.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/table.cpython-311.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/utils.cpython-311.opt-1.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/__pycache__/utils.cpython-311.pyc
/mingw64/lib/python3.11/site-packages/fitz_old/_fitz_old.cp311-mingw_x86_64.pyd
/mingw64/lib/python3.11/site-packages/fitz_old/fitz_old.py
/mingw64/lib/python3.11/site-packages/fitz_old/table.py
/mingw64/lib/python3.11/site-packages/fitz_old/utils.py
/mingw64/share/licenses/python-pymupdf/COPYING
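The same check can be done from Python instead of by listing directories. The following is a minimal sketch, assuming the package was installed with a standard RECORD file; the helper names top_level_from_paths and top_level_names are hypothetical and not part of PyMuPDF.

```python
from importlib import metadata
from pathlib import PurePosixPath


def top_level_from_paths(paths):
    """Collect the top-level importable names from RECORD-style paths."""
    names = set()
    for path in paths:
        parts = PurePosixPath(str(path)).parts
        if not parts:
            continue
        first = parts[0]
        # Skip metadata directories and files installed outside site-packages.
        if first == ".." or first.endswith(".dist-info"):
            continue
        # A bare file such as "foo.py" is a top-level module named "foo".
        if len(parts) == 1:
            first = first.split(".")[0]
        names.add(first)
    return names


def top_level_names(dist_name):
    """Return the top-level names a distribution installs, or an empty set."""
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return set()
    # metadata.files() returns None when the distribution has no RECORD.
    return top_level_from_paths(files or [])
```

With the package described above, top_level_names("PyMuPDF") would be expected to contain fitz_old but not fitz.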

For now, import fitz_old as fitz works as a workaround, but I'm not sure how long that will last.

How to reproduce the bug

Steps to reproduce - in MINGW64 shell, on MSYS2, Windows 10:

$ python3 -c 'import fitz'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'fitz'

Workaround is fine for now:

$ python3 -c 'import fitz_old as fitz; print(fitz.VersionBind)'
1.23.26
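For a script that has to run both where fitz exists and where only fitz_old is packaged, the workaround can be wrapped in a fallback import. This is a minimal sketch; import_first_available is a hypothetical helper, and the candidate order is an assumption about which implementation should be preferred.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Only skip when the candidate itself is missing; a missing
            # dependency inside an installed module is a real error.
            if exc.name != name:
                raise
    raise ModuleNotFoundError(f"none of {tuple(candidates)!r} could be imported")
```

A script would then use fitz = import_first_available(("fitz", "fitz_old")) in place of the plain import.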

PyMuPDF version

1.23.26

Operating system

Other

Python version

3.11

julian-smith-artifex-com commented 4 months ago

It looks like whoever is creating that mingw package is building only the old (classic) implementation of PyMuPDF. Since version 1.23.9 it will build as fitz_old, as you've noticed, and the new rebased implementation builds as fitz.

You'll have to contact the mingw packagers to encourage them to build the rebased implementation, as we'll be completely dropping the classic implementation (fitz_old) fairly soon.

Feel free to send them here, in case we can help them with building rebased.

Biswa96 commented 4 months ago

Hi, I am the maintainer of the mupdf package in msys2. pymupdf was set to build fitz_old because the mupdf C++ library cannot be built. It fails with the following linker error:

LINK build/shared-release/mutool
ld.exe: cannot find -lmupdf: No such file or directory
collect2.exe: error: ld returned 1 exit status

All the build commands can be found in the following links:

I am trying to figure out the issue. For some reason, mupdf was built twice:

$ ls build/
release  shared-release

Biswa96 commented 4 months ago

Also, the mupdf build system is not compatible with the mingw environment. It barely works there, and only with multiple patches. For example,

julian-smith-artifex-com commented 4 months ago

I might be able to take a look at building for mingw, though I don't have that much time available so it might take a while.

Could you tell me what I need to install in order to reproduce the problem and hopefully find a fix?

Biswa96 commented 4 months ago

  1. Install msys2 following the procedure at https://www.msys2.org/ (follow all the steps there).
  2. In UCRT64, install the compiler toolchain and required dependencies. For example:

     pacman -Syy \
       $MINGW_PACKAGE_PREFIX-gumbo-parser \
       $MINGW_PACKAGE_PREFIX-python \
       $MINGW_PACKAGE_PREFIX-cc git base-devel

  3. Follow the normal development procedure for mupdf or pymupdf.

Please feel free to ask any questions.
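For anyone scripting this setup, the pacman invocation above can be assembled in Python. This is a sketch only: pacman_install_command is a hypothetical helper, and the fallback prefix mingw-w64-ucrt-x86_64 is an assumption that applies to the UCRT64 environment when MINGW_PACKAGE_PREFIX is not set.

```python
import os


def pacman_install_command(prefix=None):
    """Build the pacman argument list for the dependencies listed above."""
    if prefix is None:
        # Assumption: default to the UCRT64 prefix when the variable is unset.
        prefix = os.environ.get("MINGW_PACKAGE_PREFIX", "mingw-w64-ucrt-x86_64")
    prefixed = ["gumbo-parser", "python", "cc"]  # environment-specific packages
    plain = ["git", "base-devel"]                # msys packages, no prefix
    return ["pacman", "-Syy", *(f"{prefix}-{name}" for name in prefixed), *plain]
```

The returned list can be passed to subprocess.run from inside an msys2 shell.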

julian-smith-artifex-com commented 4 months ago

Thanks. I already have cygwin installed, can msys2 coexist with it? I wouldn't want to mess up the cygwin installation.

Biswa96 commented 3 months ago

Yeah, msys2 and cygwin can coexist. For a typical installation, cygwin will be in C:\Cygwin64 and msys2 will be in C:\msys64.

julian-smith-artifex-com commented 3 months ago

I've managed to get a MuPDF shared library build working in my tree, but cannot build PyMuPDF.

This is because, unfortunately, the libclang Python package (a Python interface to the clang C/C++ parser, from pip install libclang) does not work on msys2. After import clang.cindex, index = clang.cindex.Index.create() fails because it cannot load libclang.so; there is no *clang* library in the package. It might be possible to point libclang to a libclang.so or libclang.dll library if one were available, but this would be quite fragile; we really need it to be part of libclang.

Without libclang, it is impossible to build the latest PyMuPDF.

If there were a libclang package available at the msys2 system level with pacman, perhaps the build could be made to work (this works on OpenBSD with its py3-llvm package). I have done a quick search with pacman -sS, but couldn't find anything relevant.
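The failure described above can be checked with a small probe. This is a minimal sketch: probe_libclang is a hypothetical helper, while clang.cindex.Index.create() and clang.cindex.Config.set_library_file() are the existing bindings calls. The optional library_file argument stands for the "point libclang to a library" approach mentioned above, and its value is an assumption about where such a library might live.

```python
def probe_libclang(library_file=None):
    """Return (ok, message) describing whether libclang can be loaded."""
    try:
        import clang.cindex as cindex
    except ImportError as exc:
        return False, f"clang.cindex is not importable: {exc}"
    try:
        if library_file is not None:
            # Must be called before the library is first loaded.
            cindex.Config.set_library_file(library_file)
        # Creating an index forces the shared library to be loaded.
        cindex.Index.create()
    except Exception as exc:
        return False, f"libclang could not be loaded: {exc}"
    return True, "libclang loaded"
```

Calling probe_libclang() with no argument reproduces the default lookup; passing a path tests the fragile manual override.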

Biswa96 commented 3 months ago

But archlinux does not use any clang library for pymupdf: https://archlinux.org/packages/extra/x86_64/python-pymupdf/

Biswa96 commented 3 months ago

If you can upstream the mupdf changes, I can try to build pymupdf with that.

julian-smith-artifex-com commented 3 months ago

It looks like archlinux PyMuPDF uses separate libmupdf and python-mupdf packages. These contain the MuPDF C++ and Python bindings, and must have been built with Python libclang or a system-package equivalent.

My MuPDF patch is not in a state suitable for pushing to master. But I could email it to you if you'd like to try it out?

Biswa96 commented 3 months ago

The Python bindings are not installed by the mingw clang package. I have asked for them to be added and hope it will be fixed soon.

Biswa96 commented 3 months ago

The Python bindings have been added with the new mingw clang 18 packages. Please run pacman -Syyu to update your packages, though it may take some time for the update to reach all the mirrors.

julian-smith-artifex-com commented 3 months ago

BTW I've just pushed a fix to mupdf master to allow things to work with clang-18 (which has slightly different behaviour) when building the C++ bindings.

So hopefully this will allow things to work with mingw's clang 18.

Biswa96 commented 3 months ago

The shared library can be built now but there are still many issues.

  1. The shared library extension should be .dll instead of .so.
  2. The import library is empty. Its name should be libmupdf.dll.a instead of libmupdf_dll.a.
  3. Building the C++ library still fails because the Python script tries to find Visual Studio.

#1 can be fixed by adding "Windows_NT" with the "MINGW" string where the SO variable is set. #2 can be fixed by adding the -Wl,--out-implib=libmupdf.dll.a linker flag to the DLL build command. #3 ╮(︶︿︶)╭
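The naming rules behind #1 and #2 can be summarised in a few lines. This is an illustrative sketch, not code from the mupdf build system: shared_library_names and its system argument are hypothetical, and matching on the substring "MINGW" is an assumption about how the environment would be detected.

```python
def shared_library_names(system, base="mupdf"):
    """Return file names and linker flags for a shared library build."""
    if "MINGW" in system:
        import_library = f"lib{base}.dll.a"
        return {
            "library": f"lib{base}.dll",
            "import_library": import_library,
            # Ask the GNU linker to write the import library next to the DLL.
            "linker_flags": [f"-Wl,--out-implib={import_library}"],
        }
    # Other systems in this sketch use a plain .so and need no import library.
    return {
        "library": f"lib{base}.so",
        "import_library": None,
        "linker_flags": [],
    }
```

The sketch ignores other platforms such as macOS, which use different suffixes.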

julian-smith-artifex-com commented 1 week ago

Apologies, I don't have time right now to attempt to port MuPDF and PyMuPDF to msys2.

I'd be glad to merge any patches that make things work, of course. I've kept msys2 on my todo list so it's possible I'll have a look sometime.