pypdfium2-team / pypdfium2

Python bindings to PDFium
https://pypdfium2.readthedocs.io/

Incompatibility with python 3.8.1 #76

Status: Closed (frgfm closed this issue 2 years ago)

frgfm commented 2 years ago

Hello there :wave:

Thank you for your wonderful work here :pray: It's great to find good alternatives to PyMuPDF with a proper open-source license!

I figured I should share a problem I ran into: the library (installed from PyPI) doesn't work on Python 3.8.1. I reproduced this in a clean environment using Docker, with the following Dockerfile:

FROM python:3.8.1-slim

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

RUN pip install --upgrade pip setuptools wheel \
    && pip install pypdfium2 \
    && pip cache purge \
    && rm -rf /root/.cache/pip

Running these commands:

docker build . -t pypdfium2-py3.8.1-slim
docker run pypdfium2-py3.8.1-slim python -c "import pypdfium2"

yields:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pypdfium2/__init__.py", line 7, in <module>
    from pypdfium2._namespace import *
  File "/usr/local/lib/python3.8/site-packages/pypdfium2/_namespace.py", line 6, in <module>
    from pypdfium2._pypdfium import *
  File "/usr/local/lib/python3.8/site-packages/pypdfium2/_pypdfium.py", line 1245, in <module>
    FPDF_LoadDocument.argtypes = [FPDF_STRING, FPDF_BYTESTRING]
TypeError: item 1 in _argtypes_ passes a union by value, which is unsupported.

I also tried 3.8.10 and 3.8.12, and the problem doesn't arise with those later versions :+1:
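
For anyone who wants to check an interpreter without installing pypdfium2, here is a minimal sketch of what the traceback appears to describe. It assumes the generated string type is a `ctypes.Union` subclass, as the error message suggests; `StringLike` and `union_by_value_supported` are hypothetical names and not part of pypdfium2 or ctypesgen. The sketch declares a foreign function prototype that takes the union by value and reports whether ctypes accepts it.

```python
import ctypes


class StringLike(ctypes.Union):
    # Hypothetical stand-in for a generated string type: a union of two
    # pointer views onto the same character buffer.
    _fields_ = [
        ("raw", ctypes.POINTER(ctypes.c_char)),
        ("data", ctypes.c_char_p),
    ]


def union_by_value_supported():
    """Return True if this interpreter accepts a union in an argtypes list."""
    try:
        # Declaring a prototype with a union parameter goes through the same
        # kind of argtypes validation as assigning `.argtypes` on a function.
        ctypes.CFUNCTYPE(None, StringLike)
    except TypeError:
        # Expected on the affected interpreters reported in this thread.
        return False
    return True


if __name__ == "__main__":
    print("union by value accepted:", union_by_value_supported())
```

On the 3.8.1 image above this would presumably print `False`, while the later 3.8 releases that work should print `True`.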

mara004 commented 2 years ago

Hello @frgfm,

Thanks, it's encouraging to hear some people find this project useful! Concerning the issue you reported, I'm sorry I don't have much of an idea. The traceback goes down into the bindings file created by ctypesgen, which is mostly a black box for me. Considering that the issue only occurs with an older release of the 3.8 series but not with more recent ones, it might even be caused by a Python bug. That said, you could report the issue at ctypesgen, or I can also do it for you and link this thread if you want. Maybe they know more.

mara004 commented 2 years ago

Okay, so this is a known ctypesgen/python issue. According to https://github.com/ctypesgen/ctypesgen/issues/77 and https://github.com/python/cpython/pull/16799#issuecomment-612353119, only Python 3.7.6 and 3.8.1 are affected. I will blacklist these versions in my setup configuration.
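
As a rough illustration of what excluding these versions could look like, here is a minimal sketch of a runtime guard. The names `AFFECTED_VERSIONS` and `is_affected` are hypothetical, and this is not necessarily how the actual setup configuration is written. A declarative alternative, also an assumption here, would be a setuptools specifier such as `python_requires="!=3.7.6, !=3.8.1"`.

```python
import sys

# Interpreter versions reported as affected in this thread.
AFFECTED_VERSIONS = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None):
    """Return True if the given (or current) Python version is affected."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); ignore release level and serial.
    return tuple(version_info[:3]) in AFFECTED_VERSIONS


if __name__ == "__main__":
    if is_affected():
        raise SystemExit(
            "This Python version has a known ctypes issue with unions "
            "passed by value; please use a different patch release."
        )
    print("Python version is not in the affected set.")
```

A guard like this fails early with a readable message instead of the `TypeError` raised deep inside the generated bindings.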

frgfm commented 2 years ago

Thanks for your investigation & handling of this matter @mara004 :pray: Always greatly appreciated when a project can get around problems like this in a smooth way!

mara004 commented 2 years ago

You're welcome! Thank you for informing me of the issue.

mara004 commented 1 year ago

@frgfm Coming back to this: I've submitted a PR to ctypesgen which would fix this problem: https://github.com/ctypesgen/ctypesgen/pull/162

I hope it will be merged eventually. In the meantime, I've pinned pypdfium2 to a fork with this patch included, so if all goes well, the next release will be compatible with Python 3.7.6 and 3.8.1 (and have much nicer string handling).

mara004 commented 1 year ago

I'm afraid I was a bit too quick with this. It will probably have to be delayed until the release of v4 (i.e. moved into the devel branch). The reason is that the string handling changes removed implicit UTF-8 encoding. While the helper classes always encode strings explicitly, a caller of the raw API might not.

In theory, I might also be able to create a different patch that retains a cut-down version of the string helper class with implicit encoding for backwards compatibility purposes (which would probably be regarded as less controversial at ctypesgen). But on the other hand, I'm of the opinion that implicit UTF-8 encoding is bad and would prefer to remove it in pypdfium2 and ctypesgen altogether.
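
To make the difference concrete, here is a minimal sketch using plain ctypes only, not the pypdfium2 raw API. The helper `to_c_string` is a hypothetical name. Without an implicit-encoding layer, a `char *` parameter accepts `bytes` but rejects `str`, so the caller has to encode explicitly.

```python
import ctypes


def to_c_string(text, encoding="utf-8"):
    """Explicitly encode a str to bytes and wrap it for a char* parameter."""
    if isinstance(text, str):
        text = text.encode(encoding)
    return ctypes.c_char_p(text)


# Plain ctypes performs no implicit encoding: passing a str raises TypeError.
try:
    ctypes.c_char_p("document.pdf")
    implicit_ok = True
except TypeError:
    implicit_ok = False

# With explicit encoding, the caller decides which encoding is used.
explicit = to_c_string("document.pdf")

if __name__ == "__main__":
    print("implicit str accepted:", implicit_ok)
    print("explicit value:", explicit.value)
```

A caller of the raw API who previously relied on implicit encoding would need to add the `.encode("utf-8")` step (or a helper like the one above) once that behaviour is removed.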