pypdfium2-team / pypdfium2

Python bindings to PDFium
https://pypdfium2.readthedocs.io/

pdfium fails to load in PHP on Almalinux 8.9 (PartitionAlloc check failure) #292

Closed: MarkCarbonell98 closed this issue 9 months ago

MarkCarbonell98 commented 9 months ago


Reason for Generic issue (keyword/topic)

pypdfium2 might not work when installed with pip on an AlmaLinux server

Description

Versions:

cat /etc/os-release
NAME="AlmaLinux"
VERSION="8.9 (Midnight Oncilla)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.9 (Midnight Oncilla)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.9"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"

Python version:

pip3.11 --version
pip 23.3.1 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)
python3.11 --version
Python 3.11.5

Installed with: pip3.11 install --no-build-isolation -U pypdfium2

PHP version:

php --version
PHP 8.0.30 (cli) (built: Jan 17 2024 00:39:59) ( NTS )
Copyright (c) The PHP Group
Zend Engine v4.0.30, Copyright (c) Zend Technologies
    with Zend OPcache v8.0.30, Copyright (c), by Zend Technologies

How to reproduce:

  1. Install pypdfium2 with pip: pip3.11 install --no-build-isolation -U pypdfium2
  2. Import it in a Python script that extracts the text of a PDF file (e.g. python-test.py):
    #!/usr/bin/env python3.11
    import pypdfium2 as pdfium

    file_name = './some_file.pdf'
    pdf = pdfium.PdfDocument(file_name)
    n_pages = len(pdf)
    text = ''  # accumulates the text of all pages
    for n in range(n_pages):
      page = pdf[n]
      textpage = page.get_textpage()
      text_all = textpage.get_text_range()
      text += text_all
    print('[pdf parsed with: pypdfium]', text)
  3. Call this script from a PHP script (e.g. php-test.php). For ./python-test.py to run directly, the script needs the shebang line above and execute permission (chmod +x python-test.py):
    <?php
    echo shell_exec('./python-test.py');
  4. Run php-test.php with php php-test.php and observe the error coming from the pdfium binary:
    [FATAL:partition_address_space.cc(77)] Check failed: false.
    #00 0x7f7126e755e2 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x4955e2)
    #01 0x7f7126e79003 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x499003)
    #02 0x7f7126e78ef3 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x498ef3)
    #03 0x7f7126e7c799 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x49c799)
    #04 0x7f7126e79191 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x499191)
    #05 0x7f7126c14eb4 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x234eb4)
    #06 0x7f7126c14e06 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x234e06)
    #07 0x7f7126d40997 (<your-usr-dir>/.local/lib/python3.11/site-packages/pypdfium2_raw/libpdfium.so+0x360997)
    #08 0x7f7127e1e17e (/usr/lib64/libffi.so.6.0.2+0x617e)
    #09 0x7ffe300f1cf0 ([stack]+0x8d05d87cf0)
    #10 0x7f7127e1db2f (/usr/lib64/libffi.so.6.0.2+0x5b2f)
    #11 0x7f712802de08 (/usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so+0xce08)
    #12 0x7f7128032fcf (/usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so+0x11fcf)
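To check whether the failure depends on PHP itself or merely on the script being launched as a child process, the PHP step can be imitated from Python. This is a hypothetical debugging aid, not part of pypdfium2, and the helper names are made up for illustration: it runs a command through /bin/sh, as PHP's shell_exec() does on Unix, but captures stderr separately so the PartitionAlloc message can be detected.

```python
import subprocess

# Text from the FATAL line in the log above; used to spot the PartitionAlloc failure.
PARTITION_ALLOC_MARKER = "FATAL:partition_address_space.cc"


def run_like_shell_exec(command: str) -> tuple[str, str]:
    """Run `command` through /bin/sh and return (stdout, stderr).

    PHP's shell_exec() also goes through the shell, but it only returns stdout;
    here stderr is captured too, so the pdfium log can be inspected.
    """
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout, proc.stderr


def partition_alloc_failed(stderr: str) -> bool:
    """True if the child's stderr contains the PartitionAlloc FATAL message."""
    return PARTITION_ALLOC_MARKER in stderr
```

For example, call run_like_shell_exec('./python-test.py') from the script's directory. If partition_alloc_failed() is False on the returned stderr while the PHP run still crashes, the difference presumably lies in how PHP launches the process rather than in running as a child process as such.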
mara004 commented 9 months ago

I think this is an upstream issue: it looks like a PartitionAlloc assertion failed (probably while attempting to load the library?). It reminds me of https://github.com/pypdfium2-team/pypdfium2/issues/154, which was a PartitionAlloc permission error with Docker. Feel free to report this upstream.

In the meantime, you could try building pdfium from source without PartitionAlloc and crafting your own pypdfium2 wheel with the resulting binary. The syslibs build has PartitionAlloc disabled; alternatively, you could patch "pdf_use_partition_alloc": False into the default config.

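Example (run from a checkout of the pypdfium2 repository; the package installation command is for Debian/Ubuntu):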

sudo apt-get install -y libfreetype-dev liblcms2-dev libjpeg-dev libopenjp2-7-dev libpng-dev zlib1g-dev libicu-dev libtiff-dev
python3 ./setupsrc/pypdfium2_setup/build_pdfium.py --use-syslibs
PDFIUM_PLATFORM="sourcebuild" python3 -m build --wheel -nx
python3 -m pip install -v dist/pypdfium2-*-py3-none-linux_x86_64.whl
pypdfium2 -v  # output should include the "sourcebuild" keyword
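The PartitionAlloc switch mentioned above is a GN build argument, pdf_use_partition_alloc. As a rough illustration of what patching it into a config means, the sketch below renders a Python dict of GN arguments into args.gn syntax with that flag turned off. The dict name DEFAULT_CONFIG and its contents (including the initial True) are assumptions for illustration, not a copy of pypdfium2's build_pdfium.py; the other keys are standard arguments used when building pdfium.

```python
# Hypothetical stand-in for the build script's default GN config.
DEFAULT_CONFIG = {
    "is_debug": False,
    "pdf_is_standalone": True,
    "pdf_enable_v8": False,
    "pdf_enable_xfa": False,
    "pdf_use_partition_alloc": True,  # illustrative value only
}


def to_gn_value(value) -> str:
    """Format a Python value as a GN literal."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return '"' + value + '"'
    raise TypeError(f"unsupported GN value: {value!r}")


def render_gn_args(config: dict) -> str:
    """Render a config dict as the contents of an args.gn file."""
    return "".join(f"{key} = {to_gn_value(val)}\n" for key, val in config.items())


# Same config, but with PartitionAlloc disabled.
no_pa_config = {**DEFAULT_CONFIG, "pdf_use_partition_alloc": False}
print(render_gn_args(no_pa_config), end="")
```

Overriding the single key with a dict merge keeps the rest of the defaults untouched, which mirrors the idea of patching one flag rather than maintaining a separate config.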
mara004 commented 9 months ago

I cannot reproduce the issue with the latest pdfium on Fedora 37, so it seems to be specific to your setup.

mara004 commented 9 months ago

BTW, strictly speaking this was the wrong issue template: I intended "PyPA install" to be used for this rather than "Generic issue". But the report included all I needed to see, so never mind. It's a common mistake. If you have an idea how to improve the templates' wording, that would be helpful.

mara004 commented 9 months ago

FATAL:partition_address_space.cc(77)

I think this is the file/line in question: https://chromium.googlesource.com/chromium/src/base/allocator/partition_allocator/+/fc98a25dea359f8d1f15e57d9278d81da73c7a09/src/partition_alloc/partition_address_space.cc#77

mara004 commented 9 months ago

@MarkCarbonell98 Does python-test.py pass when called directly, i.e. not through PHP?

mara004 commented 9 months ago

The pdfium-binaries builds are now made without PartitionAlloc (https://github.com/bblanchon/pdfium-binaries/issues/148), so I'll close this issue. It'd be nice if you could test and report back with the next release, which is scheduled for approximately 2 weeks from now.

MarkCarbonell98 commented 9 months ago

@mara004 When I run python-test.py directly, the script works. But when I call it through PHP, it produces this error.

mara004 commented 9 months ago

Hmm. Are you aware of any memory/allocation-related restrictions in PHP?
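Regarding memory restrictions: one hypothesis, not confirmed in this thread, is that the PHP-launched process runs under a virtual address space limit (RLIMIT_AS, the limit behind ulimit -v). PartitionAlloc reserves large ranges of address space when it initializes, so such a limit could make that reservation fail even though a direct run works. The sketch below uses only Python's standard resource and subprocess modules (Unix-only) and offers two hypothetical helpers: address_space_limits() reports the limits of the current process, so adding it to python-test.py shows what the script sees when launched from PHP, and run_with_address_space_cap() tries to reproduce the crash without PHP by running a command under a lowered limit. The helper names and the 4 GiB value are illustrative choices.

```python
import resource
import subprocess
import sys

GiB = 2**30


def describe_limit(value: int) -> str:
    """Human-readable form of an rlimit value."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / GiB:.1f} GiB"


def address_space_limits() -> dict[str, str]:
    """Soft and hard RLIMIT_AS (virtual address space) of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return {"soft": describe_limit(soft), "hard": describe_limit(hard)}


def run_with_address_space_cap(cmd: list[str], cap_bytes: int) -> subprocess.CompletedProcess:
    """Run cmd with its soft RLIMIT_AS set to cap_bytes, clamped to the hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # RLIM_INFINITY may not compare as larger than cap_bytes (it is -1 on some
    # builds), so it is checked explicitly instead of relying on min().
    soft = cap_bytes if hard == resource.RLIM_INFINITY else min(cap_bytes, hard)

    def set_limit() -> None:
        # Runs in the child after fork() and before exec(); the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit, capture_output=True, text=True)


if __name__ == "__main__":
    # Report this process's limits on stderr (PHP's shell_exec() only captures stdout).
    print("RLIMIT_AS:", address_space_limits(), file=sys.stderr)
    # Hypothetical experiment: run the reporter's script directly under a 4 GiB cap.
    result = run_with_address_space_cap([sys.executable, "python-test.py"], 4 * GiB)
    crashed = "partition_address_space.cc" in result.stderr
    print("PartitionAlloc check failed" if crashed else f"exit code {result.returncode}")
```

If the limits printed under PHP differ from those of a direct run, or the capped run hits the same FATAL line, that would point at a resource limit rather than at PHP itself. Note that preexec_fn is not safe in multithreaded programs; it is fine for a small standalone script like this one.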

I experimented with your samples (calling a Python script from PHP), but nothing seemed to happen: when I added prints, I didn't see any output, and sleeps didn't cause any delay.

The latest pdfium-binaries release should be PartitionAlloc-free, so you could try to install pypdfium2 from source (git main), which will implicitly download the latest binary, and see if it works now.