pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

test_q_count fails with v1.24.3 #3460

Closed Antiz96 closed 1 month ago

Antiz96 commented 1 month ago

Description of the bug

Hello,

The test_q_count (in test_balance_count.py) is failing with v1.24.3:

=================================== FAILURES ===================================
_________________________________ test_q_count _________________________________

    def test_q_count():
        """Testing graphics state balances and wrap_contents().

        Take page's contents and generate various imbalanced graphics state
        situations. Each time compare q-count with expected results.
        Finally confirm we are out of balance using "is_wrapped", wrap the
        contents object(s) via "wrap_contents()" and confirm success.
        PDF commands "q" / "Q" stand for "push", respectively "pop".
        """
        doc = pymupdf.open()
        page = doc.new_page()
        # the page has no /Contents objects at all yet. Create one causing
        # an initial imbalance (so prepended "q" is needed)
        pymupdf.TOOLS._insert_contents(page, b"Q", True)  # append
        assert page._count_q_balance() == (1, 0)
        assert page.is_wrapped is False

        # Prepend more data that yield a different type of imbalanced contents:
        # Although counts of q and Q are equal now, the unshielded 'cm' before
        # the first 'q' makes the contents unusable for insertions.
        pymupdf.TOOLS._insert_contents(page, b"1 0 0 -1 0 0 cm q ", False)  # prepend
>       assert page.is_wrapped is False
E       assert True is False
E        +  where True = page 0 of <new PDF, doc# 2038>.is_wrapped

tests/test_balance_count.py:25: AssertionError
=========================== short test summary info ============================
FAILED tests/test_balance_count.py::test_q_count - assert True is False
================= 1 failed, 250 passed, 7 deselected in 13.05s =================
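
For readers unfamiliar with the check, the balance counting that the test exercises can be illustrated roughly as follows. This is a minimal sketch, not PyMuPDF's implementation: `count_q_balance` is a hypothetical helper, it tokenizes naively on whitespace (so it would miscount `q`/`Q` bytes inside strings or inline images), and the return convention (number of `q` to prepend, number of `Q` to append) is an assumption inferred from the `(1, 0)` result in the test above.

```python
def count_q_balance(contents: bytes) -> tuple[int, int]:
    """Return (missing_q, missing_Q) for a PDF content stream.

    Hypothetical helper for illustration only; it is not part of PyMuPDF.
    missing_q: number of "q" (push) operators that would have to be prepended.
    missing_Q: number of "Q" (pop) operators that would have to be appended.
    """
    depth = 0       # currently open "q" levels
    missing_q = 0   # "Q" operators seen with no open "q" to match
    for token in contents.split():  # naive whitespace tokenization (assumption)
        if token == b"q":
            depth += 1
        elif token == b"Q":
            if depth == 0:
                missing_q += 1
            else:
                depth -= 1
    return missing_q, depth


print(count_q_balance(b"Q"))
print(count_q_balance(b"1 0 0 -1 0 0 cm q Q"))
```

In this sketch the second stream counts as balanced, which matches the test's comment: the `q`/`Q` counts are equal, so a separate criterion (the unshielded `cm` before the first `q`) is needed to decide that the page is still not wrapped.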

I remain available if needed :)

cc @dvzrv

How to reproduce the bug

Build PyMuPDF v1.24.3, run the tests, and see the above one fail.

PyMuPDF version

1.24.3

Operating system

Linux

Python version

3.12

julian-smith-artifex-com commented 1 month ago

Thanks for the report. This is a little unexpected - every wheel in the 1.24.3 release passed all tests when it was built.

Could you describe exactly how you are building PyMuPDF?

Antiz96 commented 1 month ago

> Thanks for the report. This is a little unexpected - every wheel in the 1.24.3 release passed all tests when it was built.
>
> Could you describe exactly how you are building PyMuPDF?

Thanks for your quick answer. The used build instructions are below:

  local cflags=(
    -I/usr/include
    -I/usr/include/freetype2
    -I/usr/include/harfbuzz
    -I/usr/include/mupdf
  )
  local ldflags=(
    -lfreetype
    -lgumbo
    -lharfbuzz
    -ljbig2dec
    -ljpeg
    -lleptonica
    -lmupdf
    -lopenjp2
    -ltesseract
  )

  cd PyMuPDF-1.24.3
  # build against system libmupdf
  export PYMUPDF_SETUP_MUPDF_BUILD=''
  # provide tessdata location
  export TESSDATA_PREFIX="/usr/share/tessdata"
  # build against mupdf's C++/ Python language bindings
  export PYMUPDF_SETUP_IMPLEMENTATIONS=b
  CFLAGS+=" ${cflags[@]}"
  LDFLAGS+=" ${ldflags[@]}"

  python -m build --wheel --no-isolation

julian-smith-artifex-com commented 1 month ago

Ah, you're building with a system install of MuPDF. In that case it would be useful to know the MuPDF version and how it was built.

In general we can't guarantee behaviour with non-default MuPDF versions/builds. See scripts/sysinstall.py where it currently excludes these tests:

test_color_count test_3050 test_cli test_cli_out test_pylint test_textbox3
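
For packagers who want to skip the same tests by hand, one option is a pytest `-k` expression. The following is a minimal sketch under stated assumptions: `build_deselect_expression` is a hypothetical helper, and this is not necessarily how scripts/sysinstall.py excludes the tests.

```python
def build_deselect_expression(test_names):
    """Build a pytest -k expression that deselects the given tests.

    Hypothetical helper for illustration; pytest's -k option matches
    substrings, so "not test_cli" also deselects test_cli_out.
    """
    return " and ".join(f"not {name}" for name in test_names)


# Test names taken from the list quoted above.
excluded = [
    "test_color_count",
    "test_3050",
    "test_cli",
    "test_cli_out",
    "test_pylint",
    "test_textbox3",
]
expression = build_deselect_expression(excluded)
print(expression)
```

The printed string can then be passed as `pytest -k "<expression>"`.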

Antiz96 commented 1 month ago

> Ah, you're building with a system install of MuPDF. In that case it would be useful to know the MuPDF version and how it was built.

Indeed. This is MuPDF v1.24.1 built with the following (here again, built with some system libraries):

  {
    printf "LINUX_OR_OPENBSD := yes\n"  # required so that Makefile sets soname symlink
    printf "USE_SYSTEM_CURL := yes\n"
    printf "USE_SYSTEM_FREETYPE := yes\n"
    printf "USE_SYSTEM_GLUT := yes\n"
    printf "USE_SYSTEM_GUMBO := yes\n"
    printf "USE_SYSTEM_HARFBUZZ := yes\n"
    printf "USE_SYSTEM_JBIG2DEC := yes\n"
    printf "USE_SYSTEM_JPEGXR := yes\n"  # not used without HAVE_JPEGXR
    printf "USE_SYSTEM_LCMS2 := no\n"  # need lcms2-art fork
    printf "USE_SYSTEM_LEPTONICA := yes\n"
    printf "USE_SYSTEM_LIBJPEG := yes\n"
    printf "USE_SYSTEM_LIBS := yes\n"
    printf "USE_SYSTEM_MUJS := no\n"  # needs patch to debundle
    printf "USE_SYSTEM_OPENJPEG := yes\n"
    printf "USE_SYSTEM_TESSERACT := yes\n"
    printf "USE_SYSTEM_ZLIB := yes\n"
    printf "USE_TESSERACT := yes\n"
  } > user.make

  cd mupdf
  make -j1 VENV_FLAG= shared=yes build=release libs apps c++ python

> In general we can't guarantee behaviour with non-default MuPDF versions/builds. See scripts/sysinstall.py where it currently excludes these tests:
>
> test_color_count test_3050 test_cli test_cli_out test_pylint test_textbox3

Fair enough, thanks for the info! Since test_textbox3 is in that list, I guess I can close https://github.com/pymupdf/PyMuPDF/issues/3398 (which uses the same build environment).

julian-smith-artifex-com commented 1 month ago

I have created a PR to update expectations for this test with MuPDF 1.24.0 and 1.24.1. So tests should pass fine for you after it's been merged - hopefully in the next day or so.

Antiz96 commented 1 month ago

Alright, thanks for your quick actions! :slightly_smiling_face:

julian-smith-artifex-com commented 1 month ago

My PR has been merged now.

However it's been pointed out to me that we probably shouldn't simply allow the test to succeed with MuPDF-1.24.1.

The test is for an important feature in PyMuPDF-1.24.3, ensuring worry-free object inserts. Packaging with a MuPDF that is older than the specified version will result in behaviour that differs from what is documented for PyMuPDF-1.24.3, which will cause confusion.

There's probably no easy answer here, but we might want to encourage a later MuPDF to be used if this sort of thing happens again.

Antiz96 commented 1 month ago

> My PR has been merged now.

Alright, thanks!

> However it's been pointed out to me that we probably shouldn't simply allow the test to succeed with MuPDF-1.24.1.
>
> The test is for an important feature in PyMuPDF-1.24.3, ensuring worry-free object inserts. Packaging with a MuPDF that is older than the specified version will result in behaviour that differs from what is documented for PyMuPDF-1.24.3, which will cause confusion.

I'm confused... At the time of writing, MuPDF-1.24.1 seems to be the latest version available. Am I missing something, or does that mean that PyMuPDF-1.24.3 expects a version of MuPDF that has not been released yet?

julian-smith-artifex-com commented 1 month ago

Ah, the latest PyMuPDF in git and the current release PyMuPDF-1.24.3 both hard-code https://mupdf.com/downloads/archive/mupdf-1.24.2-source.tar.gz.

But it looks like the MuPDF tag 1.24.2 hasn't been pushed yet. Thanks for pointing this out, I'm chatting to the MuPDF people so it'll get pushed soon.
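
A packaging script could guard against this kind of mismatch with a simple version comparison. This is a minimal sketch, assuming plain dotted numeric version strings; `parse_version` and `mupdf_is_too_old` are hypothetical helpers, and how the installed MuPDF version is obtained is deliberately left to the caller.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "1.24.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def mupdf_is_too_old(found: str, required: str) -> bool:
    """Return True if the found MuPDF version is older than the required one.

    Hypothetical helper for illustration; tuples compare element by
    element, so "1.24.10" is correctly treated as newer than "1.24.2".
    """
    return parse_version(found) < parse_version(required)


# Versions taken from this thread: system MuPDF 1.24.1, expected 1.24.2.
print(mupdf_is_too_old("1.24.1", "1.24.2"))
```

Comparing integer tuples rather than raw strings avoids the lexicographic pitfall where "1.24.10" would sort before "1.24.2".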

Antiz96 commented 1 month ago

Thanks for pushing the MuPDF 1.24.2 tag. I confirm rebuilding PyMuPDF 1.24.3 against it solved the issue :)

julian-smith-artifex-com commented 1 month ago

Great, I'm glad it's working for you now.