pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

page.get_pixmap() fails due to `fitz.mupdf.FzErrorLimit: code=5: too many nested graphics states` #3608

Open Luux opened 6 days ago

Luux commented 6 days ago

Description of the bug

Getting the pixmap of certain PDF documents fails:

  File "/home/.../test.py", line 11, in <module>
    _ = page.get_pixmap()
        ^^^^^^^^^^^^^^^^^
  File "/home/.../miniconda3/lib/python3.12/site-packages/fitz/utils.py", line 888, in get_pixmap
    dl = page.get_displaylist(annots=annots)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.../miniconda3/lib/python3.12/site-packages/fitz/__init__.py", line 8768, in get_displaylist
    dl = mupdf.fz_new_display_list_from_page(self.this)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.../miniconda3/lib/python3.12/site-packages/fitz/mupdf.py", line 42912, in fz_new_display_list_from_page
    return _mupdf.fz_new_display_list_from_page(page)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fitz.mupdf.FzErrorLimit: code=5: too many nested graphics states

How to reproduce the bug

import fitz  # PyMuPDF

# pdf_file: path to one of the affected PDFs (not attached to this report)
with fitz.open(pdf_file) as doc:
    page, *_ = doc.pages()
    _ = page.get_pixmap()

This happens with the latest PyMuPDF version. The same setup runs fine with at least pymupdf==1.22.5. Unfortunately, I cannot provide the affected files, but the call seems to hit some hardcoded recursion limit. Maybe a parameter to configure this limit would be enough?
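To check which pages of a document are affected, the reproduction above can be wrapped in a small helper. This is only a sketch: `find_failing_pages` and `check_pdf` are hypothetical names, not part of PyMuPDF, and the broad `except Exception` is an assumption made because the exact exception class may differ between versions.

```python
def find_failing_pages(pages, render):
    """Return (page_index, error_text) for every page whose render call raises."""
    failures = []
    for index, page in enumerate(pages):
        try:
            render(page)
        except Exception as exc:  # broad on purpose: the exception class may vary by version
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return failures


def check_pdf(pdf_path):
    """Open pdf_path with PyMuPDF and list the pages that cannot be rasterized."""
    import fitz  # imported lazily so find_failing_pages works without PyMuPDF installed

    with fitz.open(pdf_path) as doc:
        # The page generator is consumed while the document is still open.
        return find_failing_pages(doc.pages(), lambda page: page.get_pixmap())
```

Calling `check_pdf(pdf_file)` on an affected file should return one entry per failing page, with the message from the traceback above in the error text.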

PyMuPDF version

1.24.5

Operating system

Linux

Python version

3.12

JorjMcKie commented 6 days ago

Sorry, but we cannot accept bug reports without the material required to reproduce them! You can use my e-mail address to avoid publishing sensitive information here.

griai commented 6 days ago

The same file worked fine with earlier versions of PyMuPDF, e.g. 1.22.1 (~although I don't know the exact version, we are looking for it ...~). We are also very sure that the changed behavior lies upstream, because mupdf-gl now refuses to render the page as well, although it rendered fine earlier. In addition, muraster produces a completely white output image that is several MB in size. Would it help if we narrowed down the version that introduced the changed behavior? Should we open an issue with upstream MuPDF?
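Narrowing down the version could be done with a plain binary search over the releases between a known-good and a known-bad version. The sketch below is an illustration only: `first_failing_version` and the `works` predicate are hypothetical, and it assumes the list is sorted oldest to newest, that the first entry works, that the last entry fails, and that the behavior changes exactly once in between.

```python
def first_failing_version(versions, works):
    """Return the first version for which works(version) is False.

    Assumes versions[0] works, versions[-1] fails, and there is a single
    transition from working to failing somewhere in between.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] works, versions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid  # transition lies after mid
        else:
            hi = mid  # transition lies at or before mid
    return versions[hi]
```

Here `works` would be something like "install that release into a fresh environment and run the reproduction script", with 1.22.5 as the known-good end and 1.24.5 as the known-bad end.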

JorjMcKie commented 6 days ago

Well, in that case you may be better off communicating directly with the MuPDF team. Their own issue tracker is located here: https://bugs.ghostscript.com/enter_bug.cgi

Luux commented 5 days ago

I created a ticket for mupdf: https://bugs.ghostscript.com/show_bug.cgi?id=707842

JorjMcKie commented 5 days ago

Thank you for the information!