pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

MuPDF error: syntax error: cannot find ExtGState resource 'BlendMode0' #3526

Closed: AndresEV22 closed this issue 1 month ago

AndresEV22 commented 1 month ago

Description of the bug

Hi, first, sorry for my bad English, but I will try to explain my bug. Yesterday I updated the dependencies in my pyproject.toml, and now I get an error when convert_to_pdf() is executed. It happens when I convert a PDF with more than one page; if the PDF has just one page, it does not happen. Searching online, I found that this error is supposedly caused by corrupt PDFs, but I tried to repair the PDF with the Linux command "gs -o corrected.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress damage.pdf" and also with ILovePDF, and the error persists.

I applied the migration from 1.22.5 to 1.24.4 with these changes:

How to reproduce the bug

The exact error is:

MuPDF error: syntax error: cannot find ExtGState resource 'BlendMode0'

MuPDF error: syntax error: cannot find ExtGState resource 'BlendMode0'

MuPDF error: syntax error: cannot find ExtGState resource 'BlendMode0'

MuPDF error: syntax error: cannot find ExtGState resource 'BlendMode0'

syntax error: cannot find ExtGState resource 'BlendMode0'
encountered syntax errors; page may not be correct
syntax error: cannot find ExtGState resource 'BlendMode0'
encountered syntax errors; page may not be correct
syntax error: cannot find ExtGState resource 'BlendMode0'
encountered syntax errors; page may not be correct
syntax error: cannot find ExtGState resource 'BlendMode0'
encountered syntax errors; page may not be correct

And this is my function:

from typing import Any, cast

import pymupdf


def fill_pdf_form(
    pdf_form_stream: bytes,
    widget_values: dict[str, Any],
    image_boxes: list[tuple[pymupdf.Rect, bytes, int]],
) -> bytes:
    with pymupdf.Document(stream=pdf_form_stream) as pdf_document:
        # Insert each image on its page (page_number is 1-based).
        for image_box, image_stream, page_number in image_boxes:
            pdf_page_to_image = cast(
                pymupdf.Page, pdf_document.load_page(page_number - 1)
            )
            pdf_page_to_image.insert_image(image_box, stream=image_stream)

        # Fill every form field with its value from widget_values.
        for pdf_page in pdf_document:
            for widget in pdf_page.widgets():
                widget.field_value = str(widget_values[cast(str, widget.field_name)])
                widget.update()

        # Re-render the whole document as a new PDF; the errors appear at this step.
        return pdf_document.convert_to_pdf()

My project environment is based on Python Hatch. To clarify, despite this "error" everything still works, but it is not ideal.

PyMuPDF version

1.24.4

Operating system

Linux

Python version

3.12

JorjMcKie commented 1 month ago

As stated in the documentation: there can be no guarantee that PDF-to-PDF conversion works. Nor does "repairing" a PDF lead to an error-free PDF in all cases.

So this is not a bug, just a PDF damaged beyond repair. Why do you want to do that conversion anyway?

AndresEV22 commented 1 month ago

Because I have a PDF with form fields: once these fields are completely filled in, I need to remove them, but their text must not disappear.

JorjMcKie commented 1 month ago

> Because I have a pdf with form fields, when these fields are fully complete, I need to remove this one, but the text can't disappear.

Aha! I thought it must be this. Please look at the Document method bake(). It removes annotations and/or form fields and makes their current appearance a permanent part of the page.

AndresEV22 commented 1 month ago

Oh, I see. My bad. Thanks for the help. I am grateful.