pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Document.save() failing for particular .PDF input file. #2799

Closed. BluBloos closed this issue 11 months ago.

BluBloos commented 11 months ago
import fitz

if __name__ == "__main__":

    doc = fitz.open("template.pdf")

    doc_bytes = doc.tobytes()

    with open("template_modified.pdf", 'wb') as file:
        file.write(doc_bytes)

The above code fails for the attached PDF file.

By "fails", I mean that I get the following error in the terminal:

Traceback (most recent call last):
  File "/Users/noahcabral/Dancemakerz-Ticket/convert.py", line 40, in <module>
    doc_bytes = doc.tobytes()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/fitz/fitz.py", line 4730, in write
    self.save(bio, garbage=garbage, clean=clean,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/fitz/fitz.py", line 4720, in save
    return _fitz.Document_save(self, filename, garbage, clean, deflate, deflate_images, deflate_fonts, incremental, ascii, expand, linear, no_new_id, appearance, pretty, encryption, permissions, owner_pw, user_pw)
RuntimeError: invalid key in dict

Your configuration (mandatory)

Operating system: macOS Monterey version 12.5
Python version: Python 3.10.4
PyMuPDF version: PyMuPDF==1.23.6, PyMuPDFb==1.23.6
Installation method:

python -m pip install --upgrade pip
pip install --upgrade pymupdf

template.pdf

EDIT.org.pdf

JorjMcKie commented 11 months ago

This is not a bug. The file is corrupt but repairable: just save it with enough cleaning options. For instance, doc.save(output, garbage=3) or doc.tobytes(garbage=3) will work.
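
For anyone who hits the same error with files they don't control, here is a minimal sketch of that workaround as a retry: try a plain save first, then fall back to garbage=3. The helper name tobytes_with_repair and the retry order are made up for illustration and are not part of PyMuPDF. It catches RuntimeError because that is what the traceback in this issue shows; other PyMuPDF versions may raise a different exception type.

```python
def tobytes_with_repair(doc, garbage_levels=(0, 3)):
    """Serialize `doc`, retrying with stronger garbage collection on failure.

    `doc` is assumed to behave like a fitz.Document, i.e. to expose
    tobytes(garbage=...). This helper is hypothetical, not a PyMuPDF API.
    """
    if not garbage_levels:
        raise ValueError("garbage_levels must not be empty")

    last_error = None
    for level in garbage_levels:
        try:
            # garbage=0 is a plain save; higher levels clean up more objects.
            return doc.tobytes(garbage=level)
        except RuntimeError as exc:
            # Error type taken from the traceback in this issue (an assumption
            # for other versions); remember it and try the next level.
            last_error = exc
    # Every level failed: re-raise the most recent error.
    raise last_error


# Possible usage, assuming PyMuPDF is installed and template.pdf exists:
#
#     import fitz
#     doc = fitz.open("template.pdf")
#     with open("template_modified.pdf", "wb") as file:
#         file.write(tobytes_with_repair(doc))
```

If you already know the input is damaged, calling doc.tobytes(garbage=3) directly is simpler. The retry only helps when most files are fine and you want to leave those untouched.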