pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

MuPDF error: argument error: not a dict (string) #3584

Closed NexPlex closed 2 weeks ago

NexPlex commented 2 weeks ago

Description of the bug

PyMuPDFb==1.23.3 works fine. When we upgraded to PyMuPDFb==1.24.5, these lines of code started throwing an error:

doc = fitz.open(stream=input_pdf_path, filetype="pdf")
page = doc[page_number - 1]

MuPDF error: argument error: not a dict (string)

This error repeats about 15 times, but the code still saves the file successfully.

I saw a post that suggested we clean the file like this, but it did not change the results.

doc = fitz.open(stream=input_pdf_path, filetype="pdf")
print('clean')
pdfbytes = doc.tobytes(garbage=3)
doc.close()
doc = fitz.open("pdf", pdfbytes)

print('page')
page = doc[page_number - 1]

How to reproduce the bug

No errors with PyMuPDFb==1.23.3.

PyMuPDF version

1.24.5

Operating system

Linux

Python version

3.9

JorjMcKie commented 2 weeks ago

A bug report cannot be accepted without a reproducing file.

NexPlex commented 2 weeks ago

Sorry, my bad. See attached. Thank you for the quick reply. 003668df-7ea8-4f40-b672-7162f2d1e209-ca-polst-2017-en.pdf

JorjMcKie commented 2 weeks ago

The PDF does contain many errors. It has 39 form fields, all of which have an invalid /AP entry. /AP is optional, but if present it should be correct: per the specification it must be a PDF dictionary, yet it is a string in every case. Who knows why. If /AP is missing or unusable, the appearance is generated from the object definition of the field.
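To see which objects are affected, one could scan the file's xref table for widget annotations and check the type of each /AP entry. The following is a minimal sketch, not part of the original reply. It assumes that `Document.xref_length()` and `Document.xref_get_key()` return what the PyMuPDF documentation describes (the latter a `(type, value)` tuple of strings). The helper name `find_bad_ap` is hypothetical.

```python
def find_bad_ap(doc):
    """Return (xref, type) pairs for widgets whose /AP is not a dictionary.

    `doc` is assumed to be an open PyMuPDF Document, but any object that
    offers xref_length() and xref_get_key(xref, key) will work.
    """
    bad = []
    for xref in range(1, doc.xref_length()):
        # Only look at widget annotations (the visible part of form fields).
        if doc.xref_get_key(xref, "Subtype") != ("name", "/Widget"):
            continue
        ap_type, _ = doc.xref_get_key(xref, "AP")
        # "dict" is an inline dictionary, "xref" an indirect reference
        # (not followed here), "null" means the optional key is absent.
        if ap_type not in ("dict", "xref", "null"):
            bad.append((xref, ap_type))
    return bad
```

If the diagnosis above holds, calling `find_bad_ap(doc)` on the attached file would presumably report the affected widgets with type `"string"`, matching the "not a dict (string)" message. An indirect reference is accepted here without being resolved, so this check is deliberately lenient.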

If the messages bother you, you can suppress them with pymupdf.TOOLS.mupdf_display_errors(False).
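The switch mentioned above is global, so it can be convenient to turn it off only around the calls that produce the noise and restore it afterwards. Here is a minimal sketch under these assumptions: `mupdf_display_errors()` called without an argument returns the current setting, and the module can be imported as `pymupdf` (older releases use `import fitz`). The names `quiet_mupdf` and `load_page_quietly` are hypothetical helpers, not part of the library.

```python
from contextlib import contextmanager


@contextmanager
def quiet_mupdf(tools):
    """Temporarily switch off MuPDF error display, then restore it."""
    previous = tools.mupdf_display_errors()  # no argument: query only
    tools.mupdf_display_errors(False)
    try:
        yield
    finally:
        tools.mupdf_display_errors(previous)


def load_page_quietly(pdf_bytes, page_number):
    """Open a PDF from bytes and return (doc, page) for a 1-based page number."""
    import pymupdf  # imported lazily; assumed to be installed

    with quiet_mupdf(pymupdf.TOOLS):
        doc = pymupdf.open(stream=pdf_bytes, filetype="pdf")
        page = doc[page_number - 1]
    # Return the document as well so it stays alive while the page is used.
    return doc, page
```

Note that `stream=` expects the PDF content as bytes, so a file on disk would first be read with something like `pathlib.Path(path).read_bytes()`. Suppressing the messages only hides them; the malformed /AP entries remain in the file.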