py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

pypdf breaks calculated form fields based on the sum of other fields #2092

Open mschlachter-via opened 1 year ago

mschlachter-via commented 1 year ago

When passing pdf files with calculated form fields through pypdf, the calculated fields are no longer updated based on the fields that they depend on.

For example, if the following file is opened in Acrobat Reader (this doesn't seem to work in most other PDF viewers), the total field at the bottom of the PDF is updated based on the sum of the cost values in the document's line items: https://static.e-publishing.af.mil/production/1/af_a4/form/af3555/af3555.pdf

However, passing the pdf document through pypdf (even without making any changes) breaks this calculated total field.

Environment

$ python -m platform
Linux-5.15.49-linuxkit-x86_64-with

$ python -c "import pypdf;print(pypdf.__version__)"
3.14.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter

# Clone the document unchanged and write it back out.
reader = PdfReader("af3555.pdf")
writer = PdfWriter(clone_from=reader)
with open("af3555-edited.pdf", "wb") as output_stream:
    writer.write(output_stream)

https://static.e-publishing.af.mil/production/1/af_a4/form/af3555/af3555.pdf

pubpub-zz commented 1 year ago

When you clone the document, you reorganize its internal structure, which breaks the signature (Acrobat Reader displays an alert). The solution is to remove the signature; the easiest way is to delete the entire "/Perms" dictionary: del w._root_object["/Perms"] (where w is the PdfWriter). After that, you still get an alert saying that you are working on a copy, but the calculations work.
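A minimal sketch of the workaround described above, applied to the reproduction script from the issue. The helper names drop_perms and rewrite_without_perms are hypothetical and exist only for illustration. The sketch relies on the private attribute _root_object mentioned in this comment, which may change between pypdf versions, and it skips the deletion when the document has no "/Perms" entry.

```python
def drop_perms(root):
    """Remove the "/Perms" entry from a catalog-like mapping, if present.

    Returns True when an entry was removed, False otherwise.
    """
    if "/Perms" in root:
        del root["/Perms"]
        return True
    return False


def rewrite_without_perms(src_path, dst_path):
    """Clone src_path with pypdf, drop "/Perms", and write dst_path."""
    # Imported here so drop_perms can be exercised without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter(clone_from=reader)
    # _root_object is a private attribute (the document catalog), as used in
    # the comment above; this is an assumption about pypdf internals.
    drop_perms(writer._root_object)
    with open(dst_path, "wb") as output_stream:
        writer.write(output_stream)
```

For the file in this issue, the call would be rewrite_without_perms("af3555.pdf", "af3555-edited.pdf").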

I propose to convert it into a discussion to keep the trick visible for other users

pubpub-zz commented 1 year ago

@mschlachter-via can you confirm my proposal?

pubpub-zz commented 1 year ago

@mschlachter-via +1?

mschlachter-via commented 1 year ago

@pubpub-zz I am okay with your proposal to convert to a discussion

I tried your solution, and it fixed the calculated field, but it caused two other problems when the document was opened in Acrobat Reader:

  1. Values filled in with writer.update_page_form_field_values(…) were all reverted to their default values.
  2. Acrobat Reader refused to save changes, instead showing the following message: "Cannot Save Form Information. Please Note: You cannot save a completed copy of this form on your computer. If you would like a copy for your records, please fill it in and print it."