py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

PDF-Form not editable after filling out text field (after upgrade from 3.9.* to 4.3*) #2780

Closed ljbergmann closed 3 days ago

ljbergmann commented 1 month ago

Last year I wrote a small Python program that fills out a PDF form, and everything worked as I expected. After running the program I was able to review the created file and even change its contents. As I only use this program once a year, I don't follow the changes to pypdf closely. When I revisited the program this year, I noticed that several pypdf updates had been released. As I don't like to run outdated software, I upgraded to the latest version, updated my code according to the documentation, and reran my program.

The good news first: the form gets filled out. But when I open the filled-out PDF, I get a warning that the "extended features" (see attached screenshot) are no longer available. I could live with that, because it is just annoying, but I'm also no longer able to edit the contents of the PDF, which is a problem.

[Screenshot: warning that the "extended features" are no longer available]

I've tried multiple versions of pypdf and, as far as I can tell, a change made somewhere between 3.9 and 3.11 causes this behavior. I've also attached the PDFs created by the different pypdf versions:

f5471sm-3.9.1.pdf f5471sm-4.3.1.pdf

Environment

$ python -m platform
Windows-10-10.0.22631-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Code + PDF

For demonstration purposes I've boiled the code down as much as possible.

This is the form I'm using: https://www.irs.gov/pub/irs-pdf/f5471sm.pdf

import pypdf
from pypdf import PdfReader, PdfWriter

form = PdfReader("f5471sm.pdf")
fields = form.get_form_text_fields()
writer = PdfWriter()

# Fill every text field with its own name so each field is easy to identify.
for key in fields:
    fields[key] = key

# Compare the whole major version number, not just its first character.
if int(pypdf.__version__.split(".")[0]) >= 4:
    # pypdf 4.x: clone the document root, then fill the fields on all pages.
    writer.clone_reader_document_root(form)
    writer.update_page_form_field_values(None, fields)
else:
    # pypdf 3.x: copy the pages one by one, then fill the fields per page.
    for page in form.pages:
        writer.add_page(page)
    for page in writer.pages:
        writer.update_page_form_field_values(page, fields)

with open("f5471sm-" + pypdf.__version__ + ".pdf", "wb") as file:
    writer.write(file)

writer.close()

pubpub-zz commented 1 month ago

Your code in v3.9 is not valid, as you are not transferring the AcroForm. By doing this you are losing the form/field extraction capability.

The PDF you are using contains an XFA and seems to be signed. I need more time to understand how this could be handled to prevent the reported warning.
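
For anyone who wants to check their own file for the same features, here is a minimal sketch that looks for an AcroForm, an XFA entry, signature flags, and usage-rights entries in the document catalog. The helper names `describe_catalog` and `describe_pdf` are made up for this example and are not part of pypdf. The assumption that `/Perms` with `/UR` or `/UR3` is what triggers the "extended features" warning is a guess based on the PDF specification, not something confirmed in this thread.

```python
from typing import Any, Mapping


def describe_catalog(catalog: Mapping[str, Any]) -> dict:
    """Report form-related features found in a PDF document catalog.

    Hypothetical helper: works on any mapping shaped like a catalog.
    """
    # Use `in` plus item access instead of .get(): pypdf dictionaries
    # resolve indirect references on item access.
    acroform = catalog["/AcroForm"] if "/AcroForm" in catalog else {}
    perms = catalog["/Perms"] if "/Perms" in catalog else {}
    sig_flags = int(acroform["/SigFlags"]) if "/SigFlags" in acroform else 0
    return {
        "has_acroform": "/AcroForm" in catalog,
        "has_xfa": "/XFA" in acroform,
        # Bit 1 of /SigFlags: the document contains signature fields.
        "signatures_exist": bool(sig_flags & 1),
        # Bit 2 of /SigFlags: the file should only be updated incrementally.
        "append_only": bool(sig_flags & 2),
        # Assumption: usage rights are what the viewer calls "extended features".
        "usage_rights": any(key in perms for key in ("/UR", "/UR3")),
    }


def describe_pdf(path: str) -> dict:
    """Run describe_catalog on a PDF file (requires pypdf)."""
    from pypdf import PdfReader  # imported lazily so the helper above has no dependency

    reader = PdfReader(path)
    return describe_catalog(reader.trailer["/Root"])
```

Calling `describe_pdf("f5471sm.pdf")` on the original form and on each generated file should show which of these entries survive the rewrite.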

Harry262000 commented 1 month ago

I'm also trying to figure out how to handle forms and signed PDFs to prevent the warning and ensure proper form field extraction.

ljbergmann commented 1 month ago

Thank you very much for your input, @pubpub-zz. Maybe the code is not valid or does not use the library as one should, but (and I only say this to explain why I posted this code) it gave me the results I was trying to get. As I stated, the warning is not necessarily a big deal, but not being able to change the content is a bit of a bummer, because everything else works perfectly fine.

pubpub-zz commented 2 weeks ago

just for archive: f5471sm.pdf

pubpub-zz commented 1 week ago

The document can now be written in incremental mode. However, to make the data visible you still need to modify the datasets in the XFA form (tracked in https://github.com/py-pdf/pypdf/issues/2824).
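
To illustrate what modifying the XFA datasets could look like, here is a rough sketch that rewrites element values in a `datasets` XML packet using only the standard library. It assumes you already have the bytes of that packet and that the element names match the field names you want to set. The function `set_xfa_values` and the field names `f1_1` and `f1_2` are made up for this example. Writing the packet back into the PDF is not shown here; that part is what the linked issue tracks.

```python
import xml.etree.ElementTree as ET

# Namespace of the XFA datasets packet.
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"
# Keep the conventional "xfa" prefix when serializing.
ET.register_namespace("xfa", XFA_DATA_NS)


def set_xfa_values(datasets_xml: bytes, values: dict) -> bytes:
    """Return a copy of the datasets packet with the given values set.

    Hypothetical helper: every element whose local name is a key of
    `values` gets its text replaced, wherever it appears in the tree.
    """
    root = ET.fromstring(datasets_xml)
    for element in root.iter():
        # Drop the "{namespace}" part, if any, to get the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in values:
            element.text = values[local_name]
    return ET.tostring(root, encoding="utf-8")


# Example with a made-up packet and made-up field names.
sample = (
    b'<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">'
    b"<xfa:data><form1><f1_1>old</f1_1><f1_2/></form1></xfa:data>"
    b"</xfa:datasets>"
)
updated = set_xfa_values(sample, {"f1_1": "new", "f1_2": "x"})
```

Matching on local names keeps the sketch short, but a real form may reuse the same element name in several subforms, so a path-based lookup would likely be safer there.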