Updated pdf fields don't show up when page is written

py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

https://pypdf.readthedocs.io/en/latest/

Updated pdf fields don't show up when page is written #355

Closed segevmalool closed 1 year ago

segevmalool commented 7 years ago

I'd like to use PyPDF2 to fill out a pdf form. So far, everything is going smoothly, including updating the field text. But when I write the pdf to a file, there is apparently no change in the form. Running this code:

import datetime as dt
from PyPDF2 import PdfFileReader, PdfFileWriter
import re

form701 = PdfFileReader('ABC701LG.pdf')
page = form701.getPage(0)
filled = PdfFileWriter()

#removing extraneous fields
r = re.compile('^[0-9]')
fields = sorted(list(filter(r.match, form701.getFields().keys())), key = lambda x: int(x[0:2]))

filled.addPage(page)
filled.updatePageFormFieldValues(filled.getPage(0), 
                                 {fields[0]: 'some filled in text'})

print(filled.getPage(0)['/Annots'][0].getObject()['/T'])
print(filled.getPage(0)['/Annots'][0].getObject()['/V'])

with open('test.pdf','wb') as fp:
    filled.write(fp)

prints text:

1 EFFECTIVE DATE OF THIS SCHEDULE <i.e. the field name> some filled in text

But when I open up test.pdf, there is no added text on the page! Help!
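A side note on the filter-and-sort line in the snippet above: `int(x[0:2])` only looks at the first two characters, so a field name with a three-digit prefix would sort in the wrong place. Here is a minimal sketch of a sturdier version, assuming the field names start with a number as in the example. The helper name and the sample names are made up for illustration.

```python
import re

# Matches a leading run of digits, e.g. "12" in "12 SOME FIELD".
LEADING_NUMBER = re.compile(r"^(\d+)")


def numbered_fields(names):
    """Return only the names that start with digits, sorted by that number."""
    numbered = []
    for name in names:
        match = LEADING_NUMBER.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # sorted() on (number, name) tuples orders by number first, then by name.
    return [name for _, name in sorted(numbered)]


# Hypothetical field names, standing in for form701.getFields().keys().
sample = ["10 TOTAL", "2 NAME", "Signature", "100 NOTES", "1 EFFECTIVE DATE"]
print(numbered_fields(sample))
```

This does not change the display problem described above; it only makes the field selection less fragile.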

codigovision commented 2 years ago

Here's what worked for me, based on the PR from @fidoriel but without compressing the stream, which broke the preview for me. This is a simplified version that just shows how updating the stream and its associated dictionary works. It displays the filled-in form fields and sets them to read-only. I've tested it in Apple Preview on macOS and iOS, and also in Acrobat Pro DC.

    # Field data.
    data = {"field_name": "some value"}

    # Get template.
    template = PdfReader("template.pdf", strict=False)

    # Initialize writer.
    writer = PdfWriter()

    # Add the template page to the writer.
    writer.add_page(template.pages[0])

    # Get page annotations.
    page_annotations = writer.pages[0][PageAttributes.ANNOTS]

    # Loop through page annotations (fields).
    for index in range(len(page_annotations)):  # type: ignore
        # Get annotation object.
        annotation = page_annotations[index].get_object()  # type: ignore

        # Get existing values needed to create the new stream and update the field.
        field = annotation.get(NameObject("/T"))
        new_value = data[field]
        ap = annotation.get(AnnotationDictionaryAttributes.AP)
        x_object = ap.get(NameObject("/N")).get_object()
        font = annotation.get(InteractiveFormDictEntries.DA)
        rect = annotation.get(AnnotationDictionaryAttributes.Rect)

        # Calculate the text position.
        font_size = float(font.split(" ")[1])
        w = round(float(rect[2] - rect[0] - 2), 2)
        h = round(float(rect[3] - rect[1] - 2), 2)
        text_position_h = h / 2 - font_size / 3  # approximation

        # Create a new XObject stream.
        new_stream = f'''
            /Tx BMC 
            q
            1 1 { w } { h } re W n
            BT
            { font }
            2 { text_position_h } Td
            ({ new_value }) Tj
            ET
            Q
            EMC
        '''

        # Update XObject stream.
        x_object._data = encode_pdfdocencoding(new_stream)

        # Update annotation dictionary.
        annotation.update(
            {
                # Update Value.
                NameObject(FieldDictionaryAttributes.V): TextStringObject(
                    new_value
                ),
                # Update Default Value.
                NameObject(FieldDictionaryAttributes.DV): TextStringObject(
                    new_value
                ),
                # Set Read Only flag.
                NameObject(FieldDictionaryAttributes.Ff): NumberObject(
                    FieldFlag(1)
                )
            }
        )

    # write "output".
    with open("output.pdf", "wb") as output_stream:
        writer.write(output_stream)  # type: ignore
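One caveat with the stream built above: `({ new_value }) Tj` places the value inside a PDF literal string, so a value containing `(`, `)` or `\` would produce a broken content stream unless those characters are escaped with a backslash. Reading the size with `font.split(" ")[1]` also assumes the /DA string has exactly the form `/Name size Tf ...`. Below is a minimal sketch of both helpers in plain Python. The function names are hypothetical (they are not part of pypdf), and the /DA value and rectangle in the usage are made up.

```python
import re


def escape_pdf_literal(text: str) -> str:
    """Backslash-escape the characters that are special inside a (...) string."""
    # The backslash must be handled first so later escapes are not doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def font_size_from_da(da: str, default: float = 12.0) -> float:
    """Return the operand that precedes the Tf operator in a /DA string."""
    match = re.search(r"/\S+\s+(\d+(?:\.\d+)?)\s+Tf", da)
    if not match:
        return default
    return float(match.group(1))


def build_text_appearance(da: str, rect, value: str) -> str:
    """Build a single-line text appearance stream, as in the comment above."""
    font_size = font_size_from_da(da)
    w = round(float(rect[2]) - float(rect[0]) - 2, 2)
    h = round(float(rect[3]) - float(rect[1]) - 2, 2)
    text_position_h = round(h / 2 - font_size / 3, 2)  # same approximation
    return "\n".join([
        "/Tx BMC",
        "q",
        f"1 1 {w} {h} re W n",
        "BT",
        da,
        f"2 {text_position_h} Td",
        f"({escape_pdf_literal(value)}) Tj",
        "ET",
        "Q",
        "EMC",
    ])


stream = build_text_appearance("/Helv 9 Tf 0 g", [10, 10, 110, 30], "Total (USD)")
print(stream)
```

The returned string could then be encoded and assigned to the XObject the same way `new_stream` is in the code above. Multi-line fields, alignment and non-Latin text are not handled by this sketch.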

DanielMajer24 commented 1 year ago

Hey all, thanks for all the discussion on this topic. This was very helpful to get my script working.

I got this working by writing my own updatePageFormFieldValues function instead of using the one imported from PyPDF2. It is based on @ademidun's initial response, later adapted using parts of @ale-rt's answer for PyPDF4 for cases where text isn't showing in some template boxes. I changed the code slightly, porting it from PyPDF4 to PyPDF2 for my use case and making sure checkboxes get selected. Two things to mention about the function:

1) You need to check how the checkboxes have been formatted and whether they use /Yes, /On or /1 when they are selected, as each checkbox only responds to its own specific state name.

2) You need to have all the fields of the PDF form present in the field_dictionary when you input values. If you only include the relevant values in the field_dictionary, the function will enter a field's own name as its text box value whenever that field has not been mentioned. This is most likely an error I have introduced, but with this simple workaround I have been able to get around it... hopefully.

import random
import time

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, TextStringObject

# initial functions

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        time.sleep(1)
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer 

def updatePageFormFieldValues(page, fields):
        '''
        Update the form field values for a given page from a fields dictionary.

       This was copied from the PyPDF2 library and adapted for my use case.

        Copy field texts and values from fields to page.
        :param page: Page reference from PDF writer where the annotations
            and field data will be updated.
        :param fields: a Python dictionary of field names (/T) and text
            values (/V)
        '''
        # Iterate through pages, update field values
        for j in range(0, len(page['/Annots'])):
            writer_annot = page['/Annots'][j].getObject()
            field = writer_annot.get('/T') 
            if writer_annot.get("/FT") == "/Btn":
                value = fields.get(field, random.getrandbits(1))
                if value:
                    writer_annot.update(
                        {
                            NameObject("/AS"): NameObject("/On"),
                            NameObject("/V"): NameObject("/On"),
                        }
                    )
            elif writer_annot.get("/FT") == "/Tx":
                value = fields.get(field,field)
                writer_annot.update(
                    {
                        NameObject("/V"): TextStringObject(value),
                    }
                )

def generate_PDF_file(outfile, field_dictionary, infile):

    # Keep a handle on the input file so it can be closed at the end.
    inputStream = open(infile, "rb")
    pdf = PdfFileReader(inputStream, strict=False)
    if "/AcroForm" in pdf.trailer["/Root"]:
        pdf.trailer["/Root"]["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})

    pdf2 = PdfFileWriter()

    pdf2._info = pdf.trailer["/Info"]
    reader_trailer = pdf.trailer["/Root"]
    pdf2._root_object.update(
        {
            key: reader_trailer[key]
            for key in reader_trailer
            if key in ("/AcroForm", "/Lang", "/MarkInfo")
        }
    )

    set_need_appearances_writer(pdf2)
    time.sleep(1)
    if "/AcroForm" in pdf2._root_object:
        pdf2._root_object["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})

    #currently the pdfs I have been working with only have one page but I will have to change this part to accommodate multiple page pdfs
    pdf2.addPage(pdf.getPage(0))
    #####This is where I added the user defined function instead of using the PyPDF2 one.  
    updatePageFormFieldValues(page = pdf2.getPage(0), fields = field_dictionary) #changed to user_defined_function

    outputStream = open(outfile, "wb")
    pdf2.write(outputStream)

    inputStream.close()
    outputStream.close()
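The two caveats listed before the code can be illustrated without PyPDF2. This is only a sketch, with plain dicts standing in for annotation dictionaries and with hypothetical helper names and sample data. It reads the checkbox's "on" state from the keys of its /AP /N dictionary instead of hard-coding /On, and it skips fields missing from the input rather than writing the field name as the value. Like the function above, it treats every /Btn field as a checkbox.

```python
def checkbox_on_state(annot):
    """Return the appearance state name that is not /Off, or None."""
    normal = annot.get("/AP", {}).get("/N", {})
    for state in normal:
        if state != "/Off":
            return state
    return None


def update_annotations(annots, fields):
    """Update only the annotations whose /T name appears in fields."""
    for annot in annots:
        name = annot.get("/T")
        if name not in fields:
            continue  # leave untouched instead of falling back to the name
        value = fields[name]
        if annot.get("/FT") == "/Btn":
            on_state = checkbox_on_state(annot)
            state = on_state if (value and on_state) else "/Off"
            annot["/V"] = state
            annot["/AS"] = state
        elif annot.get("/FT") == "/Tx":
            annot["/V"] = str(value)


# Made-up annotations: one checkbox that uses /Yes and two text fields.
annots = [
    {"/T": "agree", "/FT": "/Btn", "/AP": {"/N": {"/Off": None, "/Yes": None}}},
    {"/T": "name", "/FT": "/Tx"},
    {"/T": "notes", "/FT": "/Tx"},
]
update_annotations(annots, {"agree": True, "name": "Jane"})
print(annots)
```

With real PyPDF2 objects the same idea would need NameObject and TextStringObject wrappers and a getObject() call on indirect references, as in the function above.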

Bchapp1558 commented 1 year ago

Not a solution, unfortunately, I am also in need of help. I am pretty new to programming so trying to incorporate some of these solutions has been tricky. Like everyone else, I am trying to make my data visible on my new form without needing to click on the field first. All of the fields are set to be visible.

from flask import Blueprint, render_template, request
from PyPDF2 import PdfReader, PdfWriter
from flask_cors import CORS
from PyPDF2.generic import BooleanObject, NameObject

views = Blueprint('views', __name__)
CORS(views)

@views.route('/')
def home():
    return render_template("base.html")

@views.route('/submit', methods=['POST'])
def submit():
    data = request.json
    reader = PdfReader("AF594.pdf")
    writer = PdfWriter()
    page = reader.pages[0]
    fields = reader.get_fields()
    writer.add_page(page)
    lastNameFirst = data['sponsorLastName'] + ", " + data['sponsorFirstName']
    fullPhysicalAddress = data['sponsorAddress'] + ", " + data['city'] + ", " + data['state'] + " " + data['zipCode']
    data.update({
        "lastNameFirst": lastNameFirst,
        "fullPhysicalAddress": fullPhysicalAddress})

    # Puts the X in block 7 for what child support is based on
    if data['childSupportBasedOnInput'] == "basedOn_DivorceDecree":
        data.update({ "basedOn_DivorceDecree": "X"})
    elif data['childSupportBasedOnInput'] == "basedOn_CourtOrder":
        data.update({ "basedOn_CourtOrder": "X"})
    elif data['childSupportBasedOnInput'] == "basedOn_LegalSeparationAgreement":
        data.update({ "basedOn_LegalSeparationAgreement": "X"})
    elif data['childSupportBasedOnInput'] == "basedOn_WrittenagreementWithChildsCustodian":
        data.update({ "basedOn_WrittenagreementWithChildsCustodian": "X"})

    # set the NeedAppearances flag of the AcroForm to True - This was part of how I tried to solve this issue.
    try:
        catalog = writer._root_object
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))

    writer.update_page_form_field_values(
        writer.pages[0], data
    )

    fileName594 = data['sponsorFirstName'] + "_" + data['sponsorLastName'] +"_filled594.pdf"
    with open(fileName594, "wb") as output_stream:
        writer.write(output_stream)
    return "success"

Any help would be appreciated!

cryzed commented 1 year ago

The issue seems to be that PdfWriter.set_need_appearances_writer() (whether called directly or indirectly by PdfWriter.update_page_form_field_values()) fails to create the /Root/AcroForm object correctly when it doesn't already exist in the PdfWriter object.

From what I can tell, a well-formed /Root/AcroForm-object should have keys such as: /DA, /DR, /Fields and /NeedAppearances, however PdfWriter.set_need_appearances_writer() creates an indirect object (reference) pointing to a badly-formed /Root/AcroForm-object with keys stream/data, dict/Filter and dict/NeedAppearances instead.

Long story short, after a bunch of debugging with qpdf --json-output and comparing the pypdf output with pdftk's output, I came up with this:

import typing as T

import pypdf.generic
import pypdf.constants
import pypdf

def _fix_acroform(writer: pypdf.PdfWriter, reader: pypdf.PdfReader) -> None:
    reader_root = T.cast(pypdf.generic.DictionaryObject, reader.trailer[pypdf.constants.TrailerKeys.ROOT])
    acro_form_key = pypdf.generic.NameObject(pypdf.constants.CatalogDictionary.ACRO_FORM)

    if pypdf.constants.CatalogDictionary.ACRO_FORM in reader_root:
        reader_acro_form = reader_root[pypdf.constants.CatalogDictionary.ACRO_FORM]
        writer._root_object[acro_form_key] = writer._add_object(reader_acro_form.clone(writer))
    else:
        writer._root_object[acro_form_key] = writer._add_object(pypdf.generic.DictionaryObject())

    writer.set_need_appearances_writer()

# ...

_fix_acroform(writer, reader)

# ...

with open("out.pdf", "wb") as file:
    writer.write(file)

My personal use case is simply filling a PDF with form fields programmatically and then saving the output. I was quite disappointed to learn that this was so badly broken when the documentation advertises it as just working.

Another few random notes: We call PdfObject.clone() on the existing /Root/AcroForm-object instead of setting it directly: This guarantees that the PdfObject.idnums are translated to the IDs PdfWriter assigned to them, resulting in a well-formed /Root/AcroForm/Fields listing with correct references to the fields. Ironically /Root/AcroForm/Fields seems to be entirely optional, at least for Adobe Acrobat Reader, so we could get away with omitting it -- just if the references inside are wrong it will fail to display the fields correctly.

To summarize: A well-formed /Root/AcroForm object is required with the /Root/AcroForm/NeedAppearances flag set to true (and I suspect only that) -- it doesn't seem to matter whether this object lives directly inside the /Root-object or is just referenced, however pypdf serializes it incorrectly if it's the latter. The now-resulting PDF displays correctly in Okular, Adobe Acrobat Reader and various pdf.js browser-implementations for me.

As for the comments that claim the appearance object has to be generated "by hand" for this to work reliably, I think that's silly: not even pdftk does that for anything that's outside of ASCII. That should definitely be the job of the PDF reader, especially if there are flags like /NeedAppearances in the PDF standard. To quote pdftk --help:

[need_appearances]
Sets a flag that cues Reader/Acrobat to generate new field
appearances based on the form field values.  Use this when fill-
ing a form with non-ASCII text to ensure the best presentation
in Adobe Reader or Acrobat.  It won't work when combined with
the flatten option.

After checking some more, I think I found the issue. After adjusting PdfWriter.set_need_appearances_writer() to look like this, the issue seems to be resolved:

def set_need_appearances_writer(self) -> None:
    # See 12.7.2 and 7.7.2 for more information:
    # http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = self._root_object
        # get the AcroForm tree
        if CatalogDictionary.ACRO_FORM not in catalog:
            self._root_object[NameObject(CatalogDictionary.ACRO_FORM)] = self._add_object(DictionaryObject())

        need_appearances = NameObject(InteractiveFormDictEntries.NeedAppearances)
        self._root_object[CatalogDictionary.ACRO_FORM][need_appearances] = BooleanObject(True)  # type: ignore
    except Exception as exc:
        logger.error("set_need_appearances_writer() catch : %s", repr(exc))

EDIT: Looks like there's another problem: the above code sets the NeedAppearances-flag correctly, and it even works as intended, however Adobe Acrobat Reader seems to only respect the flag for the first page of a document. At this point I am convinced it's a bug in Adobe's software or maybe even done on purpose. That's very unfortunate, because I don't see a way of fixing this in pypdf. An easy, but ugly, workaround is to open the resulting PDF in literally any other PDF reader and use the "print to PDF" function -- the resulting PDF will lose all form fields, but it will appear correctly in Adobe Acrobat Reader.

Taking a closer look at what pdftk does when you use fill_form: it actually does calculate the appearance-stream for every field and fills it in: So that's what I'll be looking at next.

brzGatsu commented 1 year ago

Looking forward to your findings. We are currently stuck with pdftk due to this bug... would love to switch to PyPDF once the issue has been resolved. Thanks for taking a look at it!

cryzed commented 1 year ago

I read through the PDF docs and took a look at pdftk's source code: implementing appearance streams from scratch is possible, but quite tedious (especially if you want to support most common features). I think I'll go with the pdftk-route myself and use it until it becomes unsupported. If that ever happens, I'll take another look at it or hope that PDF is a dead format by that time.

However, I'll reopen my pull request -- the bug that prevents the proper creation of /Root/AcroForm does exist, and is fixed by my PR. With this at least, it's possible to render the first page correctly in Adobe Reader and all pages in most other PDF readers, without all these workarounds.

brzGatsu commented 1 year ago

If I understand correctly, with your PR we could split our pdf into single pages, fill the forms individually and then merge them again? Would that work for Acrobat?

pubpub-zz commented 1 year ago

@brzGatsu I would have expected 'PdfWriter.append()' to provide some capability to correctly split documents with fields. Can you confirm?

cryzed commented 1 year ago

@brzGatsu no, that won't work. The issue is that a PDF reader is supposed to render the appearance streams for all annotations if /Root/AcroForm/NeedAppearances is set, when the document is opened. This rendering only happens at runtime (when Adobe Reader displays the file) and is not persisted, so you can't just split the pages and merge them later.

csears123 commented 1 year ago

I am also experiencing this issue from Adobe Reader, where the NeedAppearances flag is only allowing the first page of the PDF to view the text in the fillable field, as @cryzed documented. On the second page if I click into the field the text becomes visible, only with the cursor focus. Really hoping there is a solution to set the appearance-stream for every field if that is the best and most reliable method. I'll try the example above from @codigovision.

I haven't used pdftk but I will also explore that as an alternative.

csears123 commented 1 year ago

The example linked below, which adds/updates the appearance stream, seemed to work for a 2-page PDF with fillable fields: https://github.com/py-pdf/pypdf/issues/355#issuecomment-1238541441

However, the issue persists when merging another fillable PDF form into a single PDF output. The first 2 pages from the original PDF still work correctly (after updating the appearance streams), but none of the fields on the 3rd page (from a different PDF) show their text; it is hidden behind the input until I click into the field (using Adobe Reader).

After a little more debugging, it seems the writer annotations on the 3rd page do not have an 'AP' attribute to begin with, so the following call returns None: ap = writer_annot.get(AnnotationDictionaryAttributes.AP)

I'm not sure how to add the missing 'AP' appearance streams; it seems complicated. I also ended up testing pdftk and it just worked on the first try, with no workarounds, issues, or bugs that needed addressing. I'll probably be scrapping pypdf for now, unless this critical issue is resolved.
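The missing 'AP' case described in this comment can at least be detected up front. Here is a minimal sketch with plain dicts in place of pypdf annotation objects (the helper name and the data are hypothetical). It splits a page's widgets into those that already have a normal appearance stream to overwrite and those that would need one created from scratch.

```python
def split_by_appearance(annots):
    """Partition widget annotations by whether /AP contains a /N entry."""
    have_ap, missing_ap = [], []
    for annot in annots:
        ap = annot.get("/AP")  # None when the widget has no /AP at all
        if ap is not None and "/N" in ap:
            have_ap.append(annot.get("/T"))
        else:
            missing_ap.append(annot.get("/T"))
    return have_ap, missing_ap


# Made-up widgets: the third mimics a field merged in from another PDF.
page_annots = [
    {"/T": "first_name", "/AP": {"/N": "stream-1"}},
    {"/T": "last_name", "/AP": {"/N": "stream-2"}},
    {"/T": "merged_field"},
]
have_ap, missing_ap = split_by_appearance(page_annots)
print("can update:", have_ap)
print("need a new appearance stream:", missing_ap)
```

This only reports which fields are affected; creating the missing stream objects is the complicated part and is not covered here.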

binury commented 1 year ago

Confirming: this is still an issue. Filled annotations do not display as expected in MacOS Preview/{Mobile,} Safari. They do render in Chrome & Acrobat

michael-hoang commented 1 year ago

Has anyone tried doing this for PyPDF2 v3.0.1?

pubpub-zz commented 1 year ago

@michael-hoang PyPDF2 is no longer supported. You have to upgrade to the latest pypdf version.

pubpub-zz commented 1 year ago

@binury If the annotation is displayed in Chrome and Acrobat but not in macOS Preview or on mobile, the issue is more likely in the latter programs. You may have to investigate it on your own; technically, at least, I have no means of helping you.

binury commented 1 year ago

FWIW there is a working implementation available currently (in pdftk, of course, as mentioned in previous comments) that does not exhibit the same inconsistent appearance between readers.

I think it's a stretch to claim Pypdf fill is working if the PDFs won't be displayed correctly when viewed on an iPhone… Sure Acrobat is technically the official PDF viewer and the most spec-compliant... But nobody is going to use this if it means writing off a huge majority of users viewing the documents on their mobile phones. Not to mention MacOS users.

In any case… no help needed. just wanted to leave a comment earlier to let you guys know that it's broken still.

In lieu of having access to iOS for testing… there is also a working implementation of creating default appearances in a Node lib called pdf-annot. As a reference, it may shed some light on why the default appearances in pypdf aren't working.

alenards commented 1 year ago

@binury - I'm having trouble finding a library named pdf-annot on npm.

Is it this: https://www.npmjs.com/package/ts-pdf-annot

pubpub-zz commented 1 year ago

That library seems to be JavaScript. I do not think it has any link with pypdf (Python).

pubpub-zz commented 1 year ago

@binury A PR to improve field rendering has been submitted, if you want to give it a try.

alenards commented 1 year ago

@pubpub-zz - I think @binury's point was that the referenced library's filled-in text fields show up as expected without these workarounds: fixing /Root/AcroForm, adjusting the annotations, or setting /NeedAppearances. If the handling in that library is correct, it might help remove the need for workarounds on the PdfWriter or these other approaches.

I'm a bit frustrated that the readthedocs for this library make it look like "filling out forms" works here without noting these workarounds. I was incredibly pumped up to see that the latest release of pypdf was on June 4. My use case is filling out forms and having them reliably render in PDF tools (macOS Preview being one of them).

My testing in ipython and macOS Preview reproduced the reported behavior (the values were there, but only when I clicked into the fields, and they disappeared again once focus was lost). I'll try calling PdfWriter.set_need_appearances_writer() directly. But I'm probably going to have to look for another library in another language.

pubpub-zz commented 1 year ago

Thanks for the comment: I may have read the message too quickly. I understand your position and agree with it. The NeedAppearances workaround is not the best. As said above, I've produced PR #1864, which generates the display. It is a first release, but if you can test it, that would be great.

alenards commented 1 year ago

@pubpub-zz - I definitely appreciate the effort for all the folx keeping pypdf maintained. All of the PyPDF2, PyPDF4, all that is dizzying; so relieved to see this library active.

I'll see if I can look at #1864 - and I'll comment there on that PR thread.

Thanks again.

thomasweiland93 commented 1 year ago

Hello all, I have taken a look at #1864 and tested it with a PDF from my company. Unfortunately, the appearance still doesn't look correct on iPhones etc.

The problem might be the following structure in my PDF:

Parent (writer_parent_annot): {'/DA': '/MyriadPro-Regular 9 Tf 0 0.290 0.439 rg', '/FT': '/Tx', '/Kids': [IndirectObject(88, 0, 2973937844032), IndirectObject(85, 0, 2973937844032)], '/T': '08-Mail2', '/V': '[Nirwana@test.de]'}

Child (writer_annot): {'/AP': {'/N': IndirectObject(89, 0, 2973937844032)}, '/F': 4, '/MK': {}, '/P': IndirectObject(49, 0, 2973937844032), '/Parent': IndirectObject(87, 0, 2973937844032), '/Rect': [246.47300000000001, 232.13200000000001, 513.09299999999996, 220.27699999999999], '/Subtype': '/Widget', '/Type': '/Annot'}

In this case the code only runs the else branch of update_page_form_field_values and sets AA.AS to /Off.

To get a correct view in the iPhone viewer I made some small changes in _writer.py (but just a messy fix for my current PDF):

I used the /DA, /FT and /V from the writer_parent_annot and the rest from the writer_annot.

Is this a known issue?
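For context, the parent/child layout in this comment is a common way for a PDF to separate a field from its widget annotation: entries such as /FT, /DA and /V can sit on the parent and are inherited by the kids. Here is a sketch of that lookup with plain dicts standing in for the pypdf objects. The helper name is hypothetical and the data is abbreviated from the dump in the comment.

```python
def inherited(annot, key, default=None):
    """Look up key on the annotation, then on each /Parent in turn."""
    node = annot
    seen = set()
    while node is not None and id(node) not in seen:
        seen.add(id(node))  # guards against a cyclic /Parent chain
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return default


parent = {
    "/DA": "/MyriadPro-Regular 9 Tf 0 0.290 0.439 rg",
    "/FT": "/Tx",
    "/T": "08-Mail2",
    "/V": "[Nirwana@test.de]",
}
child = {"/Subtype": "/Widget", "/Type": "/Annot", "/Parent": parent}

print(inherited(child, "/FT"))
print(inherited(child, "/V"))
```

In a real pypdf document /Parent is an indirect reference, so each step would also need a get_object() call before the lookup.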

Chrisd204 commented 9 months ago

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

This solution works!

RomHartmann commented 9 months ago

This thread is crazy long, with a lot of old versions and red herrings.

As of now, this works for me for pypdf==3.17.4

import pypdf
from pypdf import generic as pypdf_generic

# ... load file
reader = pypdf.PdfReader(file)
writer = pypdf.PdfWriter()

writer.set_need_appearances_writer()

for page_nr, page in enumerate(reader.pages):
    form_fields = page.get('/Annots')
    if form_fields:
        for field in form_fields.get_object():
            field_object = field.get_object()

            # any other logic
            field_object.update({
                pypdf_generic.NameObject('/V'): pypdf_generic.create_string_object(field_value)
            })
    writer.add_page(page)

# create your output file or stream
writer.write(output_file)

Conditions of my test:

single page PDF
Only text fields

caver456 commented 8 months ago

Thanks @RomHartmann that definitely got closer.

In the end, as someone else pointed out, flattening is only part of the answer, and relying on NeedAppearances didn't quite do the trick, so modifying the stream directly gave much better results. Here's a Stack Overflow question spelling out these specific symptoms (not sure if they are exactly the same symptoms everyone else has been experiencing):

Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension

and the solution (for our use case, at least) that basically references another solution at https://stackoverflow.com/a/73655665/3577105 - thanks to @JeremyM4n for sure.

WMiller256 commented 6 days ago

For future users of @Chrisd204's solution above: it may corrupt the PDF file. If that is the case for you, one possible solution is to move the lines

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

to after the page-adding is complete, i.e.

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

with open(outfile, "wb") as fp:
    writer.write(fp)

stefan6419846 commented 5 days ago

While this surely is an old issue, I recommend switching to the maintained pypdf instead, which might already solve this out of the box.