Closed segevmalool closed 1 year ago
I am having this same issue. The data does not show up in Adobe Reader unless you activate the field. The data does show up in Bluebeam but if you print, flatten, or push the pdf to a studio session all the data is lost.
When the file is opened in Bluebeam it automatically thinks that the user has made changes, denoted by the asterisk next to the file name in the tab.
If you export the fdf file from Bluebeam all the data is in the fdf file in the proper place.
If you change any attribute of the field in Bluebeam or Adobe, it will recognize the text in that field. It will print correctly and flatten correctly. I am not sure if it will push to the Bluebeam studio but I assume it will. You can also just copy and paste the text in the field back into that field and it will render correctly.
I have not found any help after googling around all day. I think it is an issue with PyPDF2 not "redrawing" the PDF correctly.
I have contacted Bluebeam support and they have returned saying essentially that it is not on their end.
Ok I think I have narrowed this down some by just comparing two different pdfs.
For reference, I am trying to read a PDF that was originally created by Bluebeam, use the updatePageFormFieldValues() function in PyPDF2 to push a bunch of data from a database into the form fields, and save the result. At some point we want to flatten these files, and that is when it all goes wrong in Bluebeam. In Adobe it is wrong from the start: you don't see any values in the form fields until you hover over them with the mouse.
It appears there is a problem with the stream object that follows the object(s) representing the text form field. See below.
This is a sample output from a pdf generated by PyPDF2 for a text form field:
26 0 obj<</Subtype/Widget/M(D:20160512102729-05'00')/NM(OEGVASQHFKGZPSZW)/MK<</IF<</A[0 0]>>>>/F 4/C[1 0 0]/Rect[227.157 346.3074 438.2147 380.0766]/V(Marshall CYG)/Type/Annot/FT/Tx/AP<</N 27 0 R>>/DA(0 0 0 rg /Helv 12 Tf)/T(Owner Group)/BS 29 0 R/Q 0/P 3 0 R>>
endobj
27 0 obj<</Type/XObject/Matrix[1 0 0 1 0 0]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 28 0 R>>>>/Length 41/FormType 1/BBox[0 0 211.0577 33.76923]/Subtype/Form>>
stream
0 0 211.0577 33.76923 re W n /Tx BMC EMC
endstream
endobj
28 0
And if I go back and edit the same base file in Bluebeam, the output from that PDF for a text form field looks like this (I think the border object can be ignored):
16 0 obj<</Type/Annot/P 5 0 R/F 4/C[1 0 0]/Subtype/Widget/Q 0/FT/Tx/T(Owner Group)/MK<</IF<</A[0 0]>>>>/DA(0 0 0 rg /Helv 12 Tf)/AP<</N 18 0 R>>/M(D:20170906125217-05'00')/Rect[227.157 346.3074 438.2147 380.0766]/NM(OEGVASQHFKGZPSZW)/BS 17 0 R/V(Marshall CYG)>>
endobj
17 0 obj<</W 1/S/S/Type/Border>>
endobj
18 0 obj<</Type/XObject/Subtype/Form/FormType 1/BBox[0 0 211.0577 33.7692]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 12 0 R>>>>/Matrix[1 0 0 1 0 0]/Length 106>>
stream
0 0 211.0577 33.7692 re W n /Tx BMC BT 0 0 0 rg /Helv 12 Tf 1 0 0 1 2 12.6486 Tm (Marshall CYG) Tj ET EMC
endstream
Ok so the biggest difference here is the stream object at the end. The value /V(Marshall CYG) gets updated in the first object of each pdf, objects 26 and 16 respectively. However the stream object in the PyPDF2 generated pdf does not get updated and the stream object from Bluebeam does get updated.
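The comparison above can be turned into a rough diagnostic. This is a minimal sketch, not part of PyPDF2: `appearance_is_stale` and `escape_pdf_literal` are hypothetical helper names, and the check assumes an uncompressed appearance stream that draws the value as a single literal string with `Tj`, as in the two dumps above.

```python
def escape_pdf_literal(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def appearance_is_stale(value, stream_text):
    """Return True when /V holds text that the appearance stream never draws."""
    if not value:
        return False  # nothing to draw, so an empty stream is fine
    return f"({escape_pdf_literal(value)}) Tj" not in stream_text


# Stream contents copied from the two dumps above.
pypdf2_stream = "0 0 211.0577 33.76923 re W n /Tx BMC EMC"
bluebeam_stream = (
    "0 0 211.0577 33.7692 re W n /Tx BMC BT 0 0 0 rg /Helv 12 Tf "
    "1 0 0 1 2 12.6486 Tm (Marshall CYG) Tj ET EMC"
)

print(appearance_is_stale("Marshall CYG", pypdf2_stream))    # True
print(appearance_is_stale("Marshall CYG", bluebeam_stream))  # False
```

Real-world streams may be compressed or use hex strings, so treat this only as a quick way to confirm the symptom on simple files.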
To test this theory I made a copy of the PyPDF2 PDF and manually edited the stream object in a text editor. I opened this new file in Bluebeam and flattened it. It worked. This also appears to work in Adobe Reader.
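For anyone repeating the manual edit, the replacement stream content can be generated rather than typed. The sketch below only rebuilds the string seen in the Bluebeam dump. `build_text_appearance` is a hypothetical helper, and the text offsets (2, 12.6486) are the values Bluebeam happened to write for this field, not something computed from font metrics.

```python
def _num(x):
    """Format a number the way the dumps above do (no trailing zeros)."""
    return f"{x:.4f}".rstrip("0").rstrip(".")


def _escape(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_text_appearance(value, width, height,
                          da="0 0 0 rg /Helv 12 Tf", x=2.0, y=12.6486):
    """Build the content of a single-line text field appearance stream."""
    return (
        f"0 0 {_num(width)} {_num(height)} re W n /Tx BMC BT {da} "
        f"1 0 0 1 {_num(x)} {_num(y)} Tm ({_escape(value)}) Tj ET EMC"
    )


stream = build_text_appearance("Marshall CYG", 211.0577, 33.7692)
print(stream)
```

Remember that the /Length entry of the stream dictionary has to match the new content when you paste it in by hand.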
Now how to fix....
A potential solution seems to be setting the Need Appearances flag. Not yet sure how to implement in pypdf2 but these 2 links may provide some clues: https://stackoverflow.com/questions/12198742/pdf-form-text-hidden-unless-clicked https://forums.adobe.com/thread/305250
Okay, I think I have figured it out. If you read section 12.7.2 (page 431) of the PDF 1.7 specification, you will see that you need to set the NeedAppearances flag of the Acroform.
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
writer = PdfFileWriter()
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
ademidun - Can you elaborate on your suggested solution above? I too am having problems with pdf forms, edited with PyPDF2, not showing field values without clicking in the field. With the code example below, how do you "set the NeedAppearances flag of the Acroform"?
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("myInputPdf.pdf", "rb"))
field_dictionary = {'Make': 'Toyota', 'Model': 'Tacoma'}
for pageNum in range(input.numPages):
    pageObj = input.getPage(pageNum)
    output.addPage(pageObj)
    output.updatePageFormFieldValues(pageObj, field_dictionary)
outputStream = open("myOutputPdf.pdf", "wb")
output.write(outputStream)
I tried adding in your IF statements but two problems arise: 1) NameObject and BooleanObject are not defined within my PdfFileReader "input" variable (I do not know how to do that) and 2) "/AcroForm" is not found within the PdfFileWriter object (my "output" variable).
Thanks for any help!
@Tromar44 Preamble, make sure your form is interactive. E.g. The pdf must already have editable fields.
1) Sorry forgot to mention you will have to import them:
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
2) Are you sure you are using output._root_object["/AcroForm"]
or output.trailer["/Root"]["/AcroForm"]
to access the "/AcroForm" key, and not just output["/AcroForm"]?
@ademidun I thank you very much for your help but unfortunately I'm still not having any luck. To be clear, my simple test pdf form does have two editable fields and the script will populate them with "Toyota" and "Tacoma" respectively but those values are not visible unless I click on the field in the form (they become invisible again after the field loses focus). Here is the rewritten code that includes your suggestions and the results of running the code in inline comments.
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
infile = "myInputPdf.pdf"
outfile = "myOutputPdf.pdf"
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:  # result: following "IF" code is executed
    print(True)
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
writer = PdfFileWriter()
if "/AcroForm" in writer._root_object:  # result: False - following "IF" code is NOT executed
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
if "/AcroForm" in writer._root_object["/AcroForm"]:  # result: KeyError: '/AcroForm'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
if "/AcroForm" in writer.trailer["/Root"]["/AcroForm"]:  # result: AttributeError: 'PdfFileWriter' object has no attribute 'trailer'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
outputStream = open(outfile, "wb")
writer.write(outputStream)
I would definitely appreciate any more suggestions that you may have! Thank you very much!
It may also be a browser issue. I don't have the links anymore but I remember reading about some issues when opening/creating a PDF on Preview on Mac or viewing it in the browser vs. using an Adobe app etc. Maybe if you google things like "form fields only showing on click" or "form fields only active on click using preview mac".
I also recommend reading the PDF spec link I posted. It's a bit dense, but a combination of all these should point you in the right direction.
@Tromar44 Okay, I also found this snippet from my code, maybe it will help:
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
infile = "input.pdf"
outfile = "output.pdf"
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
with open(outfile, "wb") as fp:
    writer.write(fp)
@ademidun you great!!!
Just stumbled upon this solution - great work! A couple of issues I noticed - can you reproduce them? - won't have time to send test case details for a couple of days yet if you need them; we had been using the good-ol fdfgen-then-pdftk-subprocess-call method but would like to get away from the external pdftk dependency so pypdf2 is great:
output.pdf: the fix does not work for all fields in this file. For example, the first field (the phone number) stays hidden, while the second one and a few other fields work for some reason, so the fix is not working reliably.
Hi, I am facing the same issue. I have also tried setting NeedAppearances to true. When I edit a PDF using PyPDF2, some fields display correctly and some display only after I click on the field. Please help me with this issue, as it is blocking my work. Thank you.
The code works great, but only for PDFs with one page. I tried splitting my PDF into several one-page files and looping through them. This worked, but when I merged them back together the click-to-reveal-text problem re-emerged. The problem seems to lie in the .addPage call on the PdfFileWriter.
for page_number in range(pdf.total_pages):
    pdf2.addPage(pdf.getPage(page_number))
    pdf2.updatePageFormFieldValues(pdf2.getPage(page_number), field_dictionary)
When I run this and try to save, I get an error message: "TypeError: argument should be integer or None, not 'NullObject'". It seems that .addPage does not append to the writer but treats each page as a separate object. Does someone have a solution for this?
Problem solved: the cause was that I was working with a protected PDF. I manually split the PDF and manually recombined it, and now it works great. The solution is often right in front of your nose.
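A guard like the following might surface that situation earlier. It is only a sketch, assuming that "protected" here means the file is encrypted: `require_unprotected` is a hypothetical helper, and `isEncrypted` is the property name exposed by `PdfFileReader` in PyPDF2 1.x.

```python
def require_unprotected(reader):
    """Raise early if the reader reports an encrypted (protected) document."""
    # getattr keeps the sketch usable with any object exposing `isEncrypted`.
    if getattr(reader, "isEncrypted", False):
        raise ValueError(
            "PDF is protected; decrypt or re-create it before filling fields"
        )
    return reader
```

Calling `require_unprotected(reader)` right after constructing the reader turns the confusing `TypeError` at save time into an explicit message.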
Hi All,
Thanks for your help.
I was able to make the text fields of the PDF form visible using PyPDF2, but I still could not figure out how to make the checkboxes of the form visible (NeedAppearances).
I tried this logic:
catalog = writer._root_object
if '/AcroForm' in catalog:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
Thanks in advance.
I found answer for checkboxes issue at https://stackoverflow.com/questions/35538851/how-to-check-uncheck-checkboxes-in-a-pdf-with-python-preferably-pypdf2.
def updateCheckboxValues(page, fields):
    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                writer_annot.update({
                    NameObject("/V"): NameObject(fields[field]),
                    NameObject("/AS"): NameObject(fields[field])
                })
And as the comment there says, the checked value could be anything, depending on how the form was created. For me it was present in '/AP', and I extracted it using list(writer_annot.get('/AP').get('/N').keys())[0].
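Taking the first key works when the "on" state happens to be listed first, but the /N dictionary of a checkbox normally also contains /Off. A slightly more defensive lookup could look like this. It is a sketch with a hypothetical helper name (`checkbox_on_state`), meant for checkbox widgets only, and it accepts plain dicts as well as PyPDF2 1.x objects (whose `getObject()` method it uses to follow indirect references).

```python
def _resolve(obj):
    """Follow an indirect reference if the object offers getObject()."""
    getter = getattr(obj, "getObject", None)
    return getter() if callable(getter) else obj


def checkbox_on_state(annotation):
    """Return the name of the checked appearance state, or None if absent."""
    appearance = _resolve(annotation.get("/AP")) or {}
    normal = _resolve(appearance.get("/N")) or {}
    for state in normal:
        if state != "/Off":
            return state
    return None


print(checkbox_on_state({"/AP": {"/N": {"/Off": None, "/Yes": None}}}))  # /Yes
```

The returned name is what you would pass as the value for /V and /AS in updateCheckboxValues.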
ok, I have implemented the above and it works on my pdf forms however once the form has been updated by the python it can't be run through the code a second time, as getFormFields returns an empty list. If I open the updated pdf in Adobe and add a space to the end of a form field value and save, run the code on the form again, getFormFields returns the correct list.
I am having the same problem: fields not visible fixed by above-mentioned set_need_appearances_writer() approach but getFormFields/pdftk dump_data_fields does not see them.
In addition, it looks like my fonts somehow get messed up: one of the fields is actually a barcode font. But, after going through PyPDF2 to make a copy with updated fields, the field that uses the barcode font in the original copy now uses one of the other fonts.
I'm experiencing the same click-to-reveal-text issue. Here are a few interesting things I have noticed.
it can't be run through the code a second time, as getFormFields returns an empty list.
For reference, I just stumbled on the same issue. The problem is that the generated pdf does not have an /AcroForm, and the easiest solution is probably to copy it over from the source file like this:
trailer = reader.trailer["/Root"]["/AcroForm"]
writer._root_object.update({
    NameObject('/AcroForm'): trailer
})
@mjl can you elaborate how to implement those lines?
anyone figure out a solution to set /NeedAppearance for a pdf with multiple pages?
To include multiple pages in the output PDF, I added the remaining pages from the template to the output file:
if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
pdf2.addPage(pdf.getPage(0))
pdf2.updatePageFormFieldValues(pdf2.getPage(0), student_data)
# added: copy the remaining template pages
pdf2.addPage(pdf.getPage(1))
pdf2.addPage(pdf.getPage(2))
outputStream = open(cs_output, "wb")
pdf2.write(outputStream)
outputStream.close()
To include multiple pages in the output PDF, I added the remaining pages from the template to the output file....
I tried the same thing but Need Appearances seems to apply only to the first page. All the fields on the second page are hidden until focused.
Does anyone have a working fix for this issue for multi-page PDFs?
@mjl can you elaborate how to implement those lines?
You will have a pdf-reader reading in the origin file and a pdf-writer, creating the new pdf (see code of @Tromar44 above). Now you simply need to "copy" over the AcroForm with the lines @mjl presented.
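As a sketch of that copy step, the helper below copies selected entries from the reader's document catalog to the writer's. `copy_catalog_entries` is a hypothetical name, and the function is written against plain mapping behaviour so it can be tried with ordinary dicts. With PyPDF2 1.x you would presumably call it as `copy_catalog_entries(reader.trailer["/Root"], writer._root_object)`.

```python
def copy_catalog_entries(source_root, target_root, keys=("/AcroForm",)):
    """Copy selected document-catalog entries and return the keys copied."""
    copied = []
    # Iterating the source keeps its own key objects (NameObject in PyPDF2).
    for key in source_root:
        if key in keys:
            target_root[key] = source_root[key]
            copied.append(key)
    return copied


source = {"/Type": "/Catalog", "/AcroForm": {"/Fields": []}, "/Lang": "en"}
target = {"/Type": "/Catalog"}
print(copy_catalog_entries(source, target))  # ['/AcroForm']
```

If the source file has no /AcroForm at all, the returned list is empty, which is a cheap way to notice that the template is not an interactive form.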
From all those explanations I arrived (as brunnurs stated) at this code. It works for me: it fills text entries and checkboxes in a multi-page PDF, and all changes can be seen using any simple PDF reader.
`
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, TextStringObject
def set_need_appearances_writer(writer):
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
class PdfFileFiller(object):
    def __init__(self, infile):
        self.pdf = PdfFileReader(open(infile, "rb"), strict=False)
        if "/AcroForm" in self.pdf.trailer["/Root"]:
            self.pdf.trailer["/Root"]["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
    def update_form_values(self, outfile, newvals=None, newchecks=None):
        self.pdf2 = MyPdfFileWriter()
        # copy the AcroForm from the source document into the writer
        trailer = self.pdf.trailer["/Root"]["/AcroForm"]
        self.pdf2._root_object.update({
            NameObject('/AcroForm'): trailer})
        set_need_appearances_writer(self.pdf2)
        if "/AcroForm" in self.pdf2._root_object:
            self.pdf2._root_object["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
        for i in range(self.pdf.getNumPages()):
            self.pdf2.addPage(self.pdf.getPage(i))
            self.pdf2.updatePageFormFieldValues(self.pdf2.getPage(i), newvals)
            self.pdf2.updatePageFormCheckboxValues(self.pdf2.getPage(i), newchecks)
        with open(outfile, 'wb') as out:
            self.pdf2.write(out)
class MyPdfFileWriter(PdfFileWriter):
    def __init__(self):
        super().__init__()
    def updatePageFormCheckboxValues(self, page, fields):
        for j in range(0, len(page['/Annots'])):
            writer_annot = page['/Annots'][j].getObject()
            for field in fields:
                if writer_annot.get('/T') == field:
                    #print('-------------------------------------')
                    #print(' FOUND', field)
                    #print(writer_annot.get('/V'))
                    writer_annot.update({
                        NameObject("/V"): NameObject(fields[field]),
                        NameObject("/AS"): NameObject(fields[field])
                    })
if __name__ == '__main__':
    origin = '900in.pdf'
    destination = '900out.pdf'
    newvals = {"IDETNCON[0]": "A123456T",
               "NOMSOL[0]": "ARTICA S.L."}
    newchecks = {"periodeliq1[0]": "/1"}
    c = PdfFileFiller(origin)
    c.update_form_values(outfile=destination,
                         newvals=newvals,
                         newchecks=newchecks)
`
The last code fails for checkboxes in some PDF readers. I modified my MyPdfFileWriter class:
`
def updatePageFormCheckboxValues(self, page, fields):
    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                if fields[field] in ('/1', '/Yes'):  # You choose which values to use in your code
                    writer_annot.update({
                        NameObject("/V"): NameObject(fields[field]),
                        NameObject("/AS"): NameObject(fields[field])
                    })
`
I am still having issues showing filled boxes outside of Adobe Acrobat.
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
infile = "input.pdf"
outfile = "output.pdf"
pdf = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in pdf.trailer["/Root"]:
    pdf.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
pdf2 = PdfFileWriter()
set_need_appearances_writer(pdf2)
if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
field_dictionary = {"iban1_part1": "DE", "Model": "Tacoma"}
pdf2.addPage(pdf.getPage(0))
pdf2.updatePageFormFieldValues(pdf2.getPage(0), field_dictionary)
outputStream = open(outfile, "wb")
pdf2.write(outputStream)
Some boxes are showing properly, some are not - when outside of Acrobat and I need to click on them to show the content.
I also did the same using pdfrw but I got stuck exactly at the same problem.
Hi, giorgio-pap. I'm using the code in a project that I'm developing in order to fill tax forms in Andorra. Because of your comment I have been testing the code and these are my results:
* A lot of problems with Adobe Acrobat 9.0 (last available version for Manjaro Linux)
* Good results with MasterPDF (https://code-industry.net/masterpdfeditor/)
* Good results with qpdfview (https://github.com/bendikro/qpdfview)
As I'm not a Windows user, I don't use Adobe PDF tools. MasterPDF and qpdfview are my best alternatives working with Linux. Can you test your code with these alternatives?
Hi again, giorgio-pap. Have you checked issue #545?
Thanks a lot for your reply! Unfortunately, this script is meant to work for a whole company, so the output needs to render consistently in all of the most common PDF readers, since I cannot require anyone to install anything.
purrs like a kitten :-)
I tried this code, but nothing appears in the default Linux PDF viewer. All fields are visible in Adobe and when the file is opened in Gmail on most platforms, but on iPhones only some fields are shown. I poked around a bit and I think it might have something to do with the PDF format, but I could not solve it with this Python tool. I found another tool that did the job and produced output viewable on all platforms I tested: https://www.blog.pythonlibrary.org/2018/05/22/filling-pdf-forms-with-python/ The code I used is at the very bottom, titled "using the pdfforms package". The downside is that so far the code hasn't run successfully on anything but Linux, and it doesn't tick checkboxes.
Might be a separate issue, but I am having a similar problem with PdfFileMerger(). After merging two PDFs together, one having filled forms, the filled form values do not carry over to the final merged version. However, the values do appear when clicking into one of the forms, weirdly enough. I was wondering if I could apply the above logic, but for PdfFileMerger() instead of PdfFileWriter(), but I'm not sure how to implement that. The append section of my code, simplified:
temp_pdf = r"path.pdf"
appendpdf = r"path.pdf"
merger = PdfFileMerger()
merger.append(PdfFileReader(temp_pdf))
merger.append(PdfFileReader(appendpdf))
merger.write(temp_pdf)
merger.close()
The temp_pdf is the one with forms; the appendpdf is typically an image. I'm writing the final merged PDF back to temp_pdf to overwrite it. That might be a problem, I'm not sure.
Hello everyone! I tried @hchillon's code and it works fine for me. Thank you, @hchillon, for sharing it!
I would like to note that the code does not do the job when the newvals dict has empty values. For example, newvals = {'something':'', 'smth2':'smth'} would again make the values appear only when the field is clicked. I am posting this for everyone who is having a hard time figuring out why it doesn't work.
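One possible workaround, going only by the observation above, is to drop empty entries before filling, so that fields without a value are simply left as they are in the template. `drop_empty_values` is a hypothetical helper, and whether skipping a field is acceptable depends on your form.

```python
def drop_empty_values(field_values):
    """Return a copy of the mapping without empty-string or None values."""
    return {
        name: value
        for name, value in field_values.items()
        if value not in ("", None)
    }


newvals = {'something': '', 'smth2': 'smth'}
print(drop_empty_values(newvals))  # {'smth2': 'smth'}
```

The filtered dict can then be passed to updatePageFormFieldValues in place of the original one.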
In case this helps someone: I had the same issue, and the solution above didn't help for PDF Reader Pro or for the standard Preview function on macOS. After comparing several PDF files, the following helped me:
ap = NameObject('/AP')
for pageNumber in range(writer.getNumPages()):
    if '/Annots' not in writer.getPage(pageNumber):
        continue
    annotationsCount = len(writer.getPage(pageNumber)['/Annots'])
    for annotationNumber in range(annotationsCount):
        annotation = writer.getPage(pageNumber)['/Annots'][annotationNumber].getObject()
        if annotation['/FT'] == '/Tx' and\
                '/AP' in annotation and '/N' in annotation['/AP']:
            annotation[ap] = annotation['/AP']['/N']
I think the issue is related to the writer not being initialized properly. I resolved the issue copying some data from the reader, see:
#!/usr/bin/env python3
from PyPDF4.generic import NameObject
from PyPDF4.generic import TextStringObject
from PyPDF4.pdf import PdfFileReader
from PyPDF4.pdf import PdfFileWriter
import random
import sys
reader = PdfFileReader(sys.argv[1])
writer = PdfFileWriter()
# Try to "clone" the original one (note the library has cloneDocumentFromReader)
# but the rendered pdf is blank
writer.appendPagesFromReader(reader)
writer._info = reader.trailer["/Info"]
reader_trailer = reader.trailer["/Root"]
writer._root_object.update(
    {
        key: reader_trailer[key]
        for key in reader_trailer
        if key in ("/AcroForm", "/Lang", "/MarkInfo")
    }
)
page = writer.getPage(0)
params = {"Foo": "Bar"}
# Inspired by updatePageFormFieldValues but also handle checkboxes
for annot in page["/Annots"]:
    writer_annot = annot.getObject()
    field = writer_annot["/T"]
    if writer_annot["/FT"] == "/Btn":
        value = params.get(field, random.getrandbits(1))
        if value:
            writer_annot.update(
                {
                    NameObject("/AS"): NameObject("/On"),
                    NameObject("/V"): NameObject("/On"),
                }
            )
    elif writer_annot["/FT"] == "/Tx":
        value = params.get(field, field)
        writer_annot.update(
            {
                NameObject("/V"): TextStringObject(value),
            }
        )
with open(sys.argv[2], "wb") as f:
    writer.write(f)
After reading through this thread and trying many of the suggested solutions above, I still was getting strange behavior when previewing the PDF in an application that was not dedicated to viewing / editing PDFs (ex. mobile email client). The PDF would display without showing any of the filled form fields. After piecing together a few solutions mentioned above, I realized that the order is critical in getting the correct behavior. Here is the solution I am using today:
`
def _set_need_appearances_writer(writer: PdfFileWriter):
    """
    Enables PDF filled form values to be visible on the final PDF results
    NOTE: See 12.7.2 and 7.7.2 for more information:
    http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    """
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

def _lock_form_fields(cls, page):
    """
    Locks all form fields on the given PyPdf2 Page object
    """
    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        if writer_annot.get('/T'):
            # /Ff value 1 sets the ReadOnly field flag
            writer_annot.update({
                NameObject("/Ff"): NumberObject(1)
            })

def _init_pdf_writer_from_reader(cls, reader: PdfFileReader) -> PdfFileWriter:
    """
    Initializes a PdfFileWriter that can be used to write data to the given PDF
    stored inside of the PdfFileReader.
    IMPORTANT: Using this init function ensures that the data written is visible
    both in a PDF Viewer Application and in a Preview context (i.e. an email client)
    """
    if not reader or reader.getNumPages() == 0:
        raise Exception("Error initializing PdfFileWriter, given PdfFileReader "
                        "is either null or contains no pages.")
    pdf_writer = PdfFileWriter()
    # Add all PDF pages from reader -> writer
    pdf_writer.appendPagesFromReader(reader)
    # Copy over additional data from reader -> writer
    pdf_writer._info = reader.trailer["/Info"]
    reader_trailer = reader.trailer["/Root"]
    pdf_writer._root_object.update(
        {
            key: reader_trailer[key]
            for key in reader_trailer
            if key in ("/AcroForm", "/Lang", "/MarkInfo")
        }
    )
    # Set written data appearances to be visible
    cls._set_need_appearances_writer(pdf_writer)
    if "/AcroForm" in pdf_writer._root_object:
        pdf_writer._root_object["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})
    return pdf_writer
`
By initializing the PDF writer correctly, we ensure that the data written to the PDF's form fields is visible without having to click the field in a PDF viewer application. We also guarantee it will be visible in applications that are not dedicated PDF viewers, which is important if you cannot be sure which app your client / end-user will use to view the PDF. Lastly, I included a method to lock the fields on a given PDF page so that they are no longer editable by your end-user (if this is the desired behavior).
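One thing to watch with the locking method: writing /Ff as a plain 1 replaces whatever flags the field already had (for example the multiline flag on a text field). A small sketch of OR-ing the read-only bit in instead. The helper names are hypothetical and work on plain ints; the bit positions are the ones from the field-flag tables in the PDF spec, and this assumes /Ff is set directly on the annotation rather than inherited from a parent field:

```python
READ_ONLY = 1 << 0   # bit position 1: ReadOnly, applies to all field types
MULTILINE = 1 << 12  # bit position 13: Multiline, text fields only


def with_read_only(flags):
    """Return `flags` with the ReadOnly bit set and all other bits kept."""
    return flags | READ_ONLY


def is_read_only(flags):
    """True if the ReadOnly bit is set in an /Ff value."""
    return bool(flags & READ_ONLY)


existing = MULTILINE                 # stand-in for a multiline text field's /Ff
locked = with_read_only(existing)
print(locked, is_read_only(locked))  # 4097 True
print(bool(locked & MULTILINE))      # True: the multiline flag survived
```

Inside _lock_form_fields this would amount to reading the current value with writer_annot.get('/Ff', 0) and writing back NumberObject(with_read_only(int(...))) instead of NumberObject(1).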
Thanks to @ale-rt and many others above.
@Dpats13 is your code part of a broader object definition? I'm thrown by the cls args.
@apteryxlabs ya, you can ignore those.
@Dpats13 I'd like to implement your solution, but my Python/programming skills are not great. Do you mind posting some working code assuming variables similar to the ones below?
infile = 'myInputPdf.pdf'
outfile = 'myOutputPdf.pdf'
field_dictionary = {'foo': 'bar'}
One of the old solutions above, offered by @ademidun via @Tromar44, still works well for me for filling PDF forms. However, reading the filled forms back programmatically (e.g. with PyPDF2 or pdfminer) returns empty fields: I can manually open the PDF and see the content of those fields without clicking them, but reading them via Python returns nothing. If I manually open the PDF and save it before closing it, I am then able to read the fields programmatically.
Any demo/example of your solution would be greatly appreciated - thanks!
If anyone is having issues writing to RadioGroup fields, here is my code that successfully updates TextFields, ListBoxes, RadioGroups, and Checkboxes.
def fill_pdf_form(infile, outfile, field_dictionary):
    inputStream = open(infile, "rb")
    pr = PdfFileReader(inputStream, strict=False)
    if "/AcroForm" in pr.trailer["/Root"]:
        pr.trailer["/Root"]["/AcroForm"].update({NameObject("/NeedAppearances"): BooleanObject(True)})

    pw = PdfFileWriter()
    # set_need_appearances_writer() is the helper posted earlier in this thread
    set_need_appearances_writer(pw)
    if "/AcroForm" in pw._root_object:
        pw._root_object["/AcroForm"].update({NameObject("/NeedAppearances"): BooleanObject(True)})

    for pageNum in range(pr.numPages):
        pw.addPage(pr.getPage(pageNum))
        pw.updatePageFormFieldValues(pw.getPage(pageNum), field_dictionary)

    if "/AcroForm" in pr.trailer["/Root"]:
        pw._root_object.update({NameObject('/AcroForm'): pr.trailer["/Root"]["/AcroForm"]})

    ## this next part manually updates RadioGroup items, which aren't updated by PyPDF2's updatePageFormFieldValues()
    for pageNum in range(pw.getNumPages()):
        page = pw.getPage(pageNum)
        annots = page['/Annots']
        for j in range(0, len(annots)):
            writer_annot = page['/Annots'][j].getObject()
            # Radio button widgets have no /T of their own; the name is on the parent
            if writer_annot.get('/T') is None:
                parent_ido = writer_annot.get('/Parent')
                if parent_ido:
                    parent_obj = parent_ido.getObject()
                    radiogroup_name = parent_obj.get('/T')
                    if radiogroup_name:
                        for field in field_dictionary:
                            if field == radiogroup_name:
                                parent_obj.update({NameObject("/V"): NameObject('/{}'.format(field_dictionary[field])), })

    outputStream = open(outfile, "wb")
    pw.write(outputStream)
    inputStream.close()
    outputStream.close()
@lymanjohnson Does this work for PDFs with multiple pages? I had something similar that was still failing on multiple pages.
I still see this issue in the Atril document viewer for test_fill_form.
Apparently some people had luck with:
# Set /NeedAppearances
writer.set_need_appearances_writer()
# Make it read-only with /Ff:
writer.updatePageFormFieldValues(writer.getPage(0), {"foo": "some filled in text"}, flags=1)
However, at least with Evince this doesn't work. And the Google Chrome PDF viewer always shows the filled fields.
I see quite a few comments that /NeedAppearances makes it work, but I'm sorry, that is not a general solution. This hints to the reader app that it needs to do some work to render correct form fields, but there are a lot of reader apps out there that do not honor that flag or do it badly.
What one really needs to do is go over all the form fields and render them: for a text input field, add an appearance stream (/AP) that renders the entered value; for a checkbox, set the appearance state (/AS) that shows the field checked. Other field types probably need even more work; I have not investigated this because I have not needed them so far.
What I ended up doing was inspecting Acrobat-generated forms and emulating them. I think I used qpdf --show-object to dissect the PDFs.
This comment helped me a lot to get started: https://github.com/pmaupin/pdfrw/issues/84#issuecomment-445303928
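To make the text-field case concrete, here is a rough sketch of the content such an appearance stream might carry. It only builds the content string; the result would still have to be wrapped in a Form XObject (with a /BBox and a /Resources font entry for the font named in /DA) and referenced from the widget's /AP /N. The function names are hypothetical, the vertical placement is a crude guess rather than what Acrobat computes, and non-Latin text would need encoding handling that this does not attempt:

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_appearance_content(value, rect, da="0 0 0 rg /Helv 12 Tf",
                            font_size=12.0, padding=2.0):
    """Build content-stream text for a single-line text field appearance.

    rect is the widget's /Rect as [llx, lly, urx, ury]; da is its /DA string.
    """
    height = rect[3] - rect[1]
    # Crude vertical centring; a real viewer would use font metrics.
    baseline = max((height - font_size) / 2.0, 0.0)
    return "\n".join([
        "/Tx BMC",  # marked-content wrapper used for variable text
        "q",
        "BT",
        da,         # colour and font selection copied from /DA
        f"{padding:.2f} {baseline:.2f} Td",
        f"({escape_pdf_string(value)}) Tj",
        "ET",
        "Q",
        "EMC",
    ])


# /Rect and /DA taken from the sample widget quoted earlier in this thread.
print(text_appearance_content("Marshall (CYG)",
                              [227.157, 346.3074, 438.2147, 380.0766]))
```

Comparing the printed text against what qpdf --show-object reports for an Acrobat-filled field is a reasonable way to check how far off the sketch is for a given form.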
Summarizing some ideas:
Fields dictionary
/NeedAppearances seems not to help as it was added via writer.set_need_appearances_writer()
Yesterday I took a deep dive into the PDF standard. I am 99% confident that this issue originates, like @mjl said, in a missing appearance stream. Today I was able to append an appearance stream to the form field. The content of the filled field is now visible in Acrobat, SumatraPDF, Okular, Chrome, Firefox and Edge, and it also prints. Now I am experiencing issues with the layout of the text and with special characters. I hope that I can solve them soon so that I can submit a PR.
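On the layout side, one piece that is easy to get wrong is horizontal alignment, which the field's /Q entry controls (0 = left, 1 = centered, 2 = right). A rough sketch of picking the x offset for the text inside the appearance stream follows. text_x_offset is a hypothetical helper, and the width estimate (half an em per character) is a loud assumption standing in for real font metrics, so treat the numbers as approximate:

```python
def text_x_offset(text, field_width, font_size=12.0, quadding=0,
                  padding=2.0, avg_char_em=0.5):
    """Estimate the x offset of the text start for a given /Q value.

    avg_char_em is an assumed average glyph width in ems, not a real
    font metric; swap in measured widths for accurate placement.
    """
    est_width = len(text) * font_size * avg_char_em
    if quadding == 1:   # centered
        return max((field_width - est_width) / 2.0, padding)
    if quadding == 2:   # right-justified
        return max(field_width - est_width - padding, padding)
    return padding      # left-justified (default)


# A 100-unit-wide field holding four characters at 10 pt.
for q in (0, 1, 2):
    print(q, text_x_offset("abcd", 100.0, font_size=10.0, quadding=q))
```

The max(..., padding) guard keeps overlong text from starting left of the field's inner edge; clipping the overflow is left to the appearance stream's bounding box.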
I'd like to use PyPDF2 to fill out a pdf form. So far, everything is going smoothly, including updating the field text. But when I write the pdf to a file, there is apparently no change in the form. Running this code:
prints text:
But when I open up test.pdf, there is no added text on the page! Help!