Open JanChec opened 7 years ago
Yeah, I'd like to have code for that, too. I haven't really looked at how that works yet.
Hi @pmaupin, if you can give me an introduction to how this could be implemented, I can try to implement it. Thanks
I don't know if there's been any movement on this, but this would be fantastic. I'll see if I can find any relevant information in the spec. FYI this is the one I'm looking at: https://wwwimages2.adobe.com/content/dam/acom/en/devnet/pdf/PDF32000_2008.pdf
As a workaround, I was able to make the fields show up by setting the appearance dictionary (AP) to an empty string:
import pdfrw
form = pdfrw.PdfReader(fname)
annotations = form.pages[0]['/Annots']
for annotation in annotations:
    # ... validate / update fields here
    annotation.update(pdfrw.PdfDict(AP=''))
The fields are then visible in Preview (Mac OS 10.13.4), but not Acrobat Reader DC. I suspect that Preview detects the invalid appearance dictionary and sets it to a default value.
I have the same problem. Same experience with Preview and the appearance dictionary.
Did anyone find a solution for this?
+1
I still have this problem. I'm trying to populate a few annotate fields. Some readers display the new annotate values correctly, however Adobe Reader leaves them blank.
I have two documents that are copies of each other. One is a blank form (A). In the other (B), I filled in the first field with a number and saved it in Acrobat Reader. When I open B again, the number shows in the field. If I open both documents in the Python interpreter, I can see that B.Root.Pages.Kids[0].Annots[0].V has the value. If I copy only the value of the first annotation from B to A and write it out with PdfWriter, it is only visible when the field has focus. If I copy the whole annotation from B to A and write it out with PdfWriter, the value is visible, as we all want. I have compared the two versions of the annotation, and the only difference I have found is that Annot.AP.N.BBox is a bit different, but copying this over to A doesn't help. The only thing I haven't carefully compared is Annot.P, because it seems to be just circular references to the page information. The bottom line is that I don't think pdfrw is the problem. There is something else in the PDF which needs to be programmatically updated to make this work.
If I then open B (with a value added and saved in Acrobat Reader) in the interpreter, change the value of the field, and output the PDF, Acrobat Reader still shows the original value when I open the file, but when I click on the field the NEW value is shown. I can't find the original value in the Python interpreter, but it seems changing the .V attribute is not the correct approach. Something I don't understand: when I access the value saved in Acrobat in the interpreter, it prints with round brackets.
>>> field = doc.Root.AcroForm.Fields[0]
>>> field.V
'(777)'
>>> field.update(pdfrw.PdfDict(V=pdfrw.PdfString('444')))
>>> field.V
'444'
When I change the value, making sure to use the pdfrw.PdfString object, there are no round brackets. If I try to add the round brackets when creating the value, they are escaped and included in the field.
Does someone who knows more about pdfrw than me know what these brackets mean?
Characters enclosed in parentheses denote a literal string (a type of PDF object).
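To make the parentheses concrete: in PDF syntax a literal string is written between ( and ), and a backslash or parenthesis inside it can be escaped with a backslash. The sketch below only illustrates that rule in plain Python. `to_pdf_literal` is a hypothetical helper name chosen for this example and is not part of pdfrw.

```python
def to_pdf_literal(text):
    """Wrap text in parentheses as a PDF literal string.

    Backslashes and parentheses are escaped so they are read as data
    rather than as string delimiters.
    """
    escaped = (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    return "(" + escaped + ")"


print(to_pdf_literal("444"))    # (444)
print(to_pdf_literal("a(b)c"))  # (a\(b\)c)
```

This is why '(777)' read back from the file carries brackets, while a bare '444' does not: the brackets are part of the serialized form, not of the value.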
Thanks Peter. If I do pdfrw.PdfString.encode() then I get the brackets. Unfortunately this still doesn't make the value visible. My best guess at the moment is that Acrobat Reader is moving / copying the value into the PDF text on defocus. This may be why I can't find the value, as pdfrw doesn't really give access to the PDF text. I'm going to try to dump the text from document B with another library and see if I can find a way forward. Unless someone knows how to decode the string of bytes (not byte string) that comes out of the content.stream?
@Eddiedigits I'm having the exact same issue. PDFs created and filled with pdfrw cannot be opened correctly in Adobe Reader, while other PDF readers display them fine. The fields only appear when focus is put on them. See my other issue #158. Even if I just read a PDF file and write it directly to a new file, without editing anything, all the annotate keys are added recursively. So I believe there is something wrong with the writing process of pdfrw.
@Eddiedigits As am I. Opening the written file in Acrobat, I can only see the written fields - they are there - when focus is placed on them with mouse. Also, this only works for 2 of the 3 fields written. The 3rd, an email address, is apparently not written at all. Weird! Reader/Form Editor is Acrobat Pro 11.0.3 on macOS.
You need to modify /V and also the appearance stream (the indirect object referenced by /AP). /V contains the value of the field and /AP specifies how to present it.
PDF reference 1.7 page 692
The field’s text is held in a text string (or, beginning with PDF 1.5, a stream) in the V (value) entry of the field dictionary. The contents of this text string or stream are used to construct an appearance stream for displaying the field, as described under “Variable Text” on page 677.
See the "Tj" lines in example 8.18: they contain the text that will be displayed by default when you open the PDF document (since the /AP dictionary contains /N, the annotation's normal appearance).
I don't have time right now to investigate whether it is possible to easily update the appearance stream XObject using pdfrw.
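As a rough way to see what an existing appearance stream currently displays, the strings passed to the Tj operator can be pulled out of the stream text. This is a minimal sketch under several assumptions: the stream is available as uncompressed text, the text is written as literal strings (not hexadecimal strings or TJ arrays), and any parentheses inside the strings are backslash-escaped. `shown_strings` is a hypothetical name used only here.

```python
import re

# Matches "(...) Tj", allowing backslash-escaped characters inside the string.
TJ_PATTERN = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")


def shown_strings(content):
    """Return the raw literal strings passed to the Tj operator."""
    return TJ_PATTERN.findall(content)


sample = """/Tx BMC
BT
/Helvetica 8.0 Tf
1.0 5.0 Td
0 g
(777) Tj
ET EMC"""

print(shown_strings(sample))  # ['777']
```

If the text found this way differs from /V, that would be consistent with a viewer showing the old value until the field gets focus.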
I used the example PDF from #132. The code below will add "im field_1 value" to the first text field. Please note that it's just a proof of concept rather than anything else:
from pdfrw import PdfWriter, PdfReader
INVOICE_TEMPLATE_PATH = 'sample-template.pdf'
INVOICE_OUTPUT_PATH = 'sample-output.pdf'
field1value = 'im field_1 value'
template_pdf = PdfReader(INVOICE_TEMPLATE_PATH)
# update the first field; it's assumed to be a text field
template_pdf.Root.AcroForm.Fields[0].V = field1value
# add an appearance stream to display it
template_pdf.Root.AcroForm.Fields[0].AP.N.stream = '''/Tx BMC
BT
/Helvetica 8.0 Tf
1.0 5.0 Td
0 g
(''' + field1value + ''') Tj
ET EMC'''
PdfWriter().write(INVOICE_OUTPUT_PATH, template_pdf)
See section 5 of the PDF reference manual for more text formatting/painting options. When I open sample-output.pdf I can see the field 1 text in Foxit Reader, Adobe Acrobat 11, and Chrome. Tested on Windows 10.
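The proof of concept concatenates the field value straight into the content stream, so a value containing a parenthesis or backslash would produce a malformed stream. The sketch below wraps the same operator sequence in a function and escapes the value first. `build_text_appearance` and its parameters are hypothetical names for illustration, and the defaults are simply the numbers used in the proof of concept.

```python
def build_text_appearance(text, font="/Helvetica", size=8.0, x=1.0, y=5.0):
    """Return content-stream text that draws one line of text.

    The operator sequence mirrors the proof of concept above; the
    defaults are the values used there.
    """
    # Escape characters that would otherwise end or corrupt the literal string.
    escaped = (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    lines = [
        "/Tx BMC",             # begin marked content for a text field
        "BT",                  # begin text object
        f"{font} {size} Tf",   # select font and size
        f"{x} {y} Td",         # move to the text position
        "0 g",                 # set the fill gray level to 0 (black)
        f"({escaped}) Tj",     # show the string
        "ET EMC",              # end text object and marked content
    ]
    return "\n".join(lines)


print(build_text_appearance("im field_1 value"))
```

The result is a plain string that could be assigned to the `.stream` attribute in the same way the proof of concept assigns its triple-quoted literal.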
I'm trying to update the appearance stream with your code. However, I get an error: " AttributeError: 'NoneType' object has no attribute 'N' ". I assume that there is no appearance stream available in my field, so I tried creating it with:
annotation.AP = pdfrw.PdfDict(N=pdfrw.PdfDict(stream='''/Tx BMC
BT
/Helvetica 8.0 Tf
3.0 5.0 Td
0 g
(''' + value + ''') Tj
ET EMC'''))
However, this results in the fields disappearing in all PDF readers...
You're correct, the error is because no appearance stream is associated with the field, but you've created it the wrong way. You've just assigned a stream to the AP dictionary. What you need to do is assign an indirect XObject to /N in the /AP dictionary, and you need to create the XObject from scratch. The code should be something like the following, but I haven't tested it, as I don't have any such PDF file with me right now and no time to create one. You can post an example PDF:
from pdfrw import PdfWriter, PdfReader, IndirectPdfDict, PdfName, PdfDict
INVOICE_TEMPLATE_PATH = 'untitled.pdf'
INVOICE_OUTPUT_PATH = 'untitled-output.pdf'
field1value = 'im field_1 value'
template_pdf = PdfReader(INVOICE_TEMPLATE_PATH)
template_pdf.Root.AcroForm.Fields[0].V = field1value
# this depends on page orientation
rct = template_pdf.Root.AcroForm.Fields[0].Rect
height = round(float(rct[3]) - float(rct[1]), 2)
width = round(float(rct[2]) - float(rct[0]), 2)
# create the XObject
xobj = IndirectPdfDict(
    BBox=[0, 0, width, height],
    FormType=1,
    Resources=PdfDict(ProcSet=[PdfName.PDF, PdfName.Text]),
    Subtype=PdfName.Form,
    Type=PdfName.XObject,
)
# assign a stream to it
xobj.stream = '''/Tx BMC
BT
/Helvetica 8.0 Tf
1.0 5.0 Td
0 g
(''' + field1value + ''') Tj
ET EMC'''
# put it all together
template_pdf.Root.AcroForm.Fields[0].AP = PdfDict(N=xobj)
# output to a new file
PdfWriter().write(INVOICE_OUTPUT_PATH, template_pdf)
FYI: /Type, /FormType and /Resources are optional (/Resources is strongly recommended). I'm not going to explain the code, but if anything is unclear, just ask or check the PDF Reference (all the info is there :))
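The width and height calculation in the snippet above can be isolated into a small helper. This is a sketch, not part of pdfrw: `bbox_from_rect` is a hypothetical name, and using abs() to tolerate corners listed in either order is an assumption added here rather than something the thread tested.

```python
def bbox_from_rect(rect):
    """Return [0, 0, width, height] for a /Rect given as four numbers.

    /Rect is [x1, y1, x2, y2]; abs() keeps the size positive even if the
    corners are listed in the opposite order.
    """
    x1, y1, x2, y2 = (float(v) for v in rect)
    width = round(abs(x2 - x1), 2)
    height = round(abs(y2 - y1), 2)
    return [0, 0, width, height]


# The thread's code converts each entry with float(), so strings are used here.
print(bbox_from_rect(["100", "700", "250", "720"]))  # [0, 0, 150.0, 20.0]
```

The returned list has the same shape as the `BBox` argument passed to `IndirectPdfDict` above.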
@PeterSlezak This works for me. I just changed the font to /TiRo because that's what is already used in my PDF and changed the stream to 1.0 1.0 Td because the number was appearing too high in the Form Field and cutting off the top half of the number. Thank you very much!!!
@PeterSlezak Hi. I tried your solution. However, Adobe Acrobat Reader is crashing directly after opening the PDF. Also in any other PDF viewer the values aren't displayed anymore. I've tried exactly your code but it seems not to be working unfortunately.
Hi @jancoow, Share your code and pdf file if possible. Otherwise I cannot help you.
All the data for testing is available at this link: https://bostata.com/post/how_to_populate_fillable_pdfs_with_python/
Which field in the PDF array needs to be changed to get the updated value to appear in the new PDF?
Hi @PeterSlezak, thank you for your script, it works great. I have only one problem: I need to write latin-2 characters into the input. I attached a font into the pdf which supports characters like Ő and Ű and I used this font for render but I don't know how to write the value into the input.
I got this error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 308-311: ordinal not in range(256)
I'm using this:
xobj.stream = '''/Tx BMC
BT
/LiberationSerif 12.0 Tf
1.0 5.0 Td
0 g
(''' + value + ''') Tj
ET EMC'''
with value = "ÍŐŰ"
I've got the same problem. Judging by the message "ordinal not in range(256)", I think the pdfrw library only deals with single-byte (Latin-1) characters here rather than full Unicode. It probably can't write such text this way, even though it's possible by typing manually. A solution for now may be to use reportlab. If someone has something better using pdfrw, it would be much appreciated, I believe.
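The error message names the 'latin-1' codec, which covers code points 0 to 255 only. A quick way to see which characters of a value fall outside a given codec is to try encoding them one at a time. This is a plain-Python diagnostic sketch; `unencodable_chars` is a hypothetical helper name and it does not touch the PDF at all.

```python
def unencodable_chars(text, encoding="latin-1"):
    """Return the characters of text that the given codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(unencodable_chars("ÍŐŰ"))  # ['Ő', 'Ű']
```

For the value from the report, Í fits in Latin-1 while Ő and Ű do not, which matches the exception being raised only for some of the characters.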
I see that you're not using a unicode string either. Try using the following:
xobj.stream = u'''/Tx BMC
BT
/LiberationSerif 12.0 Tf
1.0 5.0 Td
0 g
({}) Tj
ET EMC'''.format(value)
@ZarakiiKenpachi I don't know why it doesn't work on your PDF. I can populate and display values on a few fields but not all.
Hi @Efk3, I've never needed non-ASCII characters, but my suggestion would be to use a \ddd sequence in the literal string, where ddd is the octal character code; or you can try a hexadecimal string instead of a literal string. The original xobj.stream code snippet would change to:
xobj.stream = '''/Tx BMC
BT
/Helv 8.0 Tf
1.0 5.0 Td
0 g
<696D206669656C645f312076616C7565> Tj
ET EMC'''
It should display "im field_1 value"
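Both suggestions can be sketched in plain Python. The helper names below are hypothetical, and the `encoding` parameter is an assumption: which byte selects which glyph depends on the encoding of the font named in the Tf operator, so it has to match that font. Hex digits in PDF strings are case-insensitive, so the uppercase output is equivalent to the mixed-case string above.

```python
def to_pdf_hex_string(text, encoding="latin-1"):
    """Encode text as a PDF hexadecimal string such as <696D>."""
    return "<" + text.encode(encoding).hex().upper() + ">"


def to_pdf_octal_literal(text, encoding="latin-1"):
    """Encode text as a literal string, writing non-ASCII bytes as \\ddd."""
    parts = []
    for byte in text.encode(encoding):
        char = chr(byte)
        if char in "\\()":
            parts.append("\\" + char)       # escape delimiters and backslash
        elif 32 <= byte < 127:
            parts.append(char)              # printable ASCII stays as-is
        else:
            parts.append("\\%03o" % byte)   # everything else as octal
    return "(" + "".join(parts) + ")"


print(to_pdf_hex_string("im field_1 value"))
# <696D206669656C645F312076616C7565>
print(to_pdf_octal_literal("Í"))
# (\315)
```

Either result is pure ASCII, so it can be placed before the Tj operator without triggering the encoding error reported earlier.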
@PeterSlezak Thanks so much for code snippet really helped me!! This is similar to ASCII question above but I have an address field and would like to have a EOL character in the normal address break line spot. I have tried several (ex \n, \r, \015, \012, <br>) and none seem to show in the stream but they will show correctly when focused (template_pdf.Root.AcroForm.Fields[0].V = field1value). Do you have any suggestions?
@PeterSlezak your code seems on spot, although I haven't been able to make it work. I don't have any problem with unicode, could it be a python issue? And just because I haven't seen it mentioned, as a temp workaround, MS Edge does display field values (both unicode and ascii) without a problem.
Hi @stbth01 Change the Appearance stream as follows:
template_pdf.Root.AcroForm.Fields[0].AP.N.stream = '''/Tx BMC
BT
/Helvetica 8.0 Tf
0.0 10.0 Td
0 g
(Line one) Tj
0.0 -7.0 Td
(Line two) Tj
ET EMC'''
Just replace "Line one" with the first line of text and "Line two" with the second, and adjust the Td values as appropriate to fit both lines in your text box. The values depend on the box and font size.
You should also update/add the form field dictionary entry /Ff with bit 13 set (an integer value of 4096) to indicate that it's a multi-line field. (When /Ff is omitted entirely, it indicates a single-line text field.) It should work even without /Ff, but it's better to follow the PDF reference document.
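The two-line stream above generalizes to any number of lines. The sketch below builds it from a list, using the same numbers as defaults. `build_multiline_appearance`, `MULTILINE_FLAG` and the `leading` parameter are hypothetical names for this example. The multi-line flag is bit position 13 of /Ff in the PDF reference, which corresponds to the integer 4096.

```python
MULTILINE_FLAG = 1 << 12  # bit position 13 in /Ff, counting from 1


def build_multiline_appearance(lines, font="/Helvetica", size=8.0,
                               x=0.0, y=10.0, leading=7.0):
    """Return content-stream text that draws each item of lines on its own row.

    The first Td positions the first line; every later Td moves down by
    `leading` relative to the previous line, as in the example above.
    """
    ops = ["/Tx BMC", "BT", f"{font} {size} Tf", f"{x} {y} Td", "0 g"]
    for index, line in enumerate(lines):
        if index > 0:
            ops.append(f"0.0 {-leading} Td")  # relative move to the next row
        escaped = (
            line.replace("\\", "\\\\")
            .replace("(", "\\(")
            .replace(")", "\\)")
        )
        ops.append(f"({escaped}) Tj")
    ops.append("ET EMC")
    return "\n".join(ops)


print(build_multiline_appearance("Line one\nLine two".splitlines()))
print(MULTILINE_FLAG)  # 4096
```

Splitting the field value on newlines first, as in the usage line, keeps /V free to hold the original text with its line breaks while the stream draws one Tj per line.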
Or you could use:
pdf_template = pdfrw.PdfReader(infile)
pdf_template.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
pdfrw.PdfWriter().write(outfile, pdf_template)
This adds the NeedAppearances key/value to the AcroForm dict, if I'm understanding your problem correctly.
@TLK3 Worked for me, thanks!
@TLK3 That did the trick for me having the same problem with Adobe not showing the fields.
@TLK3 that totally saved my day. Thank you!
@TLK3 boom! works great
@TLK3 It works for me too, thank you so much !!
@TLK3 it works with Adobe Reader, but not with Preview. To get field values to appear in Preview, use the solution above of setting the appearance dictionary for each modified field to an empty string.
@TLK3 your solution helps very much, thanks! It also works for PyPDF2 in a similar way. However in my case I still have some fields (date field and checkboxes) that remain empty (not rendered). It seems to be a general PDF problem, not pdfrw one.
@TLK3 It works! Thank you!
@TLK3 this also saved my day :-) Thanks a lot
@TLK3 Thank you! Any clue why, in a big dict of items, some of the filled fields show up, and every tenth or so just randomly doesn't appear?
TLK3's solution works with Acrobat and macOS Preview, but it doesn't work with PDFjs. If I open a file created this way with Acrobat and save it from there, it will then show the field values in PDFjs.
#!/usr/bin/env python
import pdfrw


def writeFillablePDF(input_pdf_path, output_pdf_path, data_dict):
    # Read the input PDF
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    # Set NeedAppearances (makes text fields visible)
    template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
    # Loop over all annotations on the first page
    for annotation in template_pdf.pages[0]['/Annots']:
        # Only widget annotations that have a field name
        if annotation['/Subtype'] == '/Widget' and annotation['/T']:
            key = annotation['/T'][1:-1]  # Remove parentheses
            if key in data_dict.keys():
                annotation.update(pdfrw.PdfDict(V=f'{data_dict[key]}'))
                # print(f'={key}={data_dict[key]}=')
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)


if __name__ == '__main__':
    TEMPLATE_PATH = 'C:/tmp/OrigDoc.pdf'
    OUTPUT_PATH = 'C:/tmp/FilledDoc.pdf'
    # Assuming you know the text field names in the document,
    # build a dictionary of names and values
    data_dict = {
        'CustomerName': 'Big Company Name',
        'PartNumber': 'PN12345',
        'Revision': '333',
    }
    writeFillablePDF(TEMPLATE_PATH, OUTPUT_PATH, data_dict)
Below is something that I threw together quickly. I was able to iterate through and produce individual PDFs just fine, and the fields seemed visible (slightly different code).
When I added the merge code in order to produce a multi-page PDF containing the results of the objects in data, it seems to no longer work. Can someone take a quick look to see if I'm handling the merge and setting the appearance workaround properly, based on your experience? It's down low in __main__.
Many thanks.
import csv

import pdfrw

IN_FILE = "awards.csv"
TEMPLATE_FILE = "template.pdf"
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'
FIELDS = ["Certificate Category", "Certificate Rank"]
N = 1


# Updates single instance of template pdf, increment form field suffix
def modify_form(input_pdf_path, data_dict):
    global N  # need to get rid of this
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    annotations = template_pdf.pages[0][ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                if key in data_dict.keys():
                    annotation.update(
                        pdfrw.PdfDict(T="{}".format(key + str(N)))
                    )
                    annotation.update(
                        pdfrw.PdfDict(V="{}".format(data_dict[key]))
                    )
                    annotation.update(pdfrw.PdfDict(Ff=1))
    N += 1
    return template_pdf


def build_datadict(in_file):
    o = []
    with open(in_file) as file:
        reader = csv.DictReader(file, delimiter=',')
        for row in reader:
            m = {}
            for f in FIELDS:
                if row[f] and not row[f].isspace() and not row[f] is None:
                    m[f] = row[f]
            if m:
                m['Date'] = "January 25th, 2020"
                o.append(m)
    return o


if __name__ == '__main__':
    data = build_datadict(IN_FILE)
    writer = pdfrw.PdfWriter()
    writer.trailer.Info = pdfrw.IndirectPdfDict(
        Title='Combined PDF'
    )
    # Iterate array of 'data_dict's
    for d in data:
        this_pages = modify_form(TEMPLATE_FILE, d)  # fill the form
        this_pages.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))  # maintain appearances
        writer.addpages(this_pages.pages)  # merge into single pdf
    writer.write(IN_FILE.split(".")[0] + ".pdf")
This is the second time I've bounced off pdfrw because of this issue :( The fixes above don't work for me. I've had to go back to pdftk.
Seeing the same issue. The PDF form has the values, but it's not displaying them until I click on each field in a viewer. The moment I click out, the value goes away. I'm viewing the PDF on a Mac in both Preview and Acrobat Reader & Pro. In Pro, the form field still shows as unfilled (i.e. it has that blue color indicator of an unfilled form field).
So I guess I need to look at pdftk or some other solution beyond pdfrw?
@pmilano1 Yours is a slightly different issue (see here: https://github.com/pmaupin/pdfrw/issues/171) and it's regarding merging PDFs.
For anyone else reading this and finding that setting the AcroForm / NeedAppearances doesn't work in Acrobat, verify that you're not merging PDF files. It seems the AcroForm node is lost during the merging process when the concatenated PDF is written out. The issue linked above points to a Stack Overflow answer with working code that addresses this.
@TLK3 you are the best. It worked for Acrobat
@davidmacneil Your solution works perfectly for preview in Mac. Thank you!
@tlk3 Thank you buddy
I had a rendering problem with my fields and I've been trying for a lot of hours to solve it. I used your help from here and the holy Stack Overflow but the problem remained. I decided to leave the AP as blank (AP='') when it was not present in the file just to see what happens. I also used Foxit Reader to open the file and everything was perfect. Even printed the pages on paper and it was correct. The same with the browser PDF reader. BUT the Adobe Acrobat did not render the text until I clicked the field and when I previewed the pages for printing, the fields were blank. Does anyone know what doesn't work well with Acrobat? Is something special needed to work properly with Adobe?
I have form fields in my PDF (that make it interactive - you can fill them and print with your data). I want to programmatically fill those fields based on their names (template.Root.Pages.Kids[x].Annots[y] - name in 'T', default value in 'V'). The problem is that when I do so, the value is updated in the metadata, but the old value is displayed until I edit the PDF in some desktop editor (I can see the new default value, and it starts to be displayed when I make any change to this field). I'd love the displayed value to be updated as well.
Example: