Closed Luisonson closed 1 year ago
Thank you for your bug report!
Would you mind sharing your PyPDF2 version + the environment you're using? (It's part of the bug ticket template)
I have no problem with the text, but with the checkboxes there is no way.
What does that mean? There is no way to do what?
@Luisonson Have you seen https://pypdf2.readthedocs.io/en/latest/user/forms.html#filling-out-forms ? Does that help? If not, why?
Hello,
Thanks for your answer. I'm using Python 3.8.8 with PyPDF2 2.1.0. My IDE is Spyder 5.1.5.
I can't select or deselect the checkboxes. Also, some checkboxes appear only as /Kids of another checkbox, so I can't interact with them. For example, the checkbox BOTON_JORN has 4 /Kids, and those kids are another 4 checkboxes; the only thing I know about them is that they are IndirectObject(X, 0).
@Luisonson Have you seen https://pypdf2.readthedocs.io/en/latest/user/forms.html#filling-out-forms ? Does that help? If not, why?
Yes, part of the code I pasted is from there, but it does not work for the checkboxes in this PDF.
Is the problem that it's not shown? So maybe #227 / #355 ?
Another hint, using this PDF (it is just page 5 of the previous PDF): filled-out_5.pdf
If I update only the text boxes, everything is OK. BUT if I also try to update the checkboxes (unsuccessfully), the text in the text boxes is not shown unless I select the box. New code:
Updating two text boxes. These examples were written for PyPDF2 2.1.0.
from PyPDF2 import PdfReader, PdfWriter  # the code below uses the new class names
reader = PdfReader("filled-out_5.pdf")
writer = PdfWriter()
page = reader.pages[0]
fields3 = reader.get_fields()
writer.add_page(page)
# Fill the two text fields on the page that was just added
writer.update_page_form_field_values(
    writer.pages[0], {"Texto41": "Test38", "Texto56": "Test2"}
)
with open("filled-out_5_out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()
Updating two text boxes and trying to update one checkbox (this is where the bug of the text not showing appears):
from PyPDF2 import PdfReader, PdfWriter  # the code below uses the new class names
reader = PdfReader("filled-out_5.pdf")
writer = PdfWriter()
page = reader.pages[0]
fields3 = reader.get_fields()
writer.add_page(page)
# Fill the two text fields
writer.update_page_form_field_values(
    writer.pages[0], {"Texto41": "Test38", "Texto56": "Test2"}
)
# Try to change the checkbox group (this does not take effect)
writer.update_page_form_field_values(
    writer.pages[0], {"BOTON_TPCON1": "/540"}
)
# Write the result to filled-out_5_out.pdf
with open("filled-out_5_out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()
Also, another error: after the new file is saved, if you try to read the fields of the new file with:
reader = PdfReader("filled-out_5_out.pdf")
reader.get_fields()
it does not show any fields. I have to open the PDF in Adobe and save it from there; only then does this code work.
Is the problem that it's not shown? So maybe #227 / #355 ?
No. Previously I was using PyPDF2 1.26 and had the code to mitigate that issue (def set_need_appearances_writer(writer: PdfFileWriter)) in my first message. With PyPDF2 2.1.0 that function is not needed... until you try to modify a checkbox, as I described in the previous message :(
Oh, so it is a regression? It was working with 1.26 and now it is not working anymore with 2.1.0?
I'll have a closer look today evening after work :-)
I'm sorry, maybe I'm mixing things up. There are several problems. On the one hand, I have problems with the checkboxes (that problem occurs with both versions). On the other hand, there is the problem of the text not showing unless I select the text box. This second problem only appears in 2.1.0 if I try to change a checkbox, and the code that solved it in 1.26 does not seem to solve it in 2.1.0. Please use the last code I pasted; I think you will see it more clearly than from my poor explanation.
I'll post a series of comments here to keep track / let people know how I investigate the issue.
# Split, so that we only have one page to care about
$ qpdf --split-pages=1 TEMPORAL.COMPLETO12.de.mayo_unlocked.pdf out.pdf
# Uncompress so that I can view it in an editor
$ qpdf --stream-data=uncompress out-01.pdf uncompressed-1.pdf
That gives uncompressed-1.pdf
Next I used PyPDF2 to find the form fields and their names. I looked for /Btn and found TEXTOCasilla de verificación25.
Before filling it:
<< /AP
<< /D
<< /Off 124 0 R /S#ed 125 0 R >> /N
<< /S#ed 126 0 R >> >>
/AS /Off
/DA (/ZaDb 0 Tf 0 0 1 rg) /F 4 /FT /Btn /MK
<< /CA (8) >> /P 3 0 R /Rect [ 51.3755 235.625 63.0763 248.636 ]
/Subtype /Widget /T (TEXTOCasilla de verificación25) /Type /Annot >>
After:
<< /AP
<< /D
<< /Off 171 0 R /S#ed 172 0 R >> /N
<< /S#ed 173 0 R >> >>
/AS /S#ed
/DA (/ZaDb 0 Tf 0 0 1 rg) /F 4 /FT /Btn /MK
<< /CA (8) >> /P 3 0 R /Rect [ 51.3755 235.625 63.0763 248.636 ]
/Subtype /Widget /T (TEXTOCasilla de verificación25) /Type /Annot
/V /S#ed >>
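A side note on the `/S#ed` name in both dumps: in PDF syntax, `#xx` inside a name is a two-digit hex escape for one byte. The sketch below decodes such names, assuming the byte is meant as Latin-1 (an assumption; the file itself does not say). Under that reading `/S#ed` is `/Sí`, and `/S#ED` is the same name because hex digits are case-insensitive. `decode_pdf_name` is a hypothetical helper, not a PyPDF2 function.

```python
import re


def decode_pdf_name(name: str) -> str:
    """Replace each #xx hex escape in a PDF name with the character it encodes.

    Assumption: the escaped byte is interpreted as Latin-1, so byte 0xED
    becomes the character U+00ED.
    """
    return re.sub(
        r"#([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        name,
    )


print(decode_pdf_name("/S#ed"))
print(decode_pdf_name("/S#ED") == decode_pdf_name("/S#ed"))
```

This would explain why the same checkbox state shows up as `/S#ed` in one place and `/S#ED` in another.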
I notice two differences:
/AS /Off changed to /AS /S#ed
/V /S#ed was added.
@Luisonson This ticks one checkbox:
from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import NameObject
from typing import Dict
def update_checkbox_values(page, fields: Dict[str, bool]):
    """Tick (True) or clear (False) the named checkboxes on one page."""
    for j in range(0, len(page['/Annots'])):
        # Resolve the indirect reference to the widget annotation
        writer_annot = page['/Annots'][j].get_object()
        field_name = writer_annot.get('/T')
        if field_name in fields:
            print(f"Found {field_name}")
            assert writer_annot.get('/FT') == '/Btn'
            print(writer_annot)
            if fields[field_name]:
                print("\tCheck it")
                # "/S#ed" is the 'on' state of this particular checkbox
                writer_annot.update({
                    NameObject("/V"): NameObject("/S#ed"),
                    NameObject("/AS"): NameObject("/S#ed"),
                })
                for key in writer_annot:
                    print((key, writer_annot[key]))
            else:
                # "/Off" is the 'off' state listed in the widget's /AP
                writer_annot.update({
                    NameObject("/V"): NameObject("/Off"),
                    NameObject("/AS"): NameObject("/Off")
                })
reader = PdfReader("TEMPORAL.COMPLETO12.de.mayo_unlocked.pdf")
# See which fields exist
fields = reader.get_form_text_fields()
print(fields)
writer = PdfWriter()
writer.set_need_appearances_writer()
writer.add_page(reader.pages[0])
# True ticks the box, False clears it
update_checkbox_values(writer.pages[0], {"TEXTOCasilla de verificación25": True})
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)
Does this help?
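The snippet above hardcodes `/S#ed`, which is specific to this one checkbox. A possible generalization is to read the "on" state from the widget's `/AP` dictionary, since the dumps show the state names listed there. The sketch below uses plain dicts as stand-ins for PyPDF2 dictionary objects; `find_on_state` and `set_checkbox` are hypothetical helpers. With real PyPDF2 objects, the values would additionally need to be wrapped in `NameObject` and indirect references resolved.

```python
from typing import Mapping, Optional


def find_on_state(widget: Mapping) -> Optional[str]:
    """Return the first appearance state that is not /Off, if any."""
    appearances = widget.get("/AP", {})
    for sub in ("/N", "/D"):  # normal appearance first, then "down"
        for state in appearances.get(sub, {}):
            if state != "/Off":
                return state
    return None


def set_checkbox(widget: dict, checked: bool) -> None:
    """Set /V and /AS together, as in the before/after dumps."""
    on_state = find_on_state(widget)
    if on_state is None:
        raise ValueError("widget has no 'on' appearance state")
    value = on_state if checked else "/Off"
    widget["/V"] = value
    widget["/AS"] = value


# Plain-dict copy of the widget shown in the "Before" dump
widget = {
    "/AP": {
        "/D": {"/Off": "124 0 R", "/S#ed": "125 0 R"},
        "/N": {"/S#ed": "126 0 R"},
    },
    "/AS": "/Off",
    "/FT": "/Btn",
    "/T": "TEXTOCasilla de verificación25",
}
set_checkbox(widget, True)
print(widget["/AS"], widget["/V"])
```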
Good morning, thanks for your time and effort. We are closer. With page 5, for example: https://github.com/py-pdf/PyPDF2/files/8861867/filled-out_5.pdf
reader = PdfReader("filled-out_5.pdf")
# See which fields exist
fields = reader.getFields()
print(fields)
OUTPUT:
{'TEXTOCasilla de verificación555': {'/FT': '/Btn', '/T': 'TEXTOCasilla de verificación555'}, 'BOTON_TPCON1': {'/FT': '/Btn', '/Kids': [IndirectObject(55, 0), IndirectObject(1586, 0)], '/T': 'BOTON_TPCON1', '/Ff': 49152, '/V': '/450401'}, 'Texto56': {'/FT': '/Tx', '/T': 'Texto56'}, 'Texto41': {'/FT': '/Tx', '/T': 'Texto41'}, 'BOTON_INT1': {'/FT': '/Btn', '/Kids': [IndirectObject(1597, 0), IndirectObject(1599, 0), IndirectObject(1604, 0), IndirectObject(1609, 0), IndirectObject(1614, 0), IndirectObject(1619, 0), IndirectObject(1624, 0), IndirectObject(1629, 0), IndirectObject(1634, 0), IndirectObject(1639, 0), IndirectObject(1644, 0), IndirectObject(1649, 0), IndirectObject(1654, 0)], '/T': 'BOTON_INT1', '/Ff': 49152}, 'BOTON_INT1357': {'/FT': '/Btn', '/T': 'BOTON_INT1357', '/Ff': 49152}, 'BOTON_INT166': {'/FT': '/Btn', '/T': 'BOTON_INT166', '/Ff': 49152}}
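The `/Ff` value 49152 in this output can be decoded with the button field flags from the PDF specification (bit positions there are 1-based: NoToggleToOff is bit 15, Radio is bit 16, Pushbutton is bit 17, RadiosInUnison is bit 26). A minimal sketch, with `decode_button_flags` as a hypothetical helper:

```python
# Button field flags, keyed by their bit mask (spec bit N is 1 << (N - 1))
BUTTON_FLAGS = {
    1 << 14: "NoToggleToOff",
    1 << 15: "Radio",
    1 << 16: "Pushbutton",
    1 << 25: "RadiosInUnison",
}


def decode_button_flags(ff: int) -> list:
    """Return the names of the button flags that are set in /Ff."""
    return [name for mask, name in BUTTON_FLAGS.items() if ff & mask]


print(decode_button_flags(49152))  # ['NoToggleToOff', 'Radio']
```

If that reading is right, BOTON_TPCON1 and BOTON_INT1 are radio button groups rather than independent checkboxes, which would fit with their `/Kids` entries.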
I need to modify BOTON_TPCON1, from /450401 to /540. But, with your example:
writer.pages[0]['/Annots'][X].getObject().get('/T')
only detects:
Texto56
Texto41
BOTON_INT1357
BOTON_INT166
so....
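One likely explanation for the missing names: in a radio group the `/T` and `/V` entries live on the parent field, and the widgets in the page's `/Annots` are its `/Kids`, which point back through `/Parent` and carry no `/T` of their own. The sketch below shows that idea on plain dicts. The kid layout (each kid having an `/AP` `/N` entry named after its export value) is an assumption about this file, and `field_name` and `set_radio_group` are hypothetical helpers, not PyPDF2 API.

```python
from typing import Optional


def field_name(widget: dict) -> Optional[str]:
    """Return the nearest /T found on the widget or one of its ancestors."""
    node = widget
    while node is not None:
        if "/T" in node:
            return node["/T"]
        node = node.get("/Parent")
    return None


def set_radio_group(parent: dict, choice: str) -> None:
    """Select one option: /V on the parent, matching /AS on every kid."""
    kids = parent.get("/Kids", [])
    known = set()
    for kid in kids:
        known.update(kid.get("/AP", {}).get("/N", {}))
    if choice != "/Off" and choice not in known:
        raise ValueError(f"{choice} is not a state of any kid: {sorted(known)}")
    parent["/V"] = choice
    for kid in kids:
        has_choice = choice in kid.get("/AP", {}).get("/N", {})
        kid["/AS"] = choice if has_choice else "/Off"


# Hypothetical stand-in for BOTON_TPCON1 and its two kids
parent = {"/FT": "/Btn", "/T": "BOTON_TPCON1", "/Ff": 49152, "/V": "/450401"}
kid_a = {"/Parent": parent, "/AS": "/450401",
         "/AP": {"/N": {"/450401": None, "/Off": None}}}
kid_b = {"/Parent": parent, "/AS": "/Off",
         "/AP": {"/N": {"/540": None, "/Off": None}}}
parent["/Kids"] = [kid_a, kid_b]

set_radio_group(parent, "/540")
print(field_name(kid_b), parent["/V"], kid_a["/AS"], kid_b["/AS"])
```

With real PyPDF2 objects, each `/Kids` and `/Parent` entry is an indirect reference that has to be resolved first, and the new values need to be `NameObject` instances.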
On the other hand, yesterday someone told me about FDF files. An FDF file is an ASCII template (easy to modify) that you merge with the PDF, and the PDF picks up the field values from it. Is PyPDF2 capable of handling FDF files? If not, it would be a nice feature to add.
I've seen fdf being mentioned somewhere, but I have no experience with it.
I'm open to PRs, but I also need to check if adding fdf support is in scope for PyPDF2.
For example, in my case, the FDF file that changes some values on the first page is:
%FDF-1.2
%âãÏÓ
1 0 obj
<</FDF<</F(TEMPORAL COMPLETO12 de mayo_unlocked_borrar1.pdf)/Fields[
<</T(BOTON_BON1)/V/Off>>
<</T(BOTON_CLA1)/V/Off>>
<</T(BOTON_CLA13)/V/Off>>
<</T(BOTON_CLA166)/V/Off>>
<</T(BOTON_DISBON)/V/Off>>
<</T(BOTON_DISC1)/V/Off>>
<</T(BOTON_EX44)/V/Off>>
<</T(BOTON_EXCL)/V/Off>>
<</T(BOTON_INS)/V/Off>>
<</T(BOTON_INT1)/V/Off>>
<</T(BOTON_INT1357)/V/Off>>
<</T(BOTON_INT166)/V/Off>>
<</T(BOTON_INVEMP)/V/Off>>
<</T(BOTON_INVEMP2)/V/Off>>
<</T(BOTON_INVEMP266)/V/Off>>
<</T(BOTON_INVEMP266332)/V/Off>>
<</T(BOTON_INVEMP999)/V/Off>>
<</T(BOTON_INVEMP999635)/V/Off>>
<</T(BOTON_INVEMP9997895)/V/Off>>
<</T(BOTON_INVEMP99988)/V/Off>>
<</T(BOTON_INVTIPO)/V/Off>>
<</T(BOTON_INVTIPO11)/V/Off>>
<</T(BOTON_INVTIPO117)/V/Off>>
<</T(BOTON_ISOC8962)/V/Off>>
<</T(BOTON_JORN)/V/S>>
<</T(BOTON_JORNasdf)/V/Off>>
<</T(BOTON_JORNcvbm)/V/D>>
<</T(BOTON_MAY)/V/Off>>
<</T(BOTON_MODAL1)/V/Off>>
<</T(BOTON_OTR)/V/Off>>
<</T(BOTON_REL2)/V/Off>>
<</T(BOTON_TIPOJORNADA)/V/1>>
<</T(BOTON_TPCON1)/V/Off>>
<</T(BOTON_TPCON100)/V/Off>>
<</T(BOTON_TPCON1006)/V/Off>>
<</T(BOTON_TPCON12)/V/Off>>
<</T(BOTON_TPCON196)/V/Off>>
<</T(BOTON_TPCON1969)/V/Off>>
<</T(BOTON_TPCON1985)/V/Off>>
<</T(BOTON_TPCON198745)/V/Off>>
<</T(BOTON_TPCON199)/V/Off>>
<</T(BOTON_VICT)/V/Off>>
<</T(ID_EMPR)/V(16083466A)>>
<</T(TEXTO Casilla de verificación 480)/V/Off>>
<</T(TEXTO Casilla de verificación 481)/V/Off>>
<</T(TEXTO20369)/V/Off>>
<</T(TEXTOCasilla de verificación106666)/V/Off>>
<</T(TEXTOCasilla de verificación12)/V/Off>>
<</T(TEXTOCasilla de verificación13)/V/Off>>
<</T(TEXTOCasilla de verificación25)/V/S#ED>>
<</T(TEXTOCasilla de verificación285)/V/Off>>
<</T(TEXTOCasilla de verificación2853)/V/Off>>
<</T(TEXTOCasilla de verificación28999)/V/Off>>
<</T(TEXTOCasilla de verificación289996)/V/Off>>
<</T(TEXTOCasilla de verificación3221)/V/Off>>
<</T(TEXTOCasilla de verificación32369)/V/Off>>
<</T(TEXTOCasilla de verificación327)/V/Off>>
<</T(TEXTOCasilla de verificación32987)/V/Off>>
<</T(TEXTOCasilla de verificación3299)/V/Off>>
<</T(TEXTOCasilla de verificación369877)/V/Off>>
<</T(TEXTOCasilla de verificación4)/V/Off>>
<</T(TEXTOCasilla de verificación43)/V/Off>>
<</T(TEXTOCasilla de verificación43968)/V/Off>>
<</T(TEXTOCasilla de verificación5)/V/Off>>
<</T(TEXTOCasilla de verificación51)/V/Off>>
<</T(TEXTOCasilla de verificación5189)/V/Off>>
<</T(TEXTOCasilla de verificación518977)/V/Off>>
<</T(TEXTOCasilla de verificación555)/V/Off>>
<</T(TEXTOCasilla de verificación6)/V/Off>>
<</T(TEXTOCasilla de verificación62)/V/Off>>
<</T(TEXTOCasilla de verificación622222)/V/Off>>
<</T(TEXTOCasilla de verificación626)/V/Off>>
<</T(TEXTOCasilla de verificación64)/V/Off>>
<</T(TEXTOCasilla de verificación65)/V/Off>>
<</T(TEXTOCasilla de verificación66)/V/Off>>
<</T(TEXTOCasilla de verificación661)/V/Off>>
<</T(TEXTOCasilla de verificación69)/V/Off>>
<</T(TEXTOCasilla de verificación6911)/V/Off>>
<</T(TEXTOCasilla de verificación7)/V/Off>>
<</T(TEXTOCasilla de verificación72)/V/Off>>
<</T(TEXTOCasilla de verificación7222)/V/Off>>
<</T(TEXTOCasilla de verificación723)/V/Off>>
<</T(TEXTOCasilla de verificación726)/V/Off>>
<</T(TEXTOCasilla de verificación8)/V/Off>>
<</T(TEXTOCasilla de verificación91)/V/Off>>
<</T(TEXTOCasilla de verificación911)/V/Off>>
<</T(TEXTOCasilla de verificación95555)/V/Off>>
<</T(Textocasilla de verificación3)/V/Off>>
<</T(Textocasilla de verificación30)/V/Off>>]
/ID[<25F5DFD17199935FF41213A08FEAFF84><9F88950AEDB5B44BBCEF4494778262B8>]
/UF(TEMPORAL COMPLETO12 de mayo_unlocked_borrar1.pdf)>>/Type/Catalog>>
endobj
trailer
<</Root 1 0 R>>
%%EOF
As you can see, it is quite simple and self-explanatory. BUT PyPDF2 would have to be capable of updating any value. To open the FDF file and merge it with the PDF I'm using pdftk, which is an old (9 years) exe... but it does the job.
As another example: for the file filled-out_5.pdf, where I told you I'm not able to change the checkbox BOTON_TPCON1, the FDF file is (change .txt to .fdf): filled-out_5_datos.txt. It is quite simple and seems to alter only the /V value.
To generate an FDF file, open the PDF file with Acrobat -> File -> Create -> Create Form.
@MartinThoma I propose closing this issue, unless you plan some work on FDF files, but this is too far away from PDF for me.
I (sadly) have to agree: I don't see FDF support happening soon and I don't see us making progress here.
I have added a link to https://github.com/py-pdf/pypdf/discussions/1181 . Feel free to add here or there more details on FDF (PRs introducing support would also be very welcome!).
The fact that I'm closing this is a reflection on the fact that no core contributor will pick this up in the next half year. We want this support in pypdf, but we don't have the resources to make it happen any time soon.
OK, no problem. Thanks for your time.
I'm trying to automate filling this PDF: TEMPORAL COMPLETO12 de mayo_unlocked.pdf
I have no problem with the text, but with the checkboxes there is no way. Many /Btn fields have /Kids, and those /Kids are other checkboxes that appear as "IndirectObject". Also, I can't select or modify the normal checkboxes in this PDF (examples below).
Code
This example was written for the pypdf2 1.26.0 version
If I modified the pdf manually and read the fields...:
Another checkbox, with NO /Kids, that I can't select or modify is 'TEXTOCasilla de verificación25'; when selected it has the value '/S#ED'.
Thanks for your time.
PDF
TEMPORAL COMPLETO12 de mayo_unlocked.pdf