py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

KeyError: '/_States_' with a radio button #2279

Closed alexey-v-paramonov closed 9 months ago

alexey-v-paramonov commented 1 year ago


Calling get_fields() on a document that has radio buttons raises an exception:

KeyError: '/_States_'

Environment

Which environment were you using when you encountered the problem?

Linux-6.2.0-35-generic-x86_64-with-glibc2.35
pypdf==3.17.0, crypt_provider=('cryptography', '3.4.7'), PIL=9.5.0

Code + PDF

This is a minimal, complete example that shows the issue:

        from pypdf import PdfReader

        doc = PdfReader("ACRE.pdf")
        print(doc.get_fields())

Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!

ACRE.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/django/core/management/commands/shell.py", line 93, in handle
    exec(sys.stdin.read(), globals())
  File "<string>", line 136, in <module>
  File "/home/lx/p/portail/incorporation/pdf_filler.py", line 141, in fill_all
    print(r.get_fields())
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/pypdf/_reader.py", line 578, in get_fields
    self._build_field(field, retval, fileobj, field_attributes)
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/pypdf/_reader.py", line 665, in _build_field
    and "/Off" in retval[key]["/_States_"]
  File "/home/lx/virtualenv/betao/lib/python3.8/site-packages/pypdf/generic/_data_structures.py", line 334, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/_States_'

Proposal

Here: https://github.com/py-pdf/pypdf/blob/main/pypdf/_reader.py#L650C24-L650C24 retval[key]["/_States_"] should be initialized, because an element (a radio button) may not have any states. Like so:

states = []
retval[key][NameObject("/_States_")] = ArrayObject(states)
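
To illustrate why initializing up front helps, here is a minimal sketch that runs without pypdf. It rests on assumptions: plain dicts and lists stand in for pypdf's Field, NameObject and ArrayObject, the function name `build_radio_field` and the `no_toggle_to_off` parameter are hypothetical, and the loop is only a guess at the shape of the real code. The point is that the later `"/Off" in ...["/_States_"]` check (the line that fails in the traceback) can no longer hit a missing key, even when the radio group has no child widgets.

```python
# Simplified stand-in for the radio-button handling; NOT pypdf's actual code.
# Plain dicts/lists replace Field, NameObject and ArrayObject (assumption).

def build_radio_field(kids, no_toggle_to_off=False):
    """Collect the state names of a radio group into field["/_States_"].

    `kids` is a list of dicts shaped like widget annotations, e.g.
    {"/AP": {"/N": {"/1": None, "/Off": None}}}. It may be empty.
    """
    field = {}
    states = []
    # Proposed change: create the key before the loop, so it exists
    # even if the loop body never runs.
    field["/_States_"] = states
    for kid in kids:
        for name in kid["/AP"]["/N"]:
            if name not in states:
                states.append(name)
    # This membership test is the kind of lookup that raised KeyError
    # when "/_States_" had never been set.
    if no_toggle_to_off and "/Off" in field["/_States_"]:
        field["/_States_"].remove("/Off")
    return field


# A radio group without any kids now yields an empty state list.
print(build_radio_field([], no_toggle_to_off=True))
```

With no kids the result is simply a field whose "/_States_" entry is an empty list, which matches the two lines proposed above.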

stefan6419846 commented 1 year ago

Thanks for the report. Do you want to submit a corresponding PR?

alexey-v-paramonov commented 1 year ago

@stefan6419846 Sure, done: https://github.com/py-pdf/pypdf/pull/2280

pubpub-zz commented 1 year ago

This form is odd: the failing field is not referenced on any page, so it can't be filled. Has the French administration found a technical way to complicate "papers" and not pay what they should? 😳🤣🤣🤣🤣🤣

alexey-v-paramonov commented 1 year ago

I am using "Master PDF editor" for Windows and I cannot find a way to remove that field from the PDF. Can anyone suggest an editor for Linux/Windows that could help me get rid of this invalid form field in ACRE.pdf?

pubpub-zz commented 1 year ago

Sorry, but I dislike the idea of modifying an official form. The change you are proposing, which copes with fields that have nothing to build the states from, sounds better.

alexey-v-paramonov commented 1 year ago

@pubpub-zz Well, I am using a custom, modified version of the ACRE form in my project, so for me (while the fix is not accepted yet) removing that field completely from the PDF would help a lot.
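
Until a fixed version is installed, calling code could at least avoid crashing on this specific error. The sketch below is only a stopgap under stated assumptions: the helper name `get_fields_or_none` is hypothetical, and `reader` is any object with a `get_fields()` method (such as a PdfReader). It does not recover the field data, it only turns the known failure into a `None` result and re-raises every other KeyError.

```python
def get_fields_or_none(reader):
    """Return reader.get_fields(), or None if the '/_States_' KeyError occurs.

    Hypothetical helper: `reader` only needs a get_fields() method.
    """
    try:
        return reader.get_fields()
    except KeyError as exc:
        # Swallow only the failure described in this issue.
        if exc.args and exc.args[0] == "/_States_":
            return None
        raise
```

Usage would look like `fields = get_fields_or_none(PdfReader("ACRE.pdf"))`, followed by a check for `None` before filling anything.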

stefan6419846 commented 9 months ago

Has been fixed in #2280.