py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

KeyError: '/Resources' #1272

Closed DL6ER closed 2 years ago

DL6ER commented 2 years ago

See https://github.com/py-pdf/PyPDF2/issues/1269 for further details.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-122-generic-x86_64-with-glibc2.29

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3

Code + PDF

This is a minimal, complete example that shows the issue:

from PyPDF2 import PdfReader
with open("TelemetryTX_EM.pdf", "rb") as f:
    reader = PdfReader(f, strict=False)
    full_content = " ".join([page.extract_text() for page in reader.pages])

PDF used above: main.pdf
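To narrow down which pages trigger the error, the list comprehension above can be unrolled into a loop that records failures per page. This is only a sketch: `extract_text_per_page` is a hypothetical helper, not part of PyPDF2, and the two stand-in page classes exist only so the snippet runs without the PDF. It assumes nothing beyond each page exposing `extract_text()`.

```python
def extract_text_per_page(pages):
    """Extract text page by page, collecting KeyErrors instead of aborting.

    Hypothetical helper for debugging; `pages` is any iterable of objects
    that expose an `extract_text()` method (for example `reader.pages`).
    """
    texts, failures = [], []
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except KeyError as exc:
            # Remember which page failed and why, then keep going.
            failures.append((index, repr(exc)))
            texts.append("")
    return texts, failures


# Stand-in pages so the sketch is runnable without a PDF file.
class _GoodPage:
    def extract_text(self):
        return "hello"


class _BrokenPage:
    def extract_text(self):
        raise KeyError("/Resources")


texts, failures = extract_text_per_page([_GoodPage(), _BrokenPage(), _GoodPage()])
print(failures)
```

With the real file, passing `reader.pages` inside the `with` block in place of the stand-in list should show which page indices fail.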

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "test4.py", line 4, in <module>
    content = " ".join([page.extract_text() for page in reader.pages])
  File "test4.py", line 4, in <listcomp>
    content = " ".join([page.extract_text() for page in reader.pages])
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1510, in extract_text
    return self._extract_text(
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1143, in _extract_text
    resources_dict = cast(DictionaryObject, obj["/Resources"])
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/generic/_data_structures.py", line 150, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Resources'
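The last two frames show a direct `obj["/Resources"]` lookup that ends in `dict.__getitem__`, so a page object without a `/Resources` entry raises `KeyError`. The sketch below reproduces that pattern with a plain `dict` standing in for the page object. Both `lookup_strict` and `lookup_defensive` are illustrative names, and the defensive variant is an assumption about one possible way to tolerate the missing key, not a description of how PyPDF2 actually fixed it.

```python
# Illustrative stand-in only: a plain dict plays the role of a page object
# that lacks a "/Resources" entry. This is not PyPDF2's internal code.
page_without_resources = {"/Type": "/Page", "/Contents": []}


def lookup_strict(obj):
    # Same pattern as the failing frame in the traceback: direct indexing.
    return obj["/Resources"]


def lookup_defensive(obj):
    # Hypothetical alternative: fall back to an empty mapping when absent.
    return obj.get("/Resources", {})


try:
    lookup_strict(page_without_resources)
    strict_result = "ok"
except KeyError as exc:
    strict_result = f"KeyError: {exc}"

defensive_result = lookup_defensive(page_without_resources)

print(strict_result)     # KeyError: '/Resources'
print(defensive_result)  # {}
```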

DL6ER commented 2 years ago

Related to https://github.com/py-pdf/PyPDF2/issues/270

MasterOdin commented 2 years ago

Attaching the linked PDF into this issue: TelemetryTX_EM.pdf

DL6ER commented 2 years ago

Just for the record, leaving more potential testing files here:

MartinThoma commented 2 years ago

@DL6ER This issue should have been fixed via #1276 in PyPDF2==2.10.4. Could you please check / confirm if that is the case?

If not, I will re-open this issue.

DL6ER commented 2 years ago

Yes, it looks like 2.10.4 works well for these files. Thanks!