py-pdf / pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

'PdfReadError: File has not been decrypted' for unencrypted file #991

Status: Closed (MartinThoma closed this issue 2 years ago)

MartinThoma commented 2 years ago

When trying to extract the text from a PDF, I get an exception.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-113-generic-x86_64-with-glibc2.31

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.2.0

MCVE: Code and PDF

Using this PDF: https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf

from io import BytesIO

from PyPDF2 import PdfReader
from tests import get_pdf_from_url  # download helper from the PyPDF2 repository's test suite

url = "https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf"
# Constructing the reader emits: PdfReadWarning: incorrect startxref pointer(1)
reader = PdfReader(BytesIO(get_pdf_from_url(url, "tika-976028.pdf")))
reader.pages[0].extract_text()  # raises the PdfReadError shown below

I get:

Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 354, in _get_num_pages
    self.decrypt("")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1617, in decrypt
    return self._decrypt(password)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1657, in _decrypt
    raise NotImplementedError(
NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1462, in __getitem__
    len_self = len(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1453, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 357, in _get_num_pages
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted
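
The two tracebacks above are linked by Python's implicit exception chaining: `decrypt("")` fails with `NotImplementedError` inside `_get_num_pages`, and a `PdfReadError` is raised while that failure is being handled. The following is a minimal, self-contained sketch of that pattern, not the actual PyPDF2 code. `SketchReader`, its methods, and the exact `except` clause are assumptions made for illustration; only the two error messages are taken from the traceback.

```python
class PdfReadError(Exception):
    """Stand-in for PyPDF2.errors.PdfReadError (hypothetical copy for this sketch)."""


class SketchReader:
    """Hypothetical, heavily simplified reader; not the real PdfReader."""

    def __init__(self, algorithm_code: int) -> None:
        self.algorithm_code = algorithm_code

    def decrypt(self, password: str) -> int:
        # Mirrors the first traceback: only algorithm codes 1 and 2 are handled.
        if self.algorithm_code not in (1, 2):
            raise NotImplementedError(
                "only algorithm code 1 and 2 are supported. "
                f"This PDF uses code {self.algorithm_code}"
            )
        return 1

    def get_num_pages(self) -> int:
        try:
            # Try the empty password first, as line 354 of the traceback does.
            self.decrypt("")
        except Exception:
            # Assumption: the real except clause may be narrower than this.
            # Raising here attaches the original error as __context__, which
            # produces "During handling of the above exception, ...".
            raise PdfReadError("File has not been decrypted")
        return 1


try:
    SketchReader(algorithm_code=4).get_num_pages()
except PdfReadError as exc:
    surfaced = str(exc)                             # message the user sees last
    root_cause_type = type(exc.__context__).__name__
    root_cause = str(exc.__context__)               # the underlying failure

print(f"surfaced:   {surfaced}")
print(f"root cause: {root_cause_type}: {root_cause}")
```

Under these assumptions, the message shown last ("File has not been decrypted") hides the more informative root cause, which remains available through the exception's `__context__` attribute and in the first traceback.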

MartinThoma commented 2 years ago

Might be related to #416

MartinThoma commented 2 years ago

Might change with #749

MartinThoma commented 2 years ago

This issue no longer occurs :tada: