from io import BytesIO

from PyPDF2 import PdfReader
from tests import get_pdf_from_url  # helper from the PyPDF2 test suite

url = "https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf"
# Emits PdfReadWarning: incorrect startxref pointer(1)
reader = PdfReader(BytesIO(get_pdf_from_url(url, "tika-976028.pdf")))
reader.pages[0].extract_text()
I get:
Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 354, in _get_num_pages
    self.decrypt("")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1617, in decrypt
    return self._decrypt(password)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1657, in _decrypt
    raise NotImplementedError(
NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1462, in __getitem__
    len_self = len(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1453, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 357, in _get_num_pages
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted
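The two tracebacks are linked because the second exception is raised while the first one is still being handled. The sketch below reproduces that pattern with the standard library only. It is an illustration, not PyPDF2's actual code: the PdfReadError class here is a stand-in, and decrypt_stub, num_pages_implicit and num_pages_explicit are hypothetical names.

```python
class PdfReadError(Exception):
    """Stand-in for PyPDF2.errors.PdfReadError (illustration only)."""


def decrypt_stub(password: str) -> int:
    # Hypothetical stub that fails the same way as the first traceback.
    raise NotImplementedError(
        "only algorithm code 1 and 2 are supported. This PDF uses code 4"
    )


def num_pages_implicit() -> int:
    # Raising inside `except` without `from` produces the
    # "During handling of the above exception..." message.
    try:
        decrypt_stub("")
        return 1
    except Exception:
        raise PdfReadError("File has not been decrypted")


def num_pages_explicit() -> int:
    # `raise ... from exc` records the original error as the direct cause.
    try:
        decrypt_stub("")
        return 1
    except Exception as exc:
        raise PdfReadError("File has not been decrypted") from exc
```

In both variants the caller only sees PdfReadError, so the more informative NotImplementedError is visible only in the chained traceback (via __context__ in the first variant, __cause__ in the second).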
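Extracting text from an encrypted PDF fails with PdfReadError: File has not been decrypted. The underlying cause is a NotImplementedError: the PDF uses encryption algorithm code 4, and only codes 1 and 2 are supported.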
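Until this is resolved, calling code could guard against the failure itself. The following is a rough sketch under assumptions: first_page_text_or_none is a hypothetical helper, not part of PyPDF2, and it relies only on the reader exposing is_encrypted, decrypt() and pages, as PdfReader does. It does not handle a wrong password.

```python
from typing import Any, Optional


def first_page_text_or_none(reader: Any) -> Optional[str]:
    """Return the first page's text, or None if decryption is unsupported.

    `reader` is expected to behave like PyPDF2.PdfReader. This helper is a
    hypothetical illustration, not part of PyPDF2.
    """
    if reader.is_encrypted:
        try:
            # Many encrypted PDFs open with an empty user password.
            reader.decrypt("")
        except NotImplementedError:
            # Raised for this PDF because it uses algorithm code 4.
            return None
    return reader.pages[0].extract_text()
```

Returning None keeps the unsupported-encryption case distinguishable from a page that simply contains no text, which would yield an empty string.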
MCVE: Code and PDF
Using this PDF: https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf