sfneal / PyPDF3

A utility to read and write PDFs with Python
https://pythonhosted.org/PyPDF2/

Handle encrypted PDFs that use an algorithm other than "algorithm 1 or 2" #19

Open OzzieIsaacs opened 2 years ago

OzzieIsaacs commented 2 years ago

I'm using version 1.06 of PyPDF3. Currently PyPDF3 can only handle PDFs encrypted with "algorithm 1 or 2". I would love to have the other algorithms supported as well. For my use case, decrypting the document info header would be sufficient.

An encrypted file can be downloaded from here: https://cloud.3dissue.net/24308/24333/24567/65779/Position_4.21-211104-DE-web-20211203082446.pdf

The following code sample demonstrates the problem (with the above-mentioned PDF downloaded and renamed to encrypt.pdf):

from PyPDF3 import PdfFileReader
with open('encrypt.pdf', 'rb') as f:
    pdf_file = PdfFileReader(f)
    doc_info = pdf_file.getDocumentInfo()

This code throws the error: PyPDF3.utils.PdfReadError: file has not been decrypted

Adding an additional decrypt statement like this:

from PyPDF3 import PdfFileReader
with open('encrypt.pdf', 'rb') as f:
    pdf_file = PdfFileReader(f)
    if pdf_file.isEncrypted:
        pdf_file.decrypt('')
    doc_info = pdf_file.getDocumentInfo()

leads to: NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 5

For your reference, the content of the relevant variables in this case: [image: decrypt_failed]

I found "qpdf", which is able to handle the encryption of these files. The decryption algorithm of qpdf can be found in the file https://raw.githubusercontent.com/qpdf/qpdf/main/libqpdf/QPDF_encryption.cc

It would be great if somebody could pick up from here and implement the decryption in PyPDF3.
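
Until native support exists, one possible workaround is to let the qpdf command-line tool write a decrypted copy first and then read that copy with PyPDF3. The sketch below is only an illustration: the helper names build_qpdf_decrypt_command and decrypt_with_qpdf are hypothetical, and it assumes a qpdf executable is installed and on PATH.

```python
import shutil
import subprocess


def build_qpdf_decrypt_command(src, dst, password=""):
    """Build the argument list for `qpdf --decrypt` (hypothetical helper)."""
    cmd = ["qpdf"]
    if password:
        # Only pass a password when one is actually needed.
        cmd.append(f"--password={password}")
    cmd += ["--decrypt", src, dst]
    return cmd


def decrypt_with_qpdf(src, dst, password=""):
    """Write a decrypted copy of `src` to `dst` using the qpdf CLI."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf executable not found on PATH")
    # check=True raises CalledProcessError if qpdf exits with an error.
    subprocess.run(build_qpdf_decrypt_command(src, dst, password), check=True)
```

After calling decrypt_with_qpdf('encrypt.pdf', 'decrypted.pdf'), the resulting file should no longer be encrypted, so the first code sample above should work on it without a decrypt call.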

MartinThoma commented 2 years ago

@OzzieIsaacs PyPDF2 recently added support for modern decryption ;-)
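
For readers who want to try that route, here is a minimal sketch of reading the document info with PyPDF2. It assumes a recent PyPDF2 release that provides PdfReader, is_encrypted, decrypt and metadata; AES-encrypted files may additionally need an optional crypto dependency such as PyCryptodome. The function names read_metadata and read_metadata_from_file are hypothetical.

```python
def read_metadata(reader, password=""):
    """Decrypt `reader` if it is encrypted, then return its metadata."""
    if reader.is_encrypted:
        # An empty user password is enough for many "protected" PDFs.
        reader.decrypt(password)
    return reader.metadata


def read_metadata_from_file(path, password=""):
    """Open `path` with PyPDF2 and return its metadata as plain strings."""
    # Imported lazily so read_metadata() can be tried without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(path, "rb") as f:
        info = read_metadata(PdfReader(f), password)
        # Copy the values out while the file is still open.
        return {key: str(info[key]) for key in info} if info else {}
```

Calling read_metadata_from_file('encrypt.pdf') would then be the PyPDF2 counterpart of the second code sample in the original report.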