christopher5106 closed this issue 1 year ago
Can you open and decrypt the file with another PDF reader (e.g. Chrome's built-in viewer)?
Yes, I can
@christopher5106 Can you check what data contains at _encryption.py:L87 when you hit the issue? I'm wondering whether the check should instead be on len(data) == 0.
@exiledkingcc Can you give your opinion?
I don't understand the question; there is a flaw in the reasoning. If iv = data[:16] and len(iv) == 0, what do you think the length of data is?
Anyway, maybe I just lack imagination, so let's code it:
print(type(data),len(data), data)
<class 'bytes'> 32 b'\x15\xd8\xf4\x9f<\x01<Q\x83g\x8c\x12j[|\xc0\x04\xfamU\xed\xec\x10\x10\x8cY&\xd6\xf2\x96\x9e\xb0'
<class 'bytes'> 0 b''
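To make the reasoning concrete, here is a small sketch that splits the two logged values the way decrypt() does, assuming (as in the code discussed in this thread) that the IV is the first 16 bytes and the ciphertext is the rest. split_iv is a hypothetical helper for illustration, not a pypdf function. Slicing never raises in Python, so data[:16] is empty exactly when data itself is empty.

```python
from typing import Tuple

# The two values printed above, copied from the log.
logged = [
    b'\x15\xd8\xf4\x9f<\x01<Q\x83g\x8c\x12j[|\xc0\x04\xfamU\xed\xec\x10\x10\x8cY&\xd6\xf2\x96\x9e\xb0',
    b'',
]

def split_iv(data: bytes) -> Tuple[bytes, bytes]:
    # First 16 bytes are the IV, the remainder is the AES-CBC ciphertext.
    # Slicing never raises: short input simply yields a short (or empty) IV.
    return data[:16], data[16:]

for data in logged:
    iv, ciphertext = split_iv(data)
    print(len(data), len(iv), len(ciphertext))
```

For the 32-byte value this gives a full 16-byte IV plus one 16-byte block; for the empty value the IV is empty only because the data is empty, which is why the check can be written on len(data).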
Can you run some tests with this change:
def decrypt(self, data: bytes) -> bytes:
    if len(data) == 0:
        # empty input: nothing to decrypt, return it unchanged
        return data
    iv = data[:16]
    # ... rest of decrypt() unchanged
That works fine with this change.
Would you be willing to open a PR to complete your contribution?
OK, do I need to be added to the repository to push it to a branch?
Create a branch on your fork, make the changes, commit them, and push the branch to GitHub. When you then go to the pull requests page, you should be offered the option to create a PR. 😉
The PR looks good, well done :+1: It's merged into main and will be part of the release tomorrow.
This issue will be fixed in pypdf > 3.4.1
@exiledkingcc Can you give your opinion?
looks good to me
I've run into the same error. I think it's because the fix returns from decrypt before the iv variable is initialized. I worked around it by checking iv and initializing it if necessary, then letting the function complete. It's not an ideal fix, but it works. The bigger question is why iv = data[:16] doesn't yield 16 bytes.
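from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def decrypt(self, data: bytes) -> bytes:
    iv = data[:16]
    if len(iv) != 16:
        # workaround: fall back to a fixed IV when data is shorter than 16 bytes
        iv = b"0000000000000000"
    data = data[16:]
    aes = AES.new(self.key, AES.MODE_CBC, iv)
    if len(data) % 16:
        data = pad(data, 16)
    d = aes.decrypt(data)
    if len(d) == 0:
        return d
    else:
        # strip the padding: the last byte gives the number of pad bytes
        return d[: -d[-1]]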
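One observation on the workaround above, sketched with plain slicing so it needs no pycryptodome: for inputs of 1 to 15 bytes, nothing is left once the first 16 bytes are taken as the IV, so the ciphertext passed to aes.decrypt() is empty and, presumably, so is the result. The fallback IV value therefore never affects real data. Note also that b"0000000000000000" is sixteen ASCII "0" characters (0x30), not sixteen zero bytes. remaining_ciphertext is a hypothetical helper for illustration only.

```python
def remaining_ciphertext(data: bytes) -> bytes:
    # Mirrors "data = data[16:]" in the workaround: drop the (possibly short) IV.
    return data[16:]

# For any input of 1..15 bytes, nothing is left to decrypt.
short_inputs = [bytes(n) for n in range(1, 16)]
print(all(remaining_ciphertext(d) == b"" for d in short_inputs))  # True

# The fallback IV is ASCII "0" (0x30) repeated, not null bytes.
fallback_iv = b"0000000000000000"
print(len(fallback_iv), fallback_iv == bytes(16), fallback_iv == b"\x30" * 16)  # 16 False True
```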
@mrdschrute your proposal does not seem to include the change from #1663: have you updated pypdf to the latest version? Please make sure you've moved from PyPDF2 to pypdf. Can you report whether pypdf 3.6.0 fixes your issue?
Yes, I am using pypdf 3.6.0. I removed the change from #1663 and replaced it as shown to get it working. As I mentioned, it's probably not the best solution. The decrypt function seems to be called repeatedly while processing a PDF, and the data length in those calls is occasionally less than 16, which causes the issue.
🤔 By any chance, do you have a document you can share? You can send it to @MartinThoma at info@martin-thoma.de if you'd like to keep it private.
I thought you might ask that. Unfortunately, the document is sensitive (and not mine), so I can't share it. I also did not create it, so I can't describe how it was made. I realize that's all bad news. The encrypted version works fine in Adobe, and the fix I described above does decrypt it.
@exiledkingcc any idea why it did not work?
I realize I may not have described this well. I think the problem occurs when len(data) is strictly between 0 and 16. In that case, the check from #1663 does not trigger because the length is not zero, but the code later on expects a 16-byte IV and fails.
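A minimal sketch of the gap described above, assuming the #1663 guard only short-circuits empty input (as in the change proposed earlier in this thread). The helper names are hypothetical and the code only checks lengths; it does no decryption.

```python
def caught_by_1663_guard(data: bytes) -> bool:
    # Assumed behaviour of the #1663 change: only empty input returns early.
    return len(data) == 0

def has_full_iv(data: bytes) -> bool:
    # AES-CBC needs a 16-byte IV, taken from the first 16 bytes of data.
    return len(data[:16]) == 16

# Lengths that pass the guard but still cannot supply a 16-byte IV.
slipping_through = [
    n for n in range(33)
    if not caught_by_1663_guard(bytes(n)) and not has_full_iv(bytes(n))
]
print(slipping_through == list(range(1, 16)))  # True
```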
@exiledkingcc How would you handle data whose length is between 1 and 15?
@exiledkingcc, +1?
AES is a block cipher: it always processes one block at a time, i.e. 16 bytes. Data must be padded to a multiple of 16 bytes so it can be encrypted in blocks, and the encrypted result is therefore always a multiple of 16 bytes. Data whose length is not a multiple of 16 can't be decrypted; usually that means the data is corrupted or is not AES-encrypted.
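To illustrate the point above, here is a minimal stdlib-only sketch of PKCS#7-style padding, the kind of padding that pypdf's decrypt() removes with d[: -d[-1]]. pkcs7_pad, pkcs7_unpad and is_block_aligned are hypothetical helpers, not pypdf or pycryptodome APIs. The sketch shows that padded output is always a whole number of 16-byte blocks, so a 1 to 15 byte ciphertext cannot have been produced this way.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(plaintext: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Always adds 1..block_size bytes, each equal to the pad length,
    # so even already-aligned input grows by a full block.
    n = block_size - len(plaintext) % block_size
    return plaintext + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    # The last byte says how many padding bytes to strip.
    return padded[: -padded[-1]]

def is_block_aligned(data: bytes, block_size: int = BLOCK_SIZE) -> bool:
    # Hypothetical sanity check: AES-CBC ciphertext must be whole blocks.
    return len(data) % block_size == 0

for length in (0, 1, 15, 16, 17):
    padded = pkcs7_pad(bytes(length))
    print(length, len(padded), is_block_aligned(padded))
```

Padding output lengths jump from 16 to 32 at a 16-byte input, so every result passes is_block_aligned, while raw lengths such as 7 or 15 do not.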
@mrdschrute can you confirm that you are seeing data lengths between 1 and 15? And can you confirm that these objects are correctly handled by well-known viewers?
@mrdschrute +1?
Sorry for the delayed reply. Yes, I was seeing data lengths between 1 and 15. The objects are correctly handled by well-known viewers. I actually ended up using PyMuPDF, which worked fine. I'll try to find some time to recreate the error with a file I can share.
I'm closing this issue as there has been no new input. Feel free to ask to reopen it once you have a test file.
I get an error when running the following code on an encrypted PDF document (I had to install PyCryptodome):
I can't share my PDF file for security reasons.
This is the complete traceback I see:
For some reason, the length of iv at line 87 in venv/lib/python3.8/site-packages/PyPDF2/_encryption.py is zero. Does it make sense to add the line "if len(iv) == 0: return data" to avoid the error? Thanks in advance.