m32 / endesive

en-crypt, de-crypt, si-gn, ve-rify - smime, pdf, xades and plain files in pure python
MIT License

pdf.verify throws a UnicodeDecodeError on files modified with PyMuPDF #172

Open ksledz opened 2 months ago

ksledz commented 2 months ago

I experimented with PyMuPDF, endesive and some signed PDFs and noticed that endesive's verify function does not work at all on various modified PDFs. (I first discovered this on PDFs with financial data, then reproduced it on something generic, as shown below.) For example, using pdf-acrobat.pdf from the endesive repo, saved in the same directory as the script:

import pymupdf
doc = pymupdf.open('pdf-acrobat.pdf')
print(doc.get_sigflags())
page = doc[0]
rects = page.search_for("world")
page.add_highlight_annot(rects)
doc.save("output.pdf")

And then trying to verify it:

from endesive import pdf
data = open("output.pdf", "rb").read()
(hashok, signatureok, certok)= pdf.verify(data, None, None)
print("signature ok?", signatureok)
print("hash ok?", hashok)
print("cert ok?", certok)

Leaves a traceback:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[8], line 3
      1 from endesive import pdf
      2 data = open("output.pdf", "rb").read()
----> 3 (hashok, signatureok, certok)= pdf.verify(data, None, None)
      4 print("signature ok?", signatureok)
      5 print("hash ok?", hashok)

File ~/playground/.venv/lib/python3.12/site-packages/endesive/pdf/verify.py:14, in verify(pdfdata, certs, systemCertsPath)
     12 br = [int(i, 10) for i in pdfdata[start + 1 : stop].split()]
     13 contents = pdfdata[br[0] + br[1] + 1 : br[2] - 1]
---> 14 bcontents = bytes.fromhex(contents.decode("utf8"))
     15 data1 = pdfdata[br[0] : br[0] + br[1]]
     16 data2 = pdfdata[br[2] : br[2] + br[3]]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte

m32 commented 2 months ago

Verification of PDF correctness is very naive; in fact, it effectively does not exist. For example, there is no check that the given byte range covers the entire document. If no error occurred, then everything "should" be OK, but any error should be treated as fatal.
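To illustrate the "any error is fatal" advice, here is a minimal sketch of a caller-side wrapper. The name verify_or_fail is hypothetical and not part of endesive. It only assumes that the verify function is called with the PDF bytes followed by any extra arguments, as in the snippet from the report.

```python
def verify_or_fail(verify_fn, pdfdata, *args):
    """Call verify_fn(pdfdata, *args) and report any exception as a failure.

    Hypothetical helper, not part of endesive. Returns (result, error):
    exactly one of the two is None.
    """
    try:
        return verify_fn(pdfdata, *args), None
    except Exception as exc:
        # Stale offsets, non-hex signature bytes, malformed CMS data, ...
        # Whatever the cause, the document must not be reported as verified.
        return None, exc
```

With endesive installed, this could be used as result, error = verify_or_fail(pdf.verify, data, None, None). If error is not None, the file should be handled as not verified rather than letting the traceback decide the outcome.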

If you have the time and inclination, please add as many checks as you can. PRs are welcome.
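As one example of such a check, the sketch below inspects the /ByteRange array before any signature decoding is attempted. It is not endesive code: check_byte_range is a hypothetical name, and it looks only at the first /ByteRange it finds, so files with several signatures would need more work. The slicing mirrors lines 12 and 13 of verify.py shown in the traceback, where the bytes between the two ranges are expected to be a hex string in angle brackets. A plausible but unconfirmed explanation for the reported error is that the file was rewritten after signing, leaving the stored offsets stale.

```python
import re

# Four non-negative integers inside "/ByteRange [ ... ]".
_BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)
_HEX = re.compile(rb"[0-9A-Fa-f\s]*")


def check_byte_range(pdfdata: bytes) -> list:
    """Return a list of problems with the first /ByteRange (empty if none found)."""
    match = _BYTE_RANGE.search(pdfdata)
    if match is None:
        return ["no /ByteRange array found"]
    start1, len1, start2, len2 = (int(g) for g in match.groups())

    problems = []
    if start1 != 0:
        problems.append("first range does not start at offset 0")
    if start2 < start1 + len1:
        problems.append("ranges overlap")
    if start2 + len2 != len(pdfdata):
        problems.append("second range does not end at end of file")

    # The gap between the two ranges should hold the signature as <hex>.
    gap = pdfdata[start1 + len1 : start2]
    if not (gap.startswith(b"<") and gap.endswith(b">")):
        problems.append("gap between ranges is not a <...> hex string")
    elif _HEX.fullmatch(gap[1:-1]) is None:
        problems.append("gap contains non-hex bytes")
    return problems
```

A non-empty result would be a reason to reject the file before calling bytes.fromhex. Whether a range that stops short of the end of the file should always be fatal is a policy decision, since content appended after signing also produces that result.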