openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Identify digital signatures in PDF and potentially validate these #846

Open ross-spencer opened 1 year ago

ross-spencer commented 1 year ago

Digital signatures contain a checksum that asserts something about a previous state of a document. If a checksum from a digital signature no longer validates, the signed document is considered invalid or "tampered with". This is invaluable information as we receive digital contracts and the like into an archival environment. Knowing that PDF documents have digital signatures helps to determine which workflows are needed to handle them. Going further, validating a signature could be wrapped into JHOVE's assertion about a document being well-formed and valid. That said, "validating" a document based on a digital signature may be more of a meta-validation of the document, i.e. the document is valid in pure specification terms but invalid because verification of additional data within it fails. Because of the nature of a digital signature, this may be beyond the scope of JHOVE.

Some notes I wrote on validation a few years back are below. Docusign is used as an example service with demo offerings that help us to access some more modern features around signing.

Notes on signing

A digital signature is a method for verifying the integrity of a PDF document. A document is signed using the signer's private key. Once signed, a document can be verified using the signer's public key.
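To make the sign/verify split concrete, here is a toy sketch of the idea: the digest of a message is transformed with a private exponent, and anyone holding the public exponent can check it. The key pair, the function names and the lack of any padding scheme are all assumptions made for illustration; this is textbook RSA with deliberately tiny numbers, not what a signing service actually does.

```python
import hashlib

# Toy key pair built from two Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def _digest_as_int(message: bytes) -> int:
    """Reduce the SHA-256 digest of the message to an integer below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signing uses the private exponent D."""
    return pow(_digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verification only needs the public values E and N."""
    return pow(signature, E, N) == _digest_as_int(message)


contract = b"The parties agree to the terms."
signature = sign(contract)
unchanged_ok = verify(contract, signature)
tampered_ok = verify(contract + b" And one more clause.", signature)
```

With these values `unchanged_ok` should come out true and `tampered_ok` false, which is the property an archive would want to test for.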

Services such as Docusign use a "simple electronic signature".

https://www.skribble.com/en-eu/signaturestandards/
https://www.allenovery.com/en-gb/germany/expertise/legal_tech_deutschland

As well as verifying a signature against a certificate (which needs to be valid), a document can be verified against its own content (which also needs to be valid). Modified content calls into question the integrity of a document:

Signature information can be found in the dictionary carrying the ‘/Type /Sig’ entry. The signature should resolve to a checksum. The digest retrieved from the document signature is compared with the digest of the portion of the document listed in the signature's /ByteRange, which is typically everything in the file at signing time except the /Contents value that holds the signature itself. If the two values match, it signals that the contract that was signed is valid and has not been tampered with since it was signed; the contract is as it was when it was completed.
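A minimal sketch of the "digest of the document" half of that comparison is below. It assumes the signed portion is described by a /ByteRange array of offset/length pairs and that SHA-256 is the digest in use; the function name is hypothetical. Pulling the expected digest out of the PKCS#7 object for the actual comparison needs a CMS parser and is not shown.

```python
import hashlib


def digest_over_byte_range(data: bytes, byte_range: list[int],
                           algorithm: str = "sha256") -> str:
    """Hash the (offset, length) pairs listed in a /ByteRange array."""
    if len(byte_range) % 2 != 0:
        raise ValueError("ByteRange must hold offset/length pairs")
    hasher = hashlib.new(algorithm)
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        if offset < 0 or length < 0 or offset + length > len(data):
            raise ValueError("ByteRange points outside the file")
        hasher.update(data[offset:offset + length])
    return hasher.hexdigest()


# Stand-in for a file: 4 signed bytes, an 11-byte gap, 4 more signed bytes.
data = b"HEAD<SIGNATURE>TAIL"
digest = digest_over_byte_range(data, [0, 4, 15, 4])
```

Here the gap plays the role of the /Contents value: changing it leaves the digest alone, while changing any byte of `HEAD` or `TAIL` changes it.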

A service such as Docusign uses the Adobe PKCS#7 Detached (adbe.pkcs7.detached) specification for a signature, defined as follows:

“adbe.pkcs7.detached defined in ISO 32000-1 section 12.8.3.3 PKCS#7 Signatures; the signature value Contents contain a DER-encoded PKCS#7 binary data object, see above. The original signed message digest over the document’s byte range shall be incorporated as the normal PKCS#7 SignedData field. No data shall be encapsulated in the PKCS#7 SignedData field.”
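As a rough illustration of what "a PKCS#7 binary data object" means in practice, the sketch below decodes the first bytes of the /Contents value from the DocuSign example in this issue and looks for two well-known object identifiers. It is byte-pattern sniffing, not ASN.1 parsing, and the function name is hypothetical; a real check would use a proper CMS library.

```python
# Encoded object identifiers: tag 0x06, length 0x09, then the OID body.
OID_SIGNED_DATA = bytes.fromhex("06092a864886f70d010702")  # 1.2.840.113549.1.7.2
OID_SHA256 = bytes.fromhex("0609608648016503040201")       # 2.16.840.1.101.3.4.2.1


def sniff_pkcs7_contents(contents_hex: str) -> dict:
    """Report a few cheap facts about a hex-encoded /Contents value."""
    raw = bytes.fromhex("".join(contents_hex.split()))
    return {
        "starts_with_sequence": raw[:1] == b"\x30",
        # The content type OID should sit right after the outer header.
        "is_signed_data": OID_SIGNED_DATA in raw[:32],
        "mentions_sha256": OID_SHA256 in raw,
    }


# First 35 bytes of the /Contents string shown in the example object.
EXAMPLE_PREFIX = (
    "308006092a864886f70d010702a0803080020101310"
    "d300b0609608648016503040201"
)
info = sniff_pkcs7_contents(EXAMPLE_PREFIX)
```

For the example prefix all three flags should be true, i.e. the object announces itself as SignedData and names SHA-256 among its digest algorithms.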

Validation in Adobe Acrobat Reader DC looks as follows:

Invalid

image

Valid

image

What does a signature look like?

Inside the PDF a signature may look as follows:

70 0 obj
<<
   /Type /Sig
   /Reason (Digitally verifiable PDF exported from www.docusign.com)
   /Location ()
   /Prop_Build
      <<
         /App
            <<
               /Name /DocuSign#ae
            >>
      >>
   /ContactInfo ()
   /M (D:20210512044426-07'00')
   /Filter /Adobe.PPKMS
   /SubFilter /adbe.pkcs7.detached
   /ByteRange [0 399370 464906 5428 ]
   /Contents <308006092a864886f70d010702a0803080020101310
      d300b0609608648016503040201308006092a864886f70d0107
      010000a0820f323082042a30820312a00302010202043863def
      ... content truncated ...
>>
endobj
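For the "identify" half of this request, a first pass could simply scan the raw bytes for these keys. The sketch below assumes the signature dictionary is stored uncompressed in the file body and that /ByteRange holds exactly four integers, as in the example above; the function names are hypothetical and this is not how JHOVE's PDF module walks objects.

```python
import re

SIG_TYPE = re.compile(rb"/Type\s*/Sig\b")
BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def has_sig_dictionary(data: bytes) -> bool:
    """True if a '/Type /Sig' entry appears anywhere in the raw bytes."""
    return SIG_TYPE.search(data) is not None


def find_signature_byte_ranges(data: bytes) -> list[dict]:
    """List every four-integer /ByteRange and whether it spans the file."""
    results = []
    for match in BYTE_RANGE.finditer(data):
        a, b, c, d = (int(group) for group in match.groups())
        results.append({
            "byte_range": [a, b, c, d],
            # Starts at byte 0, leaves one gap, and ends at end of file.
            "covers_whole_file": a == 0 and b <= c and c + d == len(data),
        })
    return results
```

In the example object the two ranges end at byte 464906 + 5428 = 470334, so a file of exactly that length would be reported as fully covered. A signature whose range stops short of the end of the file would point to content added after signing.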

Examples

Attached are two examples. 2023-04-24-example-signed-contracts.zip

I have also gone through govdocs looking for PDFs with this feature. Samples, and a script to do the same, are listed here: https://gist.github.com/ross-spencer/ad51e6b29d8aa63440993aec07f2e307

More information

There is a similar feature request on the PDFCPU issues: https://github.com/pdfcpu/pdfcpu/issues/168#issue-568451734

carlwilson commented 1 year ago

Thanks very much for this @ross-spencer as it looks interesting and valuable. We will add it to the prioritisation list as we decide what's going to be in 1.30 (or 2.0?).