openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Unexpected exception java.lang.NullPointerException returned in validation result for Docusign summary PDFs #668

Open ross-spencer opened 3 years ago

ross-spencer commented 3 years ago

Attached are two files exhibiting the same problem. Both were created using the DocuSign demo (https://secure.docusign.com/demo), one with each of its two export methods.

To all appearances the validation completes, and the validation result is "Not well-formed". This seems to be because of the error message raised: ErrorMessage: Unexpected exception java.lang.NullPointerException.

The results are below:

JhoveView (Rel. 1.25.0-SNAPSHOT, 2021-04-07)
 Date: 2021-04-08 09:51:15 CEST
 RepresentationInformation: /tmp/opf/combined_Please_review__sign_your_document.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2021-04-08 09:05:38 CEST
  Size: 319340
  Format: PDF
  Version: 1.4
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Unexpected exception java.lang.NullPointerException
   ID: PDF-HUL-94
  MIMEtype: application/pdf
  PDFMetadata: 
   Objects: 71
   FreeObjects: 3
   IncrementalUpdates: 2
   DocumentCatalog: 
    PageLayout: SinglePage
    PageMode: UseNone
   Info: 
    Title: 
    Author: 
    Subject: 
   ID: 0x35633136303966662d373265312d343133382d623963622d336132373233313264346263, 0xb71234e8ef9a42d94605e600888c58f8
   Filters: 
    FilterPipeline: FlateDecode
   Fonts: 
    Type0: 
     Font: 
      BaseFont: FDXESL+HelveticaNeue
      Encoding: Identity-H
      ToUnicode: true
    TrueType: 
     Font: 
      BaseFont: FDXESL+HelveticaNeue-Bold
      FontSubset: true
      FirstChar: 32
      LastChar: 87
      FontDescriptor: 
       FontName: FDXESL+HelveticaNeue-Bold
       Flags: 32
       FontBBox: -1018, -481, 1437, 1141
       FontFile2: true
      Encoding: WinAnsiEncoding
      ToUnicode: true
     Font: 
      BaseFont: FDXESL+HelveticaNeue
      FontSubset: true
      FirstChar: 32
      LastChar: 146
      FontDescriptor: 
       FontName: FDXESL+HelveticaNeue
       Flags: 32
       FontBBox: -951, -481, 1446, 1077
       FontFile2: true
      Encoding: WinAnsiEncoding
      ToUnicode: true
    CIDFontType2: 
     Font: 
      BaseFont: FDXESL+HelveticaNeue
      CIDSystemInfo: 
       Registry: Adobe
       Registry: Identity
       Supplement: 0
      FontDescriptor: 
       FontName: FDXESL+HelveticaNeue
       Flags: 4
       FontBBox: -951, -481, 1446, 1077
       FontFile2: true
   XMP: <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
      pdf:Producer="PDFKit.NET 21.1.102.20091"
      pdf:Keywords=""
      pdf:PDFVersion="1.4"
      xmp:CreateDate="2021-04-08T00:05:32-07:00"
      xmp:ModifyDate="2021-04-08T00:05:32-07:00"
      xmp:CreatorTool=""
      xmp:MetadataDate="2021-04-08T00:05:32-07:00"
      dc:format="application/pdf">
      <dc:creator>
        <rdf:Seq>
          <rdf:li/>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag/>
      </dc:subject>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default"/>
        </rdf:Alt>
      </dc:description>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default"/>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
   Pages: 
    Page: 
     Sequence: 1
     Annotations: 
      Annotation: 
       Subtype: Widget
       Rect: 0, 0, 0, 0
       Flags: 132
       AppearanceDictionary: true
 RepresentationInformation: /tmp/opf/Summary.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2021-04-08 00:05:16 CEST
  Size: 132877
  Format: PDF
  Version: 1.4
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Unexpected exception java.lang.NullPointerException
   ID: PDF-HUL-94
  MIMEtype: application/pdf
  PDFMetadata: 
   Objects: 38
   FreeObjects: 3
   IncrementalUpdates: 2
   DocumentCatalog: 
    PageLayout: SinglePage
    PageMode: UseNone
   Info: 
    Title: DocuSign-Zertifikat
    Author: 
    Subject: DocuSign-Zertifikat
   ID: 0x35663464323637302d633562612d346261622d623964322d636163386437306164313131, 0x9d91f58b1574d3a15937b1cd2b36d684
   XMP: <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
      pdf:Producer="PDFKit.NET 21.1.102.20091"
      pdf:Keywords=""
      pdf:PDFVersion="1.4"
      xmp:CreateDate="2021-04-08T00:05:16-07:00"
      xmp:ModifyDate="2021-04-08T00:05:16-07:00"
      xmp:CreatorTool=""
      xmp:MetadataDate="2021-04-08T00:05:16-07:00"
      dc:format="application/pdf">
      <dc:creator>
        <rdf:Seq>
          <rdf:li/>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag/>
      </dc:subject>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">DocuSign-Zertifikat</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">DocuSign-Zertifikat</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
   Pages: 
    Page: 
     Sequence: 1
     Annotations: 
      Annotation: 
       Subtype: Widget
       Rect: 0, 0, 0, 0
       Flags: 132
       AppearanceDictionary: true
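
For anyone checking a larger batch of files for the same pattern, here is a minimal sketch that pulls the file name, status and error messages out of text output like the reports above. The class `JhoveReportScan` and its methods are hypothetical helpers written for this illustration and are not part of JHOVE. It assumes the text layout shown above, where the message ID sits on the line directly under its `ErrorMessage`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JHOVE): summarises JHOVE text output. */
public class JhoveReportScan {

    /** Headline result for one file in the report. */
    public static final class Result {
        public final String file;
        public String status = "";
        public final List<String> errors = new ArrayList<>();

        Result(String file) {
            this.file = file;
        }
    }

    /** Returns the text after the first colon, trimmed. */
    private static String value(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }

    /** Collects one Result per "RepresentationInformation:" entry. */
    public static List<Result> scan(String report) {
        List<Result> results = new ArrayList<>();
        Result current = null;
        boolean previousWasError = false;
        for (String raw : report.split("\\R")) {
            String line = raw.trim();
            boolean isError = false;
            if (line.startsWith("RepresentationInformation:")) {
                current = new Result(value(line));
                results.add(current);
            } else if (current != null && line.startsWith("Status:")) {
                current.status = value(line);
            } else if (current != null && line.startsWith("ErrorMessage:")) {
                current.errors.add(value(line));
                isError = true;
            } else if (previousWasError && line.startsWith("ID:")) {
                // Only an ID directly under an ErrorMessage is a message ID;
                // the later "ID: 0x..." line is the PDF's own document ID.
                int last = current.errors.size() - 1;
                current.errors.set(last, current.errors.get(last) + " [" + value(line) + "]");
            }
            previousWasError = isError;
        }
        return results;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the Summary.pdf report from this issue.
        String sample = String.join("\n",
                "JhoveView (Rel. 1.25.0-SNAPSHOT, 2021-04-07)",
                " RepresentationInformation: /tmp/opf/Summary.pdf",
                "  Status: Not well-formed",
                "  ErrorMessage: Unexpected exception java.lang.NullPointerException",
                "   ID: PDF-HUL-94",
                "  MIMEtype: application/pdf");
        for (Result r : scan(sample)) {
            System.out.println(r.file + ": " + r.status + " " + r.errors);
        }
    }
}
```

Matching on line prefixes keeps the sketch short; for anything beyond a quick check, one of JHOVE's structured output formats would be a sturdier thing to parse.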

With logging turned on I am not seeing any other confirmation of the error, or what's causing it, i.e. no stack trace. The available log lines for both files are:

Apr 08, 2021 10:27:50 AM edu.harvard.hul.ois.jhove.JhoveBase dispatch
INFO: Handler edu.harvard.hul.ois.jhove.viewer.ViewHandler preparing to write to null
Apr 08, 2021 10:27:50 AM edu.harvard.hul.ois.jhove.JhoveBase process
INFO: Entering JhoveBase.process, file/uri = /tmp/opf/Summary.pdf
Apr 08, 2021 10:27:50 AM edu.harvard.hul.ois.jhove.JhoveBase process
INFO: Processing Summary.pdf with module edu.harvard.hul.ois.jhove.module.PdfModule
Apr 08, 2021 10:27:50 AM edu.harvard.hul.ois.jhove.ModuleBase initParse
INFO: PDF-hul called initParse
Apr 08, 2021 10:27:50 AM edu.harvard.hul.ois.jhove.module.PdfModule findImages
INFO: Getting image
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.JhoveBase dispatch
INFO: Handler edu.harvard.hul.ois.jhove.viewer.ViewHandler preparing to write to null
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.JhoveBase process
INFO: Entering JhoveBase.process, file/uri = /tmp/opf/combined_Please_review__sign_your_document.pdf
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.JhoveBase process
INFO: Processing combined_Please_review__sign_your_document.pdf with module edu.harvard.hul.ois.jhove.module.PdfModule
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.ModuleBase initParse
INFO: PDF-hul called initParse
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.module.PdfModule findImages
INFO: Getting image
Apr 08, 2021 10:28:58 AM edu.harvard.hul.ois.jhove.module.PdfModule findImages
INFO: Getting image
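
A message of this shape with no trace in the log is what a catch-all handler produces when it reports only the exception's class name. The sketch below illustrates that general pattern; it is not JHOVE's code, and whether the PDF module actually does this is an assumption that has not been checked against the source. `SwallowedTrace` and both methods are hypothetical names.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Illustration only; this is not JHOVE's code. */
public class SwallowedTrace {

    /** Reports only the exception's class name, so the stack trace is lost. */
    public static String reportNameOnly(Runnable work) {
        try {
            work.run();
            return "OK";
        } catch (Exception e) {
            return "Unexpected exception " + e.getClass().getName();
        }
    }

    /** Same catch-all, but keeps the stack trace as text for a log. */
    public static String reportWithTrace(Runnable work) {
        try {
            work.run();
            return "OK";
        } catch (Exception e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            return "Unexpected exception " + e.getClass().getName()
                    + System.lineSeparator() + trace;
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new NullPointerException();
        };
        System.out.println(reportNameOnly(failing));
        System.out.println(reportWithTrace(failing));
    }
}
```

With the first variant the report can still be completed and written out, which would match what is seen here; the second variant shows how little extra is needed to find the line that threw.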

This is similar to https://github.com/openpreserve/jhove/issues/256, but the files in #256 fail with JHOVE reporting that "Validation ended prematurely due to an unhandled exception." Here the validation completes, but the result contains the null pointer exception.

It might be necessary for others to confirm this issue for these two files.

Currently using:

OpenJDK version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-bit Server VM (build 25.282-b08, mixed mode)

docusign-summary-examples.zip

ross-spencer commented 3 years ago

Here is another example with which I've been able to recreate the problem.

new-example-not-well-formed.zip

andreakb commented 2 years ago

I encountered the same error with a PDF that was signed with DocuSign via (I'm pretty sure) UC Santa Cruz's DocuSign portal (https://its.ucsc.edu/docusign/index.html). Unfortunately, I cannot share the PDF because it contains identifying information, but I am copying the JHOVE output below:

Jhove (Rel. 1.24.1, 2020-03-16)
 Date: 2022-03-24 12:45:38 EDT
 RepresentationInformation: ya-rg2294-Lewites-release.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2022-03-24 11:32:22 EDT
  Size: 595132
  Format: PDF
  Version: 1.5
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Unexpected exception java.lang.NullPointerException
   ID: PDF-HUL-94
  MIMEtype: application/pdf
  PDFMetadata: 
   Objects: 113
   FreeObjects: 3
   IncrementalUpdates: 2
   DocumentCatalog: 
    PageLayout: SinglePage
    PageMode: UseNone
   Info: 
    Title: 
    Author: 
    Subject: 
   ID: 0x62623566356537642d626531322d346562322d626237372d303237313039346565326232, 0x4c53096e3584a1e81fb1b63f1331aae9
   Filters: 
    FilterPipeline: FlateDecode
    FilterPipeline: DCTDecode
   Images: 
    Image: 
     NisoImageMetadata: 
      FormatName: image/jpg
      CompressionScheme: JPEG
      ImageWidth: 2550
      ImageHeight: 476
      BitsPerSample: 8
      BitsPerSampleUnit: integer
     Intent: Perceptual
     Interpolate: true
    Image: 
     NisoImageMetadata: 
      FormatName: image/jpg
      CompressionScheme: JPEG
      ImageWidth: 2550
      ImageHeight: 476
      BitsPerSample: 8
      BitsPerSampleUnit: integer
     Intent: Perceptual
     Interpolate: true
   Fonts: 
    Type0: 
     Font: 
      BaseFont: TimesNewRomanPSMT
      Encoding: Identity-H
      ToUnicode: true
     Font: 
      BaseFont: SymbolMT
      Encoding: Identity-H
      ToUnicode: true
    TrueType: 
     Font: 
      BaseFont: TimesNewRomanPSMT
      FirstChar: 32
      LastChar: 122
      FontDescriptor: 
       FontName: TimesNewRomanPSMT
       Flags: Nonsymbolic
       FontBBox: -568, -216, 2046, 693
      Encoding: WinAnsiEncoding
     Font: 
      BaseFont: UKHNYQ+Georgia
      FontSubset: true
      FirstChar: 32
      LastChar: 122
      FontDescriptor: 
       FontName: UKHNYQ+Georgia
       Flags: Nonsymbolic
       FontBBox: -490, -303, 1797, 1075
       FontFile2: true
      Encoding: MacRomanEncoding
     Font: 
      BaseFont: BUXLXF+TimesNewRomanPSMT
      FontSubset: true
      FirstChar: 33
      LastChar: 46
      FontDescriptor: 
       FontName: BUXLXF+TimesNewRomanPSMT
       Flags: Symbolic
       FontBBox: -568, -307, 2046, 1039
       FontFile2: true
      ToUnicode: true
     Font: 
      BaseFont: ArialMT
      FirstChar: 32
      LastChar: 32
      FontDescriptor: 
       FontName: ArialMT
       Flags: Nonsymbolic
       FontBBox: -665, -210, 2000, 728
      Encoding: WinAnsiEncoding
     Font: 
      BaseFont: PCYGQU+Calibri
      FontSubset: true
      FirstChar: 33
      LastChar: 33
      FontDescriptor: 
       FontName: PCYGQU+Calibri
       Flags: Symbolic
       FontBBox: -503, -313, 1240, 1026
       FontFile2: true
      ToUnicode: true
     Font: 
      BaseFont: UVDZOW+TimesNewRomanPSMT
      FontSubset: true
      FirstChar: 33
      LastChar: 93
      FontDescriptor: 
       FontName: UVDZOW+TimesNewRomanPSMT
       Flags: Symbolic
       FontBBox: -568, -307, 2046, 1039
       FontFile2: true
      ToUnicode: true
     Font: 
      BaseFont: VZTQZF+TimesNewRomanPS-BoldMT
      FontSubset: true
      FirstChar: 33
      LastChar: 41
      FontDescriptor: 
       FontName: VZTQZF+TimesNewRomanPS-BoldMT
       Flags: Symbolic
       FontBBox: -558, -328, 2000, 1055
       FontFile2: true
      ToUnicode: true
     Font: 
      BaseFont: TimesNewRomanPS-BoldMT
      FirstChar: 32
      LastChar: 121
      FontDescriptor: 
       FontName: TimesNewRomanPS-BoldMT
       Flags: Nonsymbolic
       FontBBox: -558, -216, 2000, 677
      Encoding: WinAnsiEncoding
    CIDFontType2: 
     Font: 
      BaseFont: TimesNewRomanPSMT
      CIDSystemInfo: 
       Registry: Adobe
       Registry: Identity
       Supplement: 0
      FontDescriptor: 
       FontName: TimesNewRomanPSMT
       Flags: Nonsymbolic
       FontBBox: -568, -216, 2046, 693
       FontFile2: true
     Font: 
      BaseFont: SymbolMT
      CIDSystemInfo: 
       Registry: Adobe
       Registry: Identity
       Supplement: 0
      FontDescriptor: 
       FontName: SymbolMT
       Flags: Nonsymbolic
       FontBBox: 0, -216, 1113, 693
       FontFile2: true
   XMP: 
   Pages: 
    Page: 
     Sequence: 1
     Annotations: 
      Annotation: 
       Subtype: Widget
       Rect: 0, 0, 0, 0
       Flags: Print
       AppearanceDictionary: true
    Page: 
     Sequence: 2
    Page: 
     Sequence: 3
    Page: 
     Sequence: 4
    Page: 
     Sequence: 5
    Page: 
     Sequence: 6

prettybits commented 1 year ago

This issue should now be fixed in the current integration branch, if any of you want to confirm before the next major release.

@carlwilson I think this can be closed now, unless you usually wait for a proper release to include related fixes?

carlwilson commented 1 year ago

I'm leaving closing issues until the final build is ready, then I'll run down and double-test them just to be sure.

prettybits commented 1 year ago

@carlwilson Did you have time to double-test yet with the new 1.28 release (thanks!)? As I said, I believe this issue could now be closed; it is the only one of the issues linked in the release notes that is still open.

ross-spencer commented 1 year ago

@prettybits I didn't notice this fix, but thanks for looking at it. From my perspective, I can see the error is no longer occurring for the files attached above. Tested on openjdk version "11.0.19" 2023-04-18. PDF-HUL 1.12.4.

carlwilson commented 1 year ago

Apologies all, I've still got to triage the open errors and test. It's a day's work and I'll be doing it ASAP, realistically in the next 3 weeks.