openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

JSON output handler via jhove-gui will only output information for one file where other handlers list all objects validated #667

Closed ross-spencer closed 2 years ago

ross-spencer commented 3 years ago

When using jhove-gui, one can scan multiple objects at a time and see all of the results in the window. When those results are exported, all of them are expected to appear in the output.

For JSON output, only one of the results is written, i.e. the JSON contains the results for just one file. Take this example: it probably doesn't reveal much because it shows only one file, but two files were scanned. The file reported is Summary.pdf, the second of the two files listed in the XML and text output.

{
    "jhove": {
        "name": "JhoveView",
        "release": "1.25.0-SNAPSHOT",
        "date": "2021-04-07",
        "executionTime": "2021-04-08T09:50:13+02:00",
        "repInfo": {
            "uri": "/tmp/opf/Summary.pdf",
            "reportingModule": {
                "name": "PDF-hul",
                "release": "1.12.2",
                "date": "2019-12-10"
            },
            "lastModified": "2021-04-08T00:05:16+02:00",
            "size": 132877,
            "format": "PDF",
            "version": "1.4",
            "status": "Not well-formed",
            "sigMatch": ["PDF-hul"],
            "messages": [{
                "message": "Unexpected exception java.lang.NullPointerException",
                "severity": "error",
                "id": "PDF-HUL-94"
            }],
            "mimeType": "application/pdf",
            "properties": [{
                "PDFMetadata": [{
                    "Objects": 38
                }, {
                    "FreeObjects": 3
                }, {
                    "IncrementalUpdates": 2
                }, {
                    "DocumentCatalog": [{
                        "PageLayout": "SinglePage"
                    }, {
                        "PageMode": "UseNone"
                    }]
                }, {
                    "Info": [{
                        "Title": "DocuSign-Zertifikat"
                    }, {
                        "Author": ""
                    }, {
                        "Subject": "DocuSign-Zertifikat"
                    }]
                }, {
                    "ID": ["0x35663464323637302d633562612d346261622d623964322d636163386437306164313131", "0x9d91f58b1574d3a15937b1cd2b36d684"]
                }, {
                    "XMP": "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"Adobe XMP Core 5.1.0-jc003\">\n  <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n    <rdf:Description rdf:about=\"\"\n        xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\"\n        xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\"\n        xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n      pdf:Producer=\"PDFKit.NET 21.1.102.20091\"\n      pdf:Keywords=\"\"\n      pdf:PDFVersion=\"1.4\"\n      xmp:CreateDate=\"2021-04-08T00:05:16-07:00\"\n      xmp:ModifyDate=\"2021-04-08T00:05:16-07:00\"\n      xmp:CreatorTool=\"\"\n      xmp:MetadataDate=\"2021-04-08T00:05:16-07:00\"\n      dc:format=\"application/pdf\">\n      <dc:creator>\n        <rdf:Seq>\n          <rdf:li/>\n        </rdf:Seq>\n      </dc:creator>\n      <dc:subject>\n        <rdf:Bag/>\n      </dc:subject>\n      <dc:description>\n        <rdf:Alt>\n          <rdf:li xml:lang=\"x-default\">DocuSign-Zertifikat</rdf:li>\n        </rdf:Alt>\n      </dc:description>\n      <dc:title>\n        <rdf:Alt>\n          <rdf:li xml:lang=\"x-default\">DocuSign-Zertifikat</rdf:li>\n        </rdf:Alt>\n      </dc:title>\n    </rdf:Description>\n  </rdf:RDF>\n</x:xmpmeta>"
                }, {
                    "Pages": [{
                        "Page": [{
                            "Sequence": 1
                        }, {
                            "Annotations": [{
                                "Annotation": [{
                                    "Subtype": "Widget"
                                }, {
                                    "Rect": [0, 0, 0, 0]
                                }, {
                                    "Flags": 132
                                }, {
                                    "AppearanceDictionary": true
                                }]
                            }]
                        }]
                    }]
                }]
            }]
        }
    }
}
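To check how many files a JSON report like the one above actually covers, the report can be parsed and its repInfo entries counted. Python's `json` module silently keeps the last value when a key is repeated inside one object, so the sketch also uses `object_pairs_hook` to flag repeated keys in the raw text. Whether the JSON handler ever writes duplicate `repInfo` keys is an assumption to test, not something the report above shows. Treating a list under `repInfo` as several files is likewise an assumed shape for a handler that reports every file.

```python
import json
from typing import Any


def find_duplicate_keys(raw: str) -> list[str]:
    """Return each key that appears more than once inside the same JSON object."""
    duplicates: list[str] = []

    def hook(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
        seen: set[str] = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        # Like json's default behaviour: for a repeated key, the last value wins.
        return dict(pairs)

    json.loads(raw, object_pairs_hook=hook)
    return duplicates


def count_rep_infos(raw: str) -> int:
    """Count the files reported under jhove.repInfo.

    A single object counts as one file; a list (an assumed shape for a
    handler that reports every file) counts one per element.
    """
    rep_info = json.loads(raw)["jhove"].get("repInfo")
    if rep_info is None:
        return 0
    return len(rep_info) if isinstance(rep_info, list) else 1
```

Against the report above, `count_rep_infos` gives 1 even though two files were scanned.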

The equivalent XML and text outputs look as follows:

Note: two repInfo elements, one per URI.

<?xml version="1.0" encoding="utf-8"?>
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove" xsi:schemaLocation="http://schema.openpreservation.org/ois/xml/ns/jhove https://schema.openpreservation.org/ois/xml/xsd/jhove/1.8/jhove.xsd" name="JhoveView" release="1.25.0-SNAPSHOT" date="2021-04-07">
  <date>2021-04-08T09:49:07+02:00</date>
  <repInfo uri="/tmp/opf/combined_Please_review__sign_your_document.pdf">
    <reportingModule release="1.12.2" date="2019-12-10">PDF-hul</reportingModule>
    <lastModified>2021-04-08T09:05:38+02:00</lastModified>
    <size>319340</size>
    <format>PDF</format>
    <version>1.4</version>
    <status>Not well-formed</status>
    <sigMatch>
      <module>PDF-hul</module>
    </sigMatch>
    <messages>
      <message severity="error" id="PDF-HUL-94">Unexpected exception java.lang.NullPointerException</message>
    </messages>
    <mimeType>application/pdf</mimeType>
    <properties>
      <property>
        <name>PDFMetadata</name>
        <values arity="List" type="Property">
        ...
        </values>
      </property>
    </properties>
  </repInfo>
  <repInfo uri="/tmp/opf/Summary.pdf">
    <reportingModule release="1.12.2" date="2019-12-10">PDF-hul</reportingModule>
    <lastModified>2021-04-08T00:05:16+02:00</lastModified>
    <size>132877</size>
    <format>PDF</format>
    <version>1.4</version>
    <status>Not well-formed</status>
    <sigMatch>
      <module>PDF-hul</module>
    </sigMatch>
    <messages>
      <message severity="error" id="PDF-HUL-94">Unexpected exception java.lang.NullPointerException</message>
    </messages>
    <mimeType>application/pdf</mimeType>
    <properties>
      <property>
        <name>PDFMetadata</name>
        <values arity="List" type="Property">
        ...
        </values>
      </property>
    </properties>
  </repInfo>
</jhove>
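The XML handler's output can be checked the same way by listing the `uri` attribute of each top-level `repInfo` element in the JHOVE namespace declared on the root element above. This is a minimal sketch; it has to be run against a full report, because the abbreviated excerpt above (with its `...` placeholders) is not parseable as-is.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <jhove> root element of the XML output.
JHOVE_NS = "http://schema.openpreservation.org/ois/xml/ns/jhove"


def rep_info_uris(xml_text: str) -> list[str]:
    """Return the uri attribute of every top-level repInfo element."""
    # Encode first so the utf-8 XML declaration is handled as bytes.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.get("uri") for el in root.findall(f"{{{JHOVE_NS}}}repInfo")]
```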

Note: two RepresentationInformation entries.

JhoveView (Rel. 1.25.0-SNAPSHOT, 2021-04-07)
 Date: 2021-04-08 09:51:15 CEST
 RepresentationInformation: /tmp/opf/combined_Please_review__sign_your_document.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2021-04-08 09:05:38 CEST
  Size: 319340
  Format: PDF
  Version: 1.4
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Unexpected exception java.lang.NullPointerException
   ID: PDF-HUL-94
  MIMEtype: application/pdf
  PDFMetadata: 
   ...
   Pages: 
    Page: 
     Sequence: 1
     Annotations: 
      Annotation: 
       Subtype: Widget
       Rect: 0, 0, 0, 0
       Flags: 132
       AppearanceDictionary: true
 RepresentationInformation: /tmp/opf/Summary.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2021-04-08 00:05:16 CEST
  Size: 132877
  Format: PDF
  Version: 1.4
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Unexpected exception java.lang.NullPointerException
   ID: PDF-HUL-94
  MIMEtype: application/pdf
  PDFMetadata: 
   ...
   Pages: 
    Page: 
     Sequence: 1
     Annotations: 
      Annotation: 
       Subtype: Widget
       Rect: 0, 0, 0, 0
       Flags: 132
       AppearanceDictionary: true
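In the text output, each file starts with a line beginning `RepresentationInformation:` (indented by one space above), so collecting those lines gives the files the report covers. A minimal sketch, assuming that line layout holds for every text report:

```python
PREFIX = "RepresentationInformation:"


def text_report_uris(report: str) -> list[str]:
    """Return the path after each 'RepresentationInformation:' line."""
    uris = []
    for line in report.splitlines():
        stripped = line.strip()  # drop the handler's indentation
        if stripped.startswith(PREFIX):
            uris.append(stripped[len(PREFIX):].strip())
    return uris
```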

The audit output:

<?xml version="1.0" encoding="utf-8"?>
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove" xsi:schemaLocation="http://schema.openpreservation.org/ois/xml/ns/jhove https://schema.openpreservation.org/ois/xml/xsd/jhove/1.8/jhove.xsd" name="JhoveView" release="1.25.0-SNAPSHOT" date="2021-04-07">
 <date>2021-04-08T09:49:53+02:00</date>
 <audit home="/home/user/jhove">
  <file mime="application/pdf" status="not well-formed">/tmp/opf/combined_Please_review__sign_your_document.pdf</file>
  <file mime="application/pdf" status="not well-formed">/tmp/opf/Summary.pdf</file>
 </audit>
</jhove>
<!-- Summary by MIME type:
<!-- [mime type]: [file count] ([valid],[well-formed],[not well-formed],[unknown])
application/pdf: 2 (0,0,2,0)
Total: 2 (0,0,2,0)
-->
<!-- Summary by directory:
<!-- [directory]: [file count] ([valid],[well-formed],[not well-formed],[unknown])
/home/user/jhove: 2 (0,0,2,0)
Total: 2 (0,0,2,0)
-->
<!-- Elapsed time: 0:00:01 -->
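The audit report lists one `<file>` element per scanned file, with its status as an attribute. The summary comments after `</jhove>` contain a nested `<!--`, and XML does not allow `--` inside a comment, so a strict parser rejects the whole report. This sketch therefore parses only up to the closing `</jhove>` tag:

```python
import xml.etree.ElementTree as ET

JHOVE_NS = "http://schema.openpreservation.org/ois/xml/ns/jhove"
CLOSING_TAG = "</jhove>"


def audit_files(report: str) -> list[tuple[str, str]]:
    """Return (path, status) for each <file> entry in an audit report."""
    # Cut off the trailing summary comments, which are not well-formed XML.
    end = report.index(CLOSING_TAG) + len(CLOSING_TAG)
    root = ET.fromstring(report[:end].encode("utf-8"))
    audit = root.find(f"{{{JHOVE_NS}}}audit")
    if audit is None:
        return []
    return [((f.text or "").strip(), f.get("status"))
            for f in audit.findall(f"{{{JHOVE_NS}}}file")]
```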

To reproduce:

  1. Scan two files using jhove-gui.
  2. Select Save as from the File menu.
  3. Save the results in each of the available formats.

NB. I haven't tried this with the CLI today. It might be worth checking; I'm not sure what the expected behavior is there.
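Checking the CLI could be scripted roughly as follows: run JHOVE once per handler over the same files and see which scanned paths never appear in each report. This is a sketch under assumptions: the `jhove` launcher is on PATH, `-h` selects the output handler, and the report goes to standard output. The helper names are hypothetical, and the substring match is deliberately crude.

```python
import subprocess


def jhove_command(handler: str, files: list[str], jhove: str = "jhove") -> list[str]:
    """Build a CLI call that validates all files and prints one report to stdout."""
    # Assumes the `jhove` launcher is on PATH and `-h` selects the output handler.
    return [jhove, "-h", handler, *files]


def missing_paths(report: str, files: list[str]) -> list[str]:
    """Return the scanned paths that never appear in a report.

    Crude substring check: it ignores escaping (e.g. backslashes in JSON
    strings or '&' in XML), so it is only reliable for simple POSIX paths.
    """
    return [path for path in files if path not in report]


def check_handlers(
    files: list[str],
    handlers: tuple[str, ...] = ("text", "xml", "json", "audit"),
) -> dict[str, list[str]]:
    """Run JHOVE once per handler over the same files and collect missing paths."""
    results: dict[str, list[str]] = {}
    for handler in handlers:
        # check=False: make no assumption about JHOVE's exit status; only stdout is compared.
        proc = subprocess.run(jhove_command(handler, files),
                              capture_output=True, text=True, check=False)
        results[handler] = missing_paths(proc.stdout, files)
    return results
```

Calling `check_handlers` with the two paths from the reports above would reproduce the issue from the CLI if its `json` entry lists a missing path while the other handlers' entries are empty.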

Related to https://github.com/openpreserve/jhove/issues/385

orgabor commented 3 years ago

Hello, we are using the CLI and are facing exactly the same issue described by @ross-spencer.

ambs commented 3 years ago

Same problem here, moving to XML temporarily.
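For anyone moving to XML as a stopgap but still needing JSON downstream, something like the following converts each `repInfo` in an XML report into one entry of a JSON array. It is a minimal sketch. The chosen fields (`format`, `version`, `status`, `mimeType`) and the output shape are illustrative assumptions, not JHOVE's own JSON schema.

```python
import json
import xml.etree.ElementTree as ET

NS = {"j": "http://schema.openpreservation.org/ois/xml/ns/jhove"}
# Illustrative subset of the per-file fields shown in the XML output.
FIELDS = ("format", "version", "status", "mimeType")


def xml_report_to_json(xml_bytes: bytes) -> str:
    """Summarise every repInfo in a JHOVE XML report as a JSON array."""
    root = ET.fromstring(xml_bytes)
    summaries = []
    for rep in root.findall("j:repInfo", NS):
        entry = {"uri": rep.get("uri")}
        for field in FIELDS:
            entry[field] = rep.findtext(f"j:{field}", namespaces=NS)
        summaries.append(entry)
    return json.dumps(summaries, indent=2)
```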

Slange-Mhath commented 2 years ago

Hey,

I know this is a long shot, but I think the issue still exists, and I was wondering if anyone has an idea how to solve it. It only occurs when we request JSON output; XML seems to be fine. Looking back through the history, it seems that #515 and #544 introduced the JSON output, so we were wondering: @carlwilson, will it still be maintained?

Thank you so much!