openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Fixes unhandled fatal error when EPUB content.opf date is badly formatted. #921

Open karenhanson opened 2 months ago

karenhanson commented 2 months ago

We had an EPUB with this value in the content.opf metadata file:

      <opf:meta property="dcterms:modified">
            2012-04-12T12:00:00Z
        </opf:meta>

The EPUB module checks that this value is a valid date. Here it read the whitespace around the value as part of the date, which made the date invalid. That exposed a second problem: the exception raised for a badly formatted date was not handled, so the report crashed. To reproduce, edit the content.opf of an EPUB file and replace the date with the one above.

The cause was that the EPUB module's ErrorMessages.properties file was on the wrong path (it sat with the harvard module messages). Most error messages generated by this module are handled in EpubChecker, which is why this wasn't caught earlier, but a few are generated in JhoveRepInfoReport and handled separately, and those were not working.
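For anyone unfamiliar with this failure mode, here is a minimal sketch. It assumes the messages are loaded through `java.util.ResourceBundle`, which this PR does not state. The class name, helper name and bundle base name are hypothetical and are not taken from the JHOVE code. It only shows that a lookup against a base name with no properties file behind it throws an unchecked `MissingResourceException`.

```java
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class BundleLookupSketch {

    /**
     * Hypothetical helper: reports whether a bundle base name resolves
     * to a properties file on the classpath.
     */
    static boolean bundleResolves(String baseName) {
        try {
            ResourceBundle.getBundle(baseName);
            return true;
        } catch (MissingResourceException e) {
            // Unchecked exception: if nothing catches it, it propagates to the caller.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical base name standing in for a misplaced properties file.
        System.out.println(bundleResolves("hypothetical.wrong.path.ErrorMessages"));
    }
}
```

Because the exception is unchecked, a code path that builds a message without going through a common handler can take the whole report down, which matches the behaviour described above.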

To fix this, I moved ErrorMessages.properties to the correct location and added a trim() to strip the whitespace around the date value, since I don't think that whitespace should cause a failure. I have not incremented any versions yet.
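A rough illustration of why the trim() matters, as a sketch only: it uses `java.time.Instant.parse` as a stand-in for whatever date check the module actually performs (an assumption), and the class and helper names are hypothetical.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

public class ModifiedDateSketch {

    /**
     * Hypothetical helper: true if the element text parses as an
     * ISO-8601 instant such as 2012-04-12T12:00:00Z.
     */
    static boolean isValidInstant(String text) {
        try {
            Instant.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            // Surrounding whitespace is enough to make the strict parser reject the value.
            return false;
        }
    }

    public static void main(String[] args) {
        // Element text as it appears in the content.opf above, whitespace included.
        String raw = "\n            2012-04-12T12:00:00Z\n        ";

        System.out.println(isValidInstant(raw));        // false: whitespace is part of the value
        System.out.println(isValidInstant(raw.trim())); // true: trimmed value is a valid date
    }
}
```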

Let me know if you need anything else!