openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Update to use EPUB 5.1.0 #892

Closed karenhanson closed 4 months ago

karenhanson commented 7 months ago

This change updates the EPUB module to use the latest EPUBCheck version, 5.1.0. The biggest thing to note is that CREATION_DATE is currently missing from the report due to a bug, but it will return once that bug is fixed. I wasn't sure this was significant enough to hold back the PR any longer, since other things are fixed in this version. For example, we have seen this issue at Portico and it is fixed in the latest version.

Changes made include:

  1. EPUB is now managed by W3C. The latest release is still in conjunction with IDPF, so for now IDPF continues to be included in the agent name. The next iteration might require that we switch the agent information to W3C only.
  2. As mentioned, CREATION_DATE is currently missing from the report. I've logged an issue and commented out the relevant lines in the tests that are now failing. It will be fixed in the next maintenance release, and I will move to 5.1.1 when it is available to add the creation date back in. I'm not sure whether this will happen before or after the next JHOVE release, so I'm keeping an eye out for an update.
  3. The new version lists resources with fragments, which makes the resource list much longer without reflecting any new files in the package. Added logic to strip the fragments so that only the base URL of each resource is listed.
  4. EPUBLocation changed slightly, updated code to support that change.
  5. Some redundant messages were removed, which changed some message counts in tests - fixed message counts to reflect correct output.
  6. Updated validation version to 3.3 - there is no way to determine the minor version number in an EPUB, so validation is always against whatever the latest version is.
  7. A file that was being used to test title has flipped from Well Formed and Valid to Well Formed and Not Valid according to current criteria - updated the test to reflect this.
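
The fragment handling described in item 3 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `ResourceFragmentFilter` and both method names are hypothetical, and it assumes the resources arrive as plain strings in which the fragment is everything from the first `#` onward.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JHOVE or EPUBCheck.
class ResourceFragmentFilter {

    /** Returns the resource reference without its fragment (the "#..." suffix). */
    static String stripFragment(String resource) {
        int hash = resource.indexOf('#');
        return hash < 0 ? resource : resource.substring(0, hash);
    }

    /**
     * Collapses references that differ only by fragment into one entry per
     * base URL, keeping the order in which each base URL was first seen.
     */
    static List<String> baseResources(List<String> reported) {
        Set<String> unique = new LinkedHashSet<>();
        for (String resource : reported) {
            String base = stripFragment(resource);
            // Assumption: a fragment-only reference such as "#top" does not
            // name a file in the package, so it is skipped.
            if (!base.isEmpty()) {
                unique.add(base);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

With this sketch, an input of `chapter1.xhtml`, `chapter1.xhtml#section-2`, and `chapter2.xhtml#note-1` collapses to `chapter1.xhtml` and `chapter2.xhtml`, so the list again reflects only files in the package.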

This closes issue #857