openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Missing offsets with pdf-hul 1.12.6 / problems with backwards compatibility of error messages #940

Closed by asciim0 3 months ago

asciim0 commented 3 months ago

Something appears to have changed between the output from PDF-hul 1.12.4 and PDF-hul 1.12.6. This can be reproduced using the attached file.

The output with PDF-hul 1.12.4 is:

  Status: Well-Formed, but not valid
  SignatureMatches: PDF-hul
  ErrorMessage: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
   ID: PDF-HUL-149
  ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid destination object
   ID: PDF-HUL-122
   Offset: 56232
  ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
   ID: PDF-HUL-122
   Offset: 56232

The output with PDF-hul 1.12.6 is:

  Status: Well-Formed, but not valid
  SignatureMatches: PDF-hul
  ErrorMessage: Invalid destination object
   ID: PDF-HUL-1
  ErrorMessage: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
   ID: PDF-HUL-149

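We notice the following differences: 1.12.4 reports three error messages and 1.12.6 only two; the "Invalid destination object" message changes its ID from PDF-HUL-122 to PDF-HUL-1 and loses the PdfInvalidException class prefix; the second PDF-HUL-122 message is gone; and no message in 1.12.6 carries an offset.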
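The differences between the two outputs can be checked mechanically. The following is a minimal sketch, not part of JHOVE: it assumes the text output uses only the field labels seen in this issue (`Status`, `SignatureMatches`, `ErrorMessage`, `ID`, `Offset`), and the helper names `parse_messages` and `ids_without_offset` are made up for illustration.

```python
import re

# Field labels as they appear in the output quoted in this issue (an assumption
# about the text format, not a JHOVE API).
FIELD = re.compile(r"\b(Status|SignatureMatches|ErrorMessage|ID|Offset):\s*")

OUT_1_12_4 = """Status: Well-Formed, but not valid
SignatureMatches: PDF-hul
ErrorMessage: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
ID: PDF-HUL-149
ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid destination object
ID: PDF-HUL-122
Offset: 56232
ErrorMessage: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
ID: PDF-HUL-122
Offset: 56232"""

OUT_1_12_6 = """Status: Well-Formed, but not valid
SignatureMatches: PDF-hul
ErrorMessage: Invalid destination object
ID: PDF-HUL-1
ErrorMessage: Invalid indirect destination - referenced object 'section*.9.6' cannot be found
ID: PDF-HUL-149"""


def parse_messages(text):
    """Return one dict per ErrorMessage, with its ID and offset if present."""
    # re.split with a capturing group yields [preamble, label, value, label, value, ...]
    parts = FIELD.split(text)
    messages = []
    for label, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        if label == "ErrorMessage":
            messages.append({"message": value, "id": None, "offset": None})
        elif label == "ID" and messages:
            messages[-1]["id"] = value
        elif label == "Offset" and messages:
            messages[-1]["offset"] = int(value)
    return messages


def ids_without_offset(messages):
    """IDs of messages that carry no Offset field."""
    return [m["id"] for m in messages if m["offset"] is None]


old = parse_messages(OUT_1_12_4)
new = parse_messages(OUT_1_12_6)

print(len(old), len(new))        # 3 2
print(ids_without_offset(old))   # ['PDF-HUL-149']
print(ids_without_offset(new))   # ['PDF-HUL-1', 'PDF-HUL-149']
```

Because the parser splits on field labels rather than on line breaks, it also works when the output has been flattened onto a single line.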

My suspicion is that all the error messages (the three from 1.12.4 and the two from 1.12.6) describe a single error that could be captured in one error message. Either way, the offset should be included.
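To see how much the messages overlap, one can strip the Java exception class prefix and group the messages by their remaining text. This is only a sketch under assumptions: the prefix pattern is a guess based on the single `PdfInvalidException` prefix quoted above, and `normalize` and `group_by_text` are hypothetical helper names, not JHOVE functions.

```python
import re
from collections import defaultdict

# Leading fully qualified Java exception class name followed by ": "
# (assumed pattern, based on the PdfInvalidException prefix seen in 1.12.4).
EXCEPTION_PREFIX = re.compile(r"^(?:[A-Za-z_]\w*\.)+[A-Z]\w*(?:Exception|Error):\s*")

# (ID, message text) pairs copied from the 1.12.4 output in this issue.
MESSAGES_1_12_4 = [
    ("PDF-HUL-149",
     "Invalid indirect destination - referenced object 'section*.9.6' cannot be found"),
    ("PDF-HUL-122",
     "edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: "
     "Invalid destination object"),
    ("PDF-HUL-122",
     "edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: "
     "Invalid indirect destination - referenced object 'section*.9.6' cannot be found"),
]


def normalize(message):
    """Drop a leading exception class name so equal texts compare equal."""
    return EXCEPTION_PREFIX.sub("", message).strip()


def group_by_text(messages):
    """Map each normalized message text to the list of IDs that reported it."""
    groups = defaultdict(list)
    for msg_id, text in messages:
        groups[normalize(text)].append(msg_id)
    return dict(groups)


groups = group_by_text(MESSAGES_1_12_4)
for text, ids in groups.items():
    print(ids, text)
```

On the 1.12.4 output, the three messages collapse to two distinct texts, the same two that 1.12.6 reports. That does not show they stem from one underlying error, only that one of the 1.12.4 messages was a duplicate under a different ID.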

102_PERC2019_Stein.pdf

carlwilson commented 3 months ago

OK, I've pinned down the change: it's PR #882. The addition is sound, as @samalloing points out, but the release details could have been more informative. I agree with the ID change, as PDF-HUL-122 is a catch-all for general exceptions, while PDF-HUL-1 and PDF-HUL-149 are existing message codes that are more specific and carry extra information. I also agree that the offset should be there and I'm on that.