openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

TIFF tag 41728 inconsistency #767

Closed thorsted closed 1 year ago

thorsted commented 2 years ago

TIFF tag 41728 (FileSource) is not extracted properly. In the GUI, the "FileSource" line is visible but has no value. In the XML output, the tag is skipped entirely.


Attached are three examples of this tag used in a TIFF: FileSource-Samples.zip

The first uses the tag properly in Exif (test01-ExifIFD.tif). The second uses it as a custom tag with no Exif (test01-IFD0.tif), as done by LOC NDNP. The third is the custom tag converted to Exif while keeping the string value instead of the numerical value (test01-IFD0v2.tif).

The custom tag is ignored by MD extraction, but even when the tag is used properly in Exif, JHOVE does not extract its value.

tledoux commented 1 year ago

The proposed PR will allow the first file to be handled properly in all outputs. (Having no description for the values 1 and 2 results in an empty value in the text output and in no tag being written to the XML output.)
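To illustrate why a missing description produces exactly these two symptoms, here is a minimal Python sketch. It is not JHOVE's code (JHOVE is written in Java). The label tables, the label wording, the `fileSource` element name, and the function names are all hypothetical, and the "before" table assumes that only value 3 had a description.

```python
from typing import Dict, Optional

# Assumed state before the fix: values 1 and 2 have no description.
LABELS_BEFORE: Dict[int, str] = {3: "Digital camera"}

# Assumed state after the fix; the label wording is illustrative only.
LABELS_AFTER: Dict[int, str] = {
    1: "Film scanner",
    2: "Reflection print scanner",
    3: "Digital camera",
}


def text_line(value: int, labels: Dict[int, str]) -> str:
    """Text output always prints the property name, even without a label."""
    label = labels.get(value, "")
    return f"FileSource: {label}".rstrip()


def xml_element(value: int, labels: Dict[int, str]) -> Optional[str]:
    """XML output drops the element entirely when there is no label."""
    label = labels.get(value)
    if label is None:
        return None
    # Hypothetical element name, used only for this sketch.
    return f"<fileSource>{label}</fileSource>"
```

With the "before" table, `text_line(1, LABELS_BEFORE)` yields the bare property name and `xml_element(1, LABELS_BEFORE)` yields nothing, which matches the behaviour reported in the issue.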

For the second file, you get a TIFF-HUL-12 message stating, correctly, that tag 41728 is unknown in a TIFF IFD, since it should be in an Exif IFD.

For the third file, the TIFF-HUL module is very unforgiving. The Exif IFD tag 41728 has type ASCII (2), where the Exif standard requires type UNDEFINED (7) for this tag. Note also that the value is supposed to be 1 byte long.

Here we hit a questionable behaviour of the TIFF-HUL module: it stops processing as soon as a "bad" use of a type is encountered. The behaviour should probably be more lax, but that is how it currently works.
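The difference between the current strict behaviour and a more lax one can be sketched as follows. This is a Python illustration, not the TIFF-HUL implementation. The function names, the `TypeMismatch` exception, and the `lenient` flag are hypothetical. The 12-byte IFD entry layout (tag, type, count, value/offset) and the expected type and count for tag 41728 come from the TIFF and Exif specifications.

```python
import struct
from typing import List, Tuple

ASCII, UNDEFINED = 2, 7

# Expected (type, count) per tag; only FileSource is listed in this sketch.
EXPECTED = {41728: (UNDEFINED, 1)}


class TypeMismatch(Exception):
    """Raised in strict mode when a tag uses an unexpected type or count."""


def parse_entry(raw: bytes, byte_order: str = "<") -> Tuple[int, int, int, bytes]:
    """Split a 12-byte IFD entry into tag, type, count, and value/offset."""
    tag, typ, count = struct.unpack(byte_order + "HHI", raw[:8])
    return tag, typ, count, raw[8:12]


def check_entries(entries: List[bytes], lenient: bool) -> Tuple[List[int], List[str]]:
    """Return (tags processed, messages). Strict mode stops at the first mismatch."""
    processed: List[int] = []
    messages: List[str] = []
    for raw in entries:
        tag, typ, count, _ = parse_entry(raw)
        expected = EXPECTED.get(tag)
        if expected is not None and (typ, count) != expected:
            msg = (f"tag {tag}: type {typ}, count {count}; "
                   f"expected type {expected[0]}, count {expected[1]}")
            if not lenient:
                raise TypeMismatch(msg)   # strict: abort all further processing
            messages.append(msg)          # lax: report and skip this tag only
            continue
        processed.append(tag)
    return processed, messages
```

For example, given an entry for tag 41728 declared as ASCII (with a made-up count) followed by a well-formed entry, strict mode raises on the first entry, while lenient mode records one message and still processes the second entry.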