openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

How to deal with Unknown TIFF IFD tag INFO messages? #657

Open RvanVeenendaal opened 3 years ago

RvanVeenendaal commented 3 years ago

At the National Archives of the Netherlands we're ingesting many TIFF scans. For each scan, JHOVE reports several INFO messages of the form "Unknown TIFF IFD tag: [number]". Multiplied across thousands of scans, this produces very large log files (in our Preservica EE-based repository) and long-running log file analysis scripts. How can we best limit the number of messages?

An example of a publicly available TIFF file can be found here: https://www.nationaalarchief.nl/onderzoeken/archief/1.05.11.13/invnr/230/file/NL-HaNA_1.05.11.13_230_0002 (direct download link: https://service.archief.nl/gaf/api/file/v1/original/2c1e155a-696c-4c6d-9468-d3d6ab2a1ec4). For this file, JHOVE 1.24.1 reports the following values for [number]: 36868, 37510, 40091-40095, 40961 and 65001.
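
To see which tag numbers account for most of the noise in a batch, the messages can be tallied per tag. The following is a minimal sketch, not part of JHOVE: it assumes the messages appear in the output as plain text lines containing "Unknown TIFF IFD tag: " followed by the number, as quoted in this issue, and the function name `count_unknown_tags` is hypothetical.

```python
import re
from collections import Counter

# Assumption: message wording as quoted in this issue; adjust the pattern
# if your JHOVE output handler formats the message differently.
UNKNOWN_TAG = re.compile(r"Unknown TIFF IFD tag: (\d+)")


def count_unknown_tags(lines):
    """Return a Counter mapping each unknown tag number to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to `count_unknown_tags` and calling `.most_common()` on the result lists the most frequent tag numbers first.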

Should JHOVE include all known public and private TIFF IFD tags (see e.g. https://www.awaresystems.be/imaging/tiff/tifftags.html and https://www.loc.gov/preservation/digital/formats/content/tiff_tags.shtml)? Or can we remove these messages from JHOVE's output using parameters? Or...?

carlwilson commented 2 years ago

We are looking at implementing a filter system for messages/validation to allow users to ignore messages based on a config file with IDs. More details to follow.
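
Until such a filter exists, a similar effect can be approximated by post-processing the output outside JHOVE. The sketch below is an illustration under stated assumptions, not the planned JHOVE feature: the ignore-list format (one tag number per line, `#` for comments), the function names `load_ignored_tags` and `filter_messages`, and the message wording are all hypothetical or taken from the issue text above.

```python
import re

# Assumption: message wording as quoted in this issue.
UNKNOWN_TAG = re.compile(r"Unknown TIFF IFD tag: (\d+)")


def load_ignored_tags(config_text):
    """Parse a hypothetical ignore list: one tag number per line, '#' starts a comment."""
    ignored = set()
    for raw in config_text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry:
            ignored.add(int(entry))
    return ignored


def filter_messages(lines, ignored_tags):
    """Yield every line except unknown-tag messages whose tag number is ignored."""
    for line in lines:
        match = UNKNOWN_TAG.search(line)
        if match and int(match.group(1)) in ignored_tags:
            continue
        yield line
```

Filtering by explicit tag number rather than dropping every "Unknown TIFF IFD tag" message keeps genuinely unexpected tags visible in the log.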