openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Does JHOVE need a warning Message type? #638

Open carlwilson opened 4 years ago

carlwilson commented 4 years ago

Currently, JHOVE supports two message types: error and info. Errors report validation failures, while info messages cover everything else in the reports. I'm currently working on a JHOVE module for TIFF files based on DPF Manager, which supports Error, Info and Warning messages. Warnings are for features that, while valid, are undesirable for long-term preservation.
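For illustration only, here is a minimal Java sketch of the three-level model described above. The names `Severity`, `ReportMessage` and `ValidationReport` are hypothetical and are not JHOVE's real message classes. The sketch assumes that only errors affect validity, so a warning can flag a valid but undesirable feature without failing the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: this does not mirror JHOVE's real message classes.
enum Severity { INFO, WARNING, ERROR }

final class ReportMessage {
    final Severity severity;
    final String text;

    ReportMessage(Severity severity, String text) {
        this.severity = severity;
        this.text = text;
    }
}

final class ValidationReport {
    private final List<ReportMessage> messages = new ArrayList<>();

    void add(Severity severity, String text) {
        messages.add(new ReportMessage(severity, text));
    }

    // Assumption: only errors are validation failures; warnings never invalidate a file.
    boolean isValid() {
        for (ReportMessage m : messages) {
            if (m.severity == Severity.ERROR) {
                return false;
            }
        }
        return true;
    }

    // Number of messages recorded at the given severity.
    int count(Severity severity) {
        int n = 0;
        for (ReportMessage m : messages) {
            if (m.severity == severity) {
                n++;
            }
        }
        return n;
    }
}
```

Under these assumptions, a TIFF with a deprecated tag would produce a report that is still valid but carries one warning.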

sromkey commented 4 years ago

I think that sounds really useful. In Archivematica we could capture the output in the METS, and users could take that information into account when considering long-term preservation.

ross-spencer commented 4 years ago

Interesting question Carl. Fairly analogous to logging modules in programming languages.

Do you have examples of warnings DPF manager outputs?

If warnings map to specification keywords like SHOULD (the specification says a file should do something, but it doesn't) and SHOULD NOT (the file does something the specification says it shouldn't), then a warning type would make a lot of sense to me. Does that make sense to others? There may be other cases that others can highlight.

That being said, the framing of the DPF definition of a warning raises one question for me that might be worth pondering: it sounds like opinion could, or would, be part of it. How is that managed sustainably in the long run, when opinions differ, are debated, and can change over time?

On 7 Jul 2020, at 07:55, Carl Wilson notifications@github.com wrote:

 Currently, JHOVE supports error and info type messages. The first category is for validation failures while the second category covers everything else in the reports. I'm currently working on a JHOVE module for TIFF files based on DPF manager which supports Error, Info and Warning messages. Warnings are for features that, while valid, are undesirable for long term preservation.


thorsted commented 4 years ago

If Errors are based upon a file format specification, what are the warnings based on, and who decides what is undesirable? I agree with Ross: opinions could creep in.

carlwilson commented 4 years ago

Thanks for the quick answers, all. I share some of the misgivings around "opinions" but have one or two real cases that show where the concept might be useful. JHOVE currently issues an info message when processing a font-heavy PDF, informing the user that it effectively gave up on font processing. One DPF Manager warning is for the use of deprecated tags in a TIFF; my tentative DPF Manager module currently downgrades this to an info message. These don't feel as though they belong at the same severity level.

If the user could do something to configure the handling of messages, e.g. ignore warnings or even specific warnings might that be of more interest?

@ross-spencer the SHOULD / SHOULD NOT cases were specifically what I had in mind when initially pondering this. @thorsted these would still come from specifications.

@sromkey I'll take that as a +1 with support :)

ross-spencer commented 4 years ago

If the user could do something to configure the handling of messages, e.g. ignore warnings or even specific warnings might that be of more interest?

I don't know which other tools do this, but in Flake8, which checks Python code for conformance, you can create an .ini config file in which you elect to ignore certain warnings. There are a lot of them in Python, so you can ignore anything from an entire class of warnings down to the specific numerical value of a single warning.

It's definitely effective from a code-perspective but then code can be more opinionated. When something finally chooses not to work it won't.

I'd definitely say a +0.5 here for my part, especially for the should/shouldn't types. (In your examples above, the deprecated tags issue definitely feels like a candidate. The font warning feels fuzzier, like a different class of message about optimization of the program, but yeah, of course it's also going to affect preservation. It's not disallowed by the format in any way, though. I wonder how a PAR might model an institutional opinion around this?)
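As a rough illustration of the configurable handling discussed above, the following Java sketch suppresses warnings either wholesale or by message ID prefix, loosely in the spirit of Flake8's ignore list. `WarningFilter`, the severity strings and the message IDs used with it are all hypothetical; nothing here is an existing JHOVE option. The sketch assumes errors and info messages are never suppressed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: JHOVE has no such filter; names and IDs are made up.
final class WarningFilter {
    private final boolean ignoreAllWarnings;
    private final Set<String> ignoredPrefixes;

    WarningFilter(boolean ignoreAllWarnings, String... ignoredPrefixes) {
        this.ignoreAllWarnings = ignoreAllWarnings;
        this.ignoredPrefixes = new HashSet<>(Arrays.asList(ignoredPrefixes));
    }

    // Returns true if a message with this severity and ID should appear in the report.
    boolean shouldReport(String severity, String messageId) {
        if (!"WARNING".equals(severity)) {
            return true; // assumption: only warnings can be suppressed
        }
        if (ignoreAllWarnings) {
            return false;
        }
        // Prefix match: a short prefix drops a whole class, a full ID drops one warning.
        for (String prefix : ignoredPrefixes) {
            if (messageId.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With a filter built as `new WarningFilter(false, "TIFF-DEPRECATED")`, a warning with the made-up ID `TIFF-DEPRECATED-TAG` would be dropped, while errors with any ID would still be reported. Whether suppression belongs in the tool or in downstream policy is exactly the open question in this thread.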

bitsgalore commented 4 years ago

The current lack of a WARNING type also caused some problems in EPUBCheck (which optionally uses the JHOVE schema, although it now looks like they want to get rid of it); see the issue below:

https://github.com/w3c/epubcheck/issues/789

I suppose this will also affect JHOVE's EPUB module (which is based on EPUBCheck).

tledoux commented 4 years ago

For the record, the JHOVE schema (https://schema.openpreservation.org/ois/xml/xsd/jhove/jhove.xsd) HAS a warning level of severity.

I requested this addition so that EPUBCheck can output its information following the JHOVE schema. This output can be obtained using the --out argument on the command line.

So to add a warning level to JHOVE, only the code needs to be modified to output it: the schema is ready for it. The question of when to generate it remains to be solved...

karenhanson commented 4 years ago

I just noticed @bitsgalore's comment while checking for new EPUB module issues. As it happens, the update I just did caused me to look at the messages for EPUBCheck. Indeed, right now warnings are awkwardly funneled into errors in the EPUB JHOVE module, which is less than ideal. Interestingly though, EPUBCheck now uses the latest JHOVE schema and defines them as "warnings"! I basically had to undo this to make it a JHOVE module.

I think marking these as warnings would be clearer, though I wonder if the EPUBCheck definition will map exactly to the one applied by JHOVE: http://kb.daisy.org/publishing/docs/epub/validation/epubcheck.html#messages

If it does, it's an easy enhancement to an if-statement once WarningMessage is available: https://github.com/openpreserve/jhove/blob/47f077fcb09d3dd5720aeeb4abf27de5dac48d19/jhove-ext-modules/src/main/java/org/ithaka/portico/jhove/module/EpubModule.java#L578-L582

Given that the JHOVE schema already supports it, I think it would be consistent for the JHOVE code to allow it, i.e. an extremely belated +1!
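To make the "enhancement to an if-statement" idea concrete, here is a hedged Java sketch of a severity mapping. It is not the code at the linked lines of EpubModule.java. `EpubSeverityMapper` and `JhoveMessageType` are hypothetical names, and the severity strings (FATAL, ERROR, WARNING) are assumed to match EPUBCheck's message levels. The `warningAvailable` flag stands in for whether a warning message type exists; when it is false, warnings fall back to errors, as described for the current module.

```java
// Hypothetical sketch: names and severity strings are assumptions, not the module's code.
enum JhoveMessageType { ERROR, WARNING, INFO }

final class EpubSeverityMapper {
    private EpubSeverityMapper() {
    }

    // Maps an EPUBCheck-style severity name to a JHOVE-style message type.
    static JhoveMessageType map(String epubcheckSeverity, boolean warningAvailable) {
        switch (epubcheckSeverity) {
            case "FATAL":
            case "ERROR":
                return JhoveMessageType.ERROR;
            case "WARNING":
                // Without a warning type, warnings are funneled into errors.
                return warningAvailable ? JhoveMessageType.WARNING : JhoveMessageType.ERROR;
            default:
                // Assumption: every other level is treated as informational.
                return JhoveMessageType.INFO;
        }
    }
}
```

Whether EPUBCheck's definition of a warning lines up with the one JHOVE adopts is still the open question; the sketch only shows that the code change itself would be small.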