openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

XML 1.5.2 module in JHOVE 1.26 returning thousands of identical InfoMessages #834

Closed leninoc closed 1 year ago

leninoc commented 1 year ago

Hi, we have an ALTO XML file that links to a local XSD file. When I run it through standalone JHOVE 1.24 (xml-hul 1.5.1), I get XML-HUL-3 SaxException, cause: java.lang.ClassCastException. When I run the same file through JHOVE 1.26 (xml-hul 1.5.2), I get an XML-HUL-1 SAXParseException error, plus 67748 (!!!) InfoMessages, each with this SubMessage: schema_reference.4: Failed to read schema document '//docstorage2/impdata1/docWORKS_KBNL/schema/alto-1-2.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not xsd:schema.

The SubMessage is correct; it's fine that standalone JHOVE cannot read the local XSD. In Rosetta we have a mechanism that makes this possible for JHOVE plugins (via jhove.conf etc.).
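
For anyone checking which schema a file points at before running it through a validator, here is a minimal Python sketch that reads the schema hints from the root element. It is not part of JHOVE; the function name `schema_locations` and the `SAMPLE` document are hypothetical, and whether the attached ALTO file uses `xsi:noNamespaceSchemaLocation` or `xsi:schemaLocation` is an assumption.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return (namespace, location) pairs declared on the root element."""
    root = ET.fromstring(xml_text)
    # xsi:schemaLocation holds whitespace-separated namespace/location pairs.
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    # xsi:noNamespaceSchemaLocation holds a single location with no namespace.
    no_ns = root.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
    if no_ns:
        pairs.append((None, no_ns))
    return pairs


# Hypothetical stand-in for the attached file; only the path comes from the report.
SAMPLE = (
    '<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation='
    '"//docstorage2/impdata1/docWORKS_KBNL/schema/alto-1-2.xsd"/>'
)

for namespace, location in schema_locations(SAMPLE):
    print(namespace, location)
```

A location like the one above only resolves on a machine that can reach that share, which is why a standalone run reports schema_reference.4.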

This might be related to the fix for #745. That issue was fixed in 1.26, but JHOVE 1.26 now complains about a different thing, with a huge number of repeated InfoMessages. That volume of InfoMessages uses a lot of resources in our system. The XML file in question is attached as a zip: 0003.zip
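
As a stopgap on the consuming side, identical messages could be collapsed into one entry with a count before they are stored. The Python sketch below only illustrates that idea; it is not how JHOVE itself handles messages, and `collapse_messages` plus the plain-string message list are hypothetical.

```python
from collections import Counter


def collapse_messages(messages):
    """Collapse identical messages into (message, count) pairs in first-seen order."""
    # Counter is a dict subclass, so it keeps the order in which keys first appear.
    return list(Counter(messages).items())


# Hypothetical input: the same SubMessage repeated many times plus one other message.
reported = ["schema_reference.4: Failed to read schema document"] * 67748
reported.append("XML-HUL-1 SAXParseException")

for text, count in collapse_messages(reported):
    print(count, text)
```

This keeps one record per distinct message, so storage grows with the number of distinct messages rather than with the number of repeats.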

carlwilson commented 1 year ago

Hi @leninoc. We made some reporting improvements in our recent v1.28 release candidate, including the elimination of duplicate error messages. Could you please test the new RC and see if it fixes your problem?

leninoc commented 1 year ago

Hi @carlwilson, I tested it when the 1.28 RC was released and again now, and it behaves very differently from 1.26. In short: no more duplicate messages, just 1 InfoMessage + 1 ErrorMessage with the relevant SubMessage, so that is great! It seems the changes made in 1.28 worked. Thank you.

carlwilson commented 1 year ago

That's great to hear @leninoc, but I can see that you have some issues with the message forms and detail buried in sub-messages elsewhere. I will take a look at this sometime next week and see if there are any quick wins.