openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org
Other

XML incorrectly not-well-formed because of http in Link to Schema #918

Closed. Bodensuri closed this issue 1 month ago.

Bodensuri commented 6 months ago

We are ingesting many XML files that JHOVE classifies as "not well-formed" although they are well-formed. Here is an example: 12745764.zip. These XML files were created by ABBYY FineReader. They contain an http link to a schema. If "http" is changed to "https", the file becomes well-formed. Since no XML version is declared at the top of the file, it is XML 1.0, and XML 1.0 does not require a schema. If the schema location were wrong, the file might be invalid, but it would still be well-formed.

JhoveView (Rel. 1.28.0, 2023-05-18)
 Date: 2024-03-25 18:58:12 MEZ
 RepresentationInformation: C:\Users\rsuri\Downloads\12745764.xml
  ReportingModule: XML-hul, Rel. 1.5.3 (2023-03-16)
  LastModified: 2024-03-25 18:40:00 MEZ
  Size: 829103
  Format: XML
  Status: Not well-formed
  SignatureMatches: XML-hul
  ErrorMessage: SAXParseException: Premature end of file. Line = -1, Column = -1.
   ID: XML-HUL-1
  MIMEtype: text/xml
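To illustrate the distinction the report relies on, here is a minimal sketch using only the JDK's standard SAX API. It is not JHOVE's XML-hul code and does not diagnose this particular report. With a non-validating, namespace-aware parser, `xsi:schemaLocation` is treated as an ordinary attribute. The schema is not loaded, so whether its URL uses http or https cannot affect the well-formedness result. The namespace and schema URLs below are hypothetical placeholders, not the ABBYY schema. The sketch also shows that empty input makes the JDK parser raise a fatal `SAXParseException` ("Premature end of file."), the same message text seen in the report.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the text parses without a fatal (well-formedness) error. */
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Non-validating: only well-formedness is checked. No schema is set,
            // so xsi:schemaLocation is just an attribute and nothing is fetched.
            factory.setValidating(false);
            SAXParser parser = factory.newSAXParser();
            // DefaultHandler rethrows fatal errors, e.g. "Premature end of file."
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // SAXParseException is a subclass of SAXException
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document whose schema hint uses a plain http URL.
        String withHttpSchema =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<document xmlns=\"http://example.org/ns\"\n"
            + "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
            + "    xsi:schemaLocation=\"http://example.org/ns http://example.org/schema.xsd\">\n"
            + "  <page/>\n"
            + "</document>\n";
        System.out.println(isWellFormed(withHttpSchema)); // well-formed
        System.out.println(isWellFormed(""));             // empty input: fatal error
    }
}
```

Using a SAX parser rather than DOM keeps the check streaming, which suits large files such as the 829103-byte example.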

carlwilson commented 6 months ago

Thanks for reporting this. We will try to reproduce the issue and get back to you if we have questions.