openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

XML-hul 1.5.3: schemaLocation attribute parser is too strict about whitespace in pairs of NS and schema location URI #883

Closed: UmbrellaDish closed this issue 8 months ago

UmbrellaDish commented 1 year ago

In JHOVE 1.28, an invalid XML file is reported as "Well-formed" instead of "Well-formed, but invalid" when two or more whitespace characters separate a namespace URI from its corresponding schema location URI in `schemaLocation`. No errors are listed, so I assume validation is not taking place at all. Once I reduce the whitespace to a single character, the schema is loaded and the document is properly validated against it.

I am not sure whether I have read the relevant specification correctly. Perhaps I have missed some rule requiring a single whitespace character within namespace/location pairs in `schemaLocation`, but to me this looks like a bug. Intuitively, which item is the key and which is the value in the internal map should be inferred from their position in the list. And if such a rule does exist, shouldn't it be validated as well, and reported as an error when violated?
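In XML Schema, the `xsi:schemaLocation` value is a whitespace-separated list of URI pairs, and list values have their whitespace collapsed, so any run of whitespace should work as a separator. A minimal sketch of the positional pairing described above might look like the following. The class `SchemaLocationParser` is hypothetical (it is not the XML-hul code), and rejecting an odd number of URIs is an assumption of this sketch rather than confirmed JHOVE behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the XML-hul implementation): parse an
 * xsi:schemaLocation value into a namespace URI -> schema location map,
 * pairing URIs purely by their position in the list.
 */
public final class SchemaLocationParser {

    private SchemaLocationParser() {}

    public static Map<String, String> parse(String value) {
        Map<String, String> locations = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return locations;
        }
        // Split on any run of XML whitespace (space, tab, CR, LF),
        // not on a single space character.
        String[] tokens = trimmed.split("[ \\t\\r\\n]+");
        // Assumption of this sketch: an unpaired URI is reported as an error.
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                "schemaLocation must contain an even number of URIs, found " + tokens.length);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            // Even positions hold namespace URIs, odd positions hold locations.
            locations.put(tokens[i], tokens[i + 1]);
        }
        return locations;
    }

    public static void main(String[] args) {
        String value = "http://example.org/ns   schema.xsd\n\thttp://example.org/other other.xsd";
        // Prints {http://example.org/ns=schema.xsd, http://example.org/other=other.xsd}
        System.out.println(parse(value));
    }
}
```

Because the key/value role comes only from position, the amount of whitespace between tokens no longer matters.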

david-russo commented 1 year ago

You're correct, that's a bug. The current code only expects a single whitespace character where it should allow any amount of whitespace.
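To illustrate the failure mode, here is a sketch that assumes the parser tokenises with a single-space split (an assumption; the actual XML-hul code is not shown in this thread). The class and method names are hypothetical. With two spaces, a single-space split yields an empty token between them, so the URIs no longer line up in namespace/location pairs.

```java
import java.util.Arrays;

/** Hypothetical demo contrasting a single-space split with a whitespace-run split. */
public final class WhitespaceSplitDemo {

    private WhitespaceSplitDemo() {}

    // Strict behaviour: exactly one space per separator.
    static String[] strictSplit(String value) {
        return value.split(" ");
    }

    // Lenient behaviour: any run of XML whitespace is one separator.
    static String[] lenientSplit(String value) {
        return value.trim().split("[ \\t\\r\\n]+");
    }

    public static void main(String[] args) {
        String value = "http://example.org/ns  schema.xsd";
        // Prints [http://example.org/ns, , schema.xsd] (three tokens, one empty)
        System.out.println(Arrays.toString(strictSplit(value)));
        // Prints [http://example.org/ns, schema.xsd] (one clean pair)
        System.out.println(Arrays.toString(lenientSplit(value)));
    }
}
```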

I've submitted a PR which should fix the issue.