openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Why, in an HTML 4.01 Transitional document, does JHOVE 'forget' an open span tag when there is a p inside? #928

Open RvanVeenendaal opened 4 months ago

RvanVeenendaal commented 4 months ago

In the attached file (save as .html), JHOVE 1.30 reports a closing span tag without a matching opening span tag at line 173. In Dutch: "ErrorMessage: Sluit tag zonder een overeenkomende open tag: Name = span, Line = 173, Column = 8" (in English: close tag without a matching open tag). Inside this span there are two p (paragraph) elements. That nesting appears to be illegal: span is an inline element, which cannot contain block elements such as p. This is why JHOVE also reports, again in Dutch: "ErrorMessage: Tag illegaal in deze context: Name = p, Container = span, Line = 171, Column = 8" (in English: tag illegal in this context).
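To make the nesting rule concrete, here is a minimal sketch in Python that flags a block element opened directly inside an inline element. It is not JHOVE's code and says nothing about how JHOVE tracks open tags. The `INLINE` and `BLOCK` sets are small illustrative subsets, the `NestingChecker` class and `check` function are hypothetical names, and the sample markup is made up rather than taken from the attached file.

```python
from html.parser import HTMLParser

# Illustrative subsets only; the HTML 4.01 DTD defines the full lists.
INLINE = {"span", "a", "b", "i", "em", "strong"}
BLOCK = {"p", "div", "table", "ul", "ol", "h1", "h2", "h3"}


class NestingChecker(HTMLParser):
    """Hypothetical checker: records block tags opened inside inline tags."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # (tag, container, line) tuples

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK and self.stack and self.stack[-1] in INLINE:
            line, _col = self.getpos()
            self.problems.append((tag, self.stack[-1], line))
        # Simplification: void elements such as br are not special-cased.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass


def check(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# Made-up sample: a span that wraps two paragraphs.
sample = "<div>\n<span>\n<p>one</p>\n<p>two</p>\n</span>\n</div>"
print(check(sample))
```

For the sample, the checker reports both p elements with span as their container, while the span stays on the stack until its own closing tag is seen.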

Why does JHOVE seem to have 'forgotten' about the open span tag at line 170 when it encounters the p tags inside the span? Bug or feature?

Attachment: i000000.xml.html.txt

This file is part of a 2002 website that can be accessed here: https://www.nationaalarchief.nl/onderzoeken/archief/2.04.115/invnr/4ED/file?eadID=2.04.115&unitID=4ED&query=staatssecretaris%20de%20vries.