Closed sdalu closed 3 years ago
There is no such thing as a self-closing tag syntax in HTML5. Try executing the following in the console of your favorite browser:
div = document.createElement('div'); div.innerHTML = '<div><bib/><bib/></div>'; div.innerHTML
The result you will see is:
"<div><bib><bib></bib></bib></div>"
The purpose of an HTML5 parser (and therefore Gumbo) is to parse input as browsers do.
In more detail, bib isn't an element defined in HTML, so the tree-construction stage of the HTML parser acts (in this case) according to the "Any other start tag" rule in the "in body" insertion mode. This rule says to insert an element for the tag and make it the current node in the tree. The fact that the tag was self-closing is ignored.
The second <bib/> is treated identically: a bib element is inserted as a child of the current node, which is the first bib element.
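The two steps above can be sketched as a toy model of the stack of open elements. This is only an illustration in plain Ruby, not the real HTML5 tree-construction algorithm and not Gumbo's code; the names build_tree, serialize, and Node are hypothetical, and the regular expression only understands attribute-free tags.

```ruby
# Toy model of the behaviour described above; NOT the full HTML5 algorithm.
# Void elements as listed in the HTML standard.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

Node = Struct.new(:name, :children)

def build_tree(html)
  root = Node.new('#fragment', [])
  open_elements = [root] # stack of open elements; the last entry is the current node
  html.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*(/?)>}) do |closing, name, _solidus|
    name = name.downcase
    if closing == '/'
      # End tag: pop back to (and including) the nearest open element with that name.
      idx = open_elements.rindex { |n| n.name == name }
      open_elements.pop(open_elements.size - idx) if idx
    else
      node = Node.new(name, [])
      open_elements.last.children << node
      # The trailing solidus is ignored: only void elements are not pushed.
      open_elements << node unless VOID_ELEMENTS.include?(name)
    end
  end
  root
end

def serialize(node)
  inner = node.children.map { |child| serialize(child) }.join
  return inner if node.name == '#fragment'
  return "<#{node.name}>" if VOID_ELEMENTS.include?(node.name)

  "<#{node.name}>#{inner}</#{node.name}>"
end

puts serialize(build_tree('<div><bib/><bib/></div>'))
# prints <div><bib><bib></bib></bib></div>
```

Because the solidus is never consulted for non-void elements, the second bib ends up nested inside the first, which is the same shape the browser console reports.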
There actually are some elements in HTML that can be self-closing. These are the void elements (those that have no contents) and some SVG and MathML elements.
Any other start tags that are self-closing are parse errors. Specifically, they produce a non-void-html-element-start-tag-with-trailing-solidus parse error. The specification's entry for that error contains the example <div/><span></span><span></span> where the two span elements are children of the div.
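As a rough illustration of which start tags fall into that category, the following sketch scans a string for start tags ending in "/>" that are not void elements. It is a simplification under stated assumptions: trailing_solidus_errors is a hypothetical helper, the regular expression is not a real tokenizer, and SVG and MathML content (where a trailing solidus is meaningful) is not modelled.

```ruby
# Void elements as listed in the HTML standard.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Returns [column, tag_name] pairs (1-based columns) for start tags that end
# in "/>" but are not void elements. Foreign content is deliberately ignored.
def trailing_solidus_errors(html)
  errors = []
  html.scan(%r{<([a-zA-Z][a-zA-Z0-9]*)[^<>]*/>}) do
    match = Regexp.last_match
    name = match[1].downcase
    errors << [match.begin(0) + 1, name] unless VOID_ELEMENTS.include?(name)
  end
  errors
end

p trailing_solidus_errors('<div><bib/><bib/></div>')          # => [[6, "bib"], [12, "bib"]]
p trailing_solidus_errors('<div/><span></span><span></span>') # => [[1, "div"]]
p trailing_solidus_errors('<p>line<br/>break</p>')            # => []
```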
If I were to guess, I'd say your example will give rise to three parse errors: one about a missing DOCTYPE and two non-void-html-element-start-tag-with-trailing-solidus errors. Let's give it a go.
require 'nokogiri'

# Raise max_errors so that parse errors are collected and can be printed.
doc = Nokogiri::HTML5('<div><bib/><bib/></div>', max_errors: 20)
doc.errors.each { |err| puts(err) }
This prints out:
1:1: ERROR: Expected a doctype token
<div><bib/><bib/></div>
^
1:6: ERROR: Start tag of nonvoid HTML element ends with '/>', use '>'.
<div><bib/><bib/></div>
     ^
1:12: ERROR: Start tag of nonvoid HTML element ends with '/>', use '>'.
<div><bib/><bib/></div>
           ^
1:18: ERROR: That tag isn't allowed here Currently open tags: html, body, div, , .
<div><bib/><bib/></div>
                 ^
Missed one! The final </div> is an error from the rule 'An end tag whose tag name is one of: […] "div" […]' in the "in body" insertion mode. In this case, the current node (the second bib element) is not an HTML element with the same tag name as the </div> token.
That final error does suggest we need to fix the error message though. There should be two bibs there.
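To make the end-tag check concrete, here is a minimal sketch of just the "current node has a different tag name" step, applied to the stack of open elements at the point where </div> is seen. The helper check_end_tag and the plain-string representation of the stack are assumptions for illustration; the scope check and the implied-end-tag step that the real rule also performs are left out.

```ruby
# Toy check for one step of the end-tag rule: it is a parse error when the
# current node (top of the stack) does not have the end tag's name.
def check_end_tag(open_elements, end_tag)
  current = open_elements.last
  { parse_error: current != end_tag, current_node: current, open_tags: open_elements.dup }
end

# Stack of open elements when </div> is reached in <div><bib/><bib/></div>.
result = check_end_tag(%w[html body div bib bib], 'div')
puts "parse error: #{result[:parse_error]}"
puts "open tags: #{result[:open_tags].join(', ')}"
# prints:
#   parse error: true
#   open tags: html, body, div, bib, bib
```

The second line of output shows the list of open tags with both bib entries present, which is what the message above presumably ought to contain.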
Thanks for all the clarifications. (Perhaps I'll need to fall back to XHTML.)
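For comparison, an XML parser does honour the self-closing syntax. The sketch below uses REXML, which ships with standard Ruby installations, purely so that it has no extra dependencies; that choice is an assumption of this example, and in the context of this thread Nokogiri::XML would be the more likely tool.

```ruby
require 'rexml/document'

# Parsed as XML, <bib/> is an empty element, so both bibs are siblings.
doc = REXML::Document.new('<div><bib/><bib/></div>')
children = doc.root.elements.to_a

puts children.map(&:name).inspect      # names of the direct children of div
puts children.none?(&:has_elements?)   # true when neither bib contains child elements
```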
When a self-closing tag is used, the parser doesn't close the tag correctly. This doesn't seem to happen with tags that are part of HTML5.
Result
Expected