node-html-parser currently uses the following regex pattern to parse tag names: https://github.com/taoqf/node-html-parser/blob/v6.1.14/src/nodes/html.ts#L924-L925
This is incorrect, because a tag name can belong to any element, not only to a custom element. The part of the spec that governs tag name parsing is the tag name state: https://html.spec.whatwg.org/multipage/parsing.html#tag-name-state
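For illustration, here is a minimal sketch of what the spec's tag open and tag name states imply for tag names. readTagName is a hypothetical helper written for this issue, not part of node-html-parser or any other library, and it assumes the input has already been preprocessed (newlines normalized); it ignores attributes and everything after the name:

```typescript
// Hypothetical helper (not a node-html-parser API): returns the tag name of
// the start or end tag beginning at `start`, or null if no tag starts there.
function readTagName(input: string, start = 0): string | null {
  let i = start;
  if (input[i] !== "<") return null;
  i++;
  if (input[i] === "/") i++; // end tag open state

  // Tag open state: the first character must be an ASCII letter.
  if (i >= input.length || !/[A-Za-z]/.test(input[i])) return null;

  let name = "";
  for (; i < input.length; i++) {
    const ch = input[i];
    // Tag name state: only whitespace, "/" and ">" end the name.
    if (ch === "\t" || ch === "\n" || ch === "\f" || ch === " " || ch === "/" || ch === ">") {
      return name;
    }
    if (ch === "\0") {
      name += "\uFFFD"; // NULL is replaced with U+FFFD
    } else if (ch >= "A" && ch <= "Z") {
      name += ch.toLowerCase(); // ASCII upper alpha is lowercased
    } else {
      name += ch; // anything else, including "@", is appended as-is
    }
  }
  return null; // EOF inside a tag: no tag token is emitted
}

console.log(readTagName("<h@1>text</h@1>")); // "h@1"
console.log(readTagName("<DIV class=x>"));   // "div"
console.log(readTagName("<1a>"));            // null
```

Under these assumptions, the only constraint on the name is that it starts with an ASCII letter, which is why h@1 is tokenized as a tag name.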
Test case:
Output:
HTML:
Chrome:
Firefox:
As you can see above, the h@1 tag name is parsed correctly by parse5, htmlparser2, Chrome and Firefox, but not by node-html-parser.
As for whether code containing h@1 is 'broken' or 'malformed' HTML: it is not. Although h@1 is not permitted by any content model, it is permitted inside elements with the 'nothing' content model.
The following code:
passes the HTML5 validator: