Closed. ScumCoder closed this issue 1 year ago.
Come to think of it, there is something fishy about the preceding whitespace nodes as well. A document looking like this
<!DOCTYPE html>
<html>
<head>
</head>
<body>
</body>
</html>
should produce a root HTML node with five children, not three:
- WHITESPACE
- HEAD
- WHITESPACE
- BODY
- WHITESPACE
with each WHITESPACE node consisting of a single newline character.
If you load that document into Chromium and run document.documentElement.childNodes.length in the console, it gives a result of 3. Likewise for Firefox.
So without consulting the spec, I'm inclined to think Gumbo is doing what it's supposed to do.
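For what it's worth, the count of 3 is also what a reading of the HTML Standard's tree-construction rules suggests: whitespace before <head> is ignored, whitespace between </head> and <body> is inserted under <html>, and whitespace after </body> is handled with the in-body rules and so lands inside <body>. The sketch below is a deliberately simplified model of just those rules, not Gumbo's implementation; the Token and Mode enums and count_html_children are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified token kinds: only what the sample document needs. */
typedef enum {
    TOK_WS, TOK_HEAD_OPEN, TOK_HEAD_CLOSE,
    TOK_BODY_OPEN, TOK_BODY_CLOSE, TOK_HTML_CLOSE
} Token;

/* A small subset of the insertion modes, entered right after <html>. */
typedef enum { BEFORE_HEAD, IN_HEAD, AFTER_HEAD, IN_BODY, AFTER_BODY } Mode;

/* Counts the nodes that end up as direct children of <html>. */
static size_t count_html_children(const Token *toks, size_t n) {
    Mode mode = BEFORE_HEAD;
    size_t children = 0;
    int last_was_text = 0; /* adjacent whitespace merges into one text node */

    for (size_t i = 0; i < n; ++i) {
        Token t = toks[i];
        switch (mode) {
        case BEFORE_HEAD:
            /* Whitespace is ignored here, so no node is created for it. */
            if (t == TOK_HEAD_OPEN) { ++children; last_was_text = 0; mode = IN_HEAD; }
            break;
        case IN_HEAD:
            /* Whitespace here becomes a child of <head>, not of <html>. */
            if (t == TOK_HEAD_CLOSE) mode = AFTER_HEAD;
            break;
        case AFTER_HEAD:
            /* Whitespace here is inserted as a text child of <html>. */
            if (t == TOK_WS) {
                if (!last_was_text) { ++children; last_was_text = 1; }
            } else if (t == TOK_BODY_OPEN) {
                ++children; last_was_text = 0; mode = IN_BODY;
            }
            break;
        case IN_BODY:
            /* Whitespace here becomes a child of <body>. */
            if (t == TOK_BODY_CLOSE) mode = AFTER_BODY;
            break;
        case AFTER_BODY:
            /* Whitespace is processed with the in-body rules, so it is
               appended to <body> rather than to <html>. */
            break;
        }
    }
    return children;
}
```

Feeding it the token sequence of the document above (a newline after every tag) yields head, one whitespace text node, and body, which matches what the browsers report.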
When parsing a trivial document, the GumboStringPiece containing the original_text of the GumboText describing GUMBO_NODE_WHITESPACE has an incorrect length value, which causes it to include closing tags. Also, the text field contains two linebreaks instead of one. See SSCCE here.
Used version is aa91b27.
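To illustrate the kind of check that exposes the length problem, here is a minimal sketch that does not call Gumbo at all. Piece is a hypothetical stand-in for GumboStringPiece (a pointer plus a length into the original input), and the helper names are made up for this example; the idea is simply that the original text of a whitespace node should cover whitespace bytes only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GumboStringPiece: a non-owning (pointer, length)
   view into the original input buffer. This is not Gumbo's own type. */
typedef struct {
    const char *data;
    size_t length;
} Piece;

/* ASCII whitespace as HTML defines it: tab, LF, FF, CR, space. */
static int is_html_space(char c) {
    return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

/* Returns how many leading bytes of the view are whitespace. A result
   smaller than piece.length means the view runs past the whitespace. */
static size_t leading_whitespace(Piece piece) {
    assert(piece.data != NULL);
    size_t n = 0;
    while (n < piece.length && is_html_space(piece.data[n])) {
        ++n;
    }
    return n;
}

/* True when the view covers whitespace only, which is what one would
   expect for the original text of a whitespace node. */
static int covers_only_whitespace(Piece piece) {
    return leading_whitespace(piece) == piece.length;
}
```

With a real parse, the view would presumably be filled from the node's v.text.original_text fields (data and length); a view whose length reaches into a following closing tag such as </head> makes covers_only_whitespace return 0.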