Closed LDSpits closed 2 years ago
Ah right, good catch. I suspect this might affect performance quite a lot if we always have to check if the closing tag is a void tag every time (although I could be wrong).
I wonder if we can simply check that the last tag name in the stack matches with the current one being parsed (one string comparison as opposed to comparing it to all 15 strings that we would have to do). In your example, when the parser sees the closing tag </base>
, it would notice that the last tag in the stack is <head>
and not <base>
, and just not pop it.
I have some more ideas for making things in the parser faster so it's not the end of the world if we lose a bit of performance here because of this.
the only case that wouldn't work is nested "voided" tags. however, the tags are never pushed to the parser stack. So there is no chance of them becoming nested as far as I'm aware. This does make me question how they become visible in the output. Are they treated as raw
tags?
I'll adjust my fix to use the method you just mentioned and I'll open a pull request.
When I was attempting to parse a html page. I received a puzzling result from the parser. where the
head
tag would terminate early on a child tag. And the closinghtml
tag would be omitted.I have found that this is due to how "void" tags are ignored. And would like to get your opinion on my deduction of the problem.
let's take the following sample html:
the output of the parser currently is
as you can see, the
head
closing tag is abnormally placed after thebase
closing tag.after some debugging with tests i found the issue in the parser code, the tag parser properly ignores start tags. but if you hit a "void" closing tag it just treats the previously found tag as the closed tag. causing it to terminate the parent tag early.
I have written a fix for this by parsing the closing tag, and then not pushing the result if the closing tag is one of the "void" tags. Would you accept a pull request for this bug?