Open dpk opened 3 years ago
Thanks!
We should probably have a parameter object to list which tags have no closing tags.
How does HTML suppose we embed < and > inside script and style -- does it have special heuristics for how to find the matching closing tag for those opening tags, so that e.g. writing a string literal "</script>" in JS will not in fact close the script tag?
> We should probably have a parameter object to list which tags have no closing tags.
So that programmers can keep up to date with the spec if new empty elements are added? Hmm. Ideally the library (or rather its implementations) would keep up to date itself. But if it does fall behind, a parameter object could be handy.
> How does HTML suppose we embed < and > inside script and style -- does it have special heuristics for how to find the matching closing tag for those opening tags, so that e.g. writing a string literal "</script>" in JS will not in fact close the script tag?
It’s complicated, especially the rules for <script>. But no, it’s not that clever.
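As a quick illustration of "not that clever", here is a small Python sketch using the standard library's html.parser. That module is not a full HTML5 parser, and the ScriptProbe class is a made-up name for this example, but for this simple input it behaves the way the reply describes: the first "</script>" ends the element, even though it sits inside a JS string literal.

```python
from html.parser import HTMLParser


class ScriptProbe(HTMLParser):
    """Hypothetical helper: records text seen inside and outside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = []  # data reported while a script element is open
        self.after_text = []   # data reported outside any script element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        target = self.script_text if self.in_script else self.after_text
        target.append(data)


probe = ScriptProbe()
# The string literal contains "</script>", which the author did not mean as markup.
probe.feed('<script>var s = "</script>"; alert(s);</script>')
probe.close()

script_body = "".join(probe.script_text)
leftover = "".join(probe.after_text)
print(repr(script_body))
print(repr(leftover))
```

The script body is cut off at the opening quote of the string literal, and the rest of the intended JS ends up as ordinary text after the element.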
> > We should probably have a parameter object to list which tags have no closing tags.
>
> So that programmers can keep up to date with the spec if new empty elements are added? Hmm. Ideally the library (or rather its implementations) would keep up to date itself. But if it does fall behind, a parameter object could be handy.
As HTML is a living language and HTML5 is now back to being SGML- rather than XML-based, most such lists would likely go out of date at some point.
Parameter objects are always initialized with default values though. The default should be a reasonable set of tags.
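For readers who have not met parameter objects, the idea can be sketched in Python with contextvars, which gives a similar "default value that can be overridden for a dynamic extent" behaviour. Everything named here (void_elements, parameterize, has_end_tag) is hypothetical and not part of any proposed API, and the default set is my reading of the void elements in the HTML Living Standard, so it may itself be out of date.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Assumed default: void elements as listed in the HTML Living Standard
# at the time of writing; treat this list as illustrative.
DEFAULT_VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# Hypothetical parameter holding the set of tags that have no closing tag.
void_elements = ContextVar("void_elements", default=DEFAULT_VOID_ELEMENTS)


@contextmanager
def parameterize(var, value):
    """Rough analogue of Scheme's parameterize: rebind var inside the block only."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)


def has_end_tag(tag):
    """Return True if a serializer should emit </tag> for this element."""
    return tag.lower() not in void_elements.get()


print(has_end_tag("div"))  # uses the default set
with parameterize(void_elements, DEFAULT_VOID_ELEMENTS | {"newvoid"}):
    # A caller whose library copy has fallen behind the spec can extend the set.
    print(has_end_tag("newvoid"))
print(has_end_tag("newvoid"))  # the override is gone again
```

The default covers the common case, and the override is there only for the situation raised above, where the spec gains an empty element before the library does.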
Wow, I don't envy the people who write those parsers :)
Actually some SGML purist would probably think it's heretical to say HTML5 is SGML-based :) But explicit end tags are not required.
I’ve thrown together a rough-and-ready implementation of the HTML serialization algorithm, as adapted for SXML. There are still some things it handles weirdly or doesn’t handle because SXML and the HTML DOM are not a perfect isomorphism. Handling of namespaces is a particular rough spot I’d aim to look at again if this were to be adopted more widely.
(I also haven’t tested it in all its aspects yet, so there are likely a few bugs.)
HTML parsers and writers are somewhat fickle beasts.
Two things I know the implementation currently gets wrong by assuming HTML and XML are basically the same:
- />. On the other hand, trailing /> has no effect on any tag, whether it’s in this list or not, so it won’t close elements other than these. If I want an empty clearfix div (or whatever), <div class="clearfix" /> will not properly close the element; it has to be the explicit <div class="clearfix"></div>.
- &xyz; character escaping inside script and style tags.

There are almost certainly others.
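The two points above can be sketched as a toy serializer in Python. This is not the implementation mentioned in the thread and it does not follow the HTML serialization algorithm in detail. It assumes a simplified SXML-like shape of (tag, attributes, *children) tuples, uses html.escape as a stand-in for the spec's escaping rules, and hard-codes an illustrative void-element set. All names are made up for the example.

```python
from html import escape

# Assumed void elements (illustrative; check the current spec before relying on it).
VOID = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})
# Elements whose text content is written out without character escaping.
RAW_TEXT = frozenset({"script", "style"})


def serialize(node, parent=None):
    """Serialize a (tag, attrs, *children) tuple or a text string to HTML."""
    if isinstance(node, str):
        if parent in RAW_TEXT:
            return node  # no &lt; or &amp; inside script and style
        return escape(node, quote=False)
    tag, attrs, *children = node
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    start = f"<{tag}{attr_text}>"
    if tag in VOID:
        return start  # void element: no end tag, and no XML-style />
    body = "".join(serialize(child, tag) for child in children)
    return f"{start}{body}</{tag}>"  # explicit end tag, even when empty


print(serialize(("div", {"class": "clearfix"})))
print(serialize(("br", {})))
print(serialize(("script", {}, "if (a < b && c) run();")))
print(serialize(("p", {}, "a < b & c")))
```

An empty div gets an explicit end tag rather than a self-closing form, and the script text passes through unescaped while the paragraph text does not.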