pre-srfi / minimal-html-css-writer

Minimum viable library to write SXML/S-CSS forms into XML/HTML/CSS files

HTML syntax is somewhat more subtle than the implementation currently considers. #4

Open · dpk opened this issue 3 years ago

dpk commented 3 years ago

HTML parsers and writers are somewhat fickle beasts.

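Two things I know the implementation currently gets wrong by assuming HTML and XML are basically the same:

- void elements such as <br> and <img> must not be written with an end tag;
- the contents of <script> and <style> are raw text and must not be escaped with character references.
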
There are almost certainly others.

lassik commented 3 years ago

Thanks!

We should probably have a parameter object to list which tags have no closing tags.
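In Scheme this would be a parameter object (R7RS `make-parameter` and `parameterize`, also SRFI 39). Since the thread contains no code, here is a minimal Python sketch of the same idea, using `contextvars.ContextVar` as a stand-in for a parameter object. All names (`DEFAULT_VOID_ELEMENTS`, `void_elements`, `parameterize_void_elements`, `start_and_end_tags`) are hypothetical, not the library's API, and the default set is the list of void elements in the HTML Living Standard as I understand it.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical default: HTML void elements, which never take an end tag.
DEFAULT_VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# A ContextVar plays the role of a Scheme parameter object: it has a
# default value and can be rebound for the dynamic extent of a block.
void_elements = ContextVar("void_elements", default=DEFAULT_VOID_ELEMENTS)

@contextmanager
def parameterize_void_elements(names):
    """Temporarily replace the void-element set, like (parameterize ...)."""
    token = void_elements.set(frozenset(names))
    try:
        yield
    finally:
        # Restore whatever value was in effect before the block.
        void_elements.reset(token)

def start_and_end_tags(name):
    """Return (start_tag, end_tag); end_tag is '' for void elements."""
    if name in void_elements.get():
        return f"<{name}>", ""
    return f"<{name}>", f"</{name}>"

if __name__ == "__main__":
    print(start_and_end_tags("br"))  # no end tag
    # Simulate the spec adding a new void element.
    with parameterize_void_elements(DEFAULT_VOID_ELEMENTS | {"newvoid"}):
        print(start_and_end_tags("newvoid"))
    print(start_and_end_tags("newvoid"))  # default restored
```

The default covers the common case, while a caller can extend the set if the spec adds elements before the library catches up.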

How does HTML expect us to embed < and > inside <script> and <style>? Does it have special heuristics for finding the closing tag that matches those opening tags, so that, e.g., writing the string literal "</script>" in JS will not in fact close the script tag?

dpk commented 3 years ago

> We should probably have a parameter object to list which tags have no closing tags.

So that programmers can keep up to date with the spec if new empty elements are added? Hmm. Ideally the library (or rather its implementations) would keep up to date itself. But if it does fall behind, a parameter object could be handy.

> How does HTML expect us to embed < and > inside <script> and <style>? Does it have special heuristics for finding the closing tag that matches those opening tags, so that, e.g., writing the string literal "</script>" in JS will not in fact close the script tag?

It’s complicated, especially the rules for <script>. But no, it’s not that clever: a "</script>" inside a JS string literal still closes the element.
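Because the parser does not look inside JS string literals, a writer cannot fix this by escaping: text inside <script> and <style> is emitted verbatim, and character references such as &lt; would not be decoded there. Below is a minimal Python sketch, using a hypothetical `write_raw_text_element` helper, that writes such content unescaped and conservatively refuses any text containing `</script` or `</style` in any letter case. The real tokenizer rule is narrower, and the `<!--` escaping states of script data are not modeled.

```python
import re

# Assumed subset of elements whose content is raw text (never escaped).
RAW_TEXT_ELEMENTS = {"script", "style"}

def write_raw_text_element(name, text):
    """Serialize <name>text</name> for a raw text element.

    The content is written as-is. Since the HTML parser ends the element
    at the first matching end tag, reject any text containing '</name'
    (case-insensitively) rather than emit markup that would be cut short.
    """
    if name not in RAW_TEXT_ELEMENTS:
        raise ValueError(f"{name!r} is not a raw text element")
    if re.search("</" + re.escape(name), text, re.IGNORECASE):
        raise ValueError(f"content of <{name}> must not contain '</{name}'")
    return f"<{name}>{text}</{name}>"

if __name__ == "__main__":
    print(write_raw_text_element("script", "if (a < b) x();"))
    # JS workaround: "<\/script>" is the same string but does not match.
    print(write_raw_text_element("script", 'var s = "<\\/script>";'))
```

In JavaScript source the usual workaround is to write the literal as "<\/script>", which evaluates to the same string but does not end the element.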

lassik commented 3 years ago

> > We should probably have a parameter object to list which tags have no closing tags.

> So that programmers can keep up to date with the spec if new empty elements are added? Hmm. Ideally the library (or rather its implementations) would keep up to date itself. But if it does fall behind, a parameter object could be handy.

As HTML is a living language and HTML5 is now back to being SGML- rather than XML-based, most such lists would likely go out of date at some point.

Parameter objects are always initialized with default values though. The default should be a reasonable set of tags.

> It’s complicated, especially the rules for <script>.

Wow, I don't envy the people who write those parsers :)

lassik commented 3 years ago

Actually, an SGML purist would probably consider it heretical to call HTML5 SGML-based :) But, as in SGML, explicit end tags are not always required.

dpk commented 3 years ago

I’ve thrown together a rough-and-ready implementation of the HTML serialization algorithm, as adapted for SXML. There are still some things it handles weirdly or doesn’t handle at all, because SXML and the HTML DOM are not perfectly isomorphic. Handling of namespaces is a particular rough spot I’d aim to look at again if this were to be adopted more widely.
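The following is not dpk's code. It is a minimal Python sketch of the same idea, assuming SXML nodes are represented as nested tuples of the form (name, ("@", (attr, value), ...), child, ...) with strings as text. It follows the broad shape of the HTML serialization algorithm: no end tags for void elements, verbatim text inside <script> and <style>, and escaping of &, U+00A0, and < > in text, or " in attribute values. The element sets are abbreviated, and namespaces, comments, and boolean attributes are not handled.

```python
import re

# Assumed element sets, abbreviated from the HTML Living Standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})

def escape(text, attribute_mode=False):
    """Escape text roughly as the HTML serialization algorithm does."""
    # '&' must be replaced first so later references are not re-escaped.
    text = text.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return text.replace('"', "&quot;")
    return text.replace("<", "&lt;").replace(">", "&gt;")

def serialize(node, parent=None):
    """Serialize an SXML-like tuple tree to an HTML string."""
    if isinstance(node, str):
        if parent in RAW_TEXT_ELEMENTS:
            # Raw text is written verbatim, so it must not close its own element.
            if re.search("</" + parent, node, re.IGNORECASE):
                raise ValueError(f"text in <{parent}> contains '</{parent}'")
            return node
        return escape(node)
    name, *children = node
    attributes = []
    # An optional ("@", ...) node right after the name holds the attributes.
    if children and isinstance(children[0], tuple) and children[0][:1] == ("@",):
        attributes, children = children[0][1:], children[1:]
    parts = [f"<{name}"]
    for attr_name, value in attributes:
        parts.append(f' {attr_name}="{escape(value, attribute_mode=True)}"')
    parts.append(">")
    if name in VOID_ELEMENTS:
        # Void elements get no end tag and may not have content.
        if children:
            raise ValueError(f"void element <{name}> cannot have children")
        return "".join(parts)
    parts.extend(serialize(child, name) for child in children)
    parts.append(f"</{name}>")
    return "".join(parts)

if __name__ == "__main__":
    doc = ("p", ("@", ("class", "note")), "1 < 2 & ", ("br",),
           ("script", "if (a < b) f();"))
    print(serialize(doc))
```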

dpk commented 3 years ago

(I also haven’t tested it in all its aspects yet, so there are likely a few bugs.)