w3c / DOM-Parsing

DOM Parsing and Serialization
https://w3c.github.io/DOM-Parsing/

Provide an API to serialize with the "require well-formed" parameter set to true #84

Open · geofft opened this issue 3 months ago


The serializeToString() method of XMLSerializer is specified to "produce an XML serialization of root passing a value of false for the require well-formed parameter, and return the result." It's somewhat confusing that something called XMLSerializer can return output that isn't well-formed XML, but I understand that this can't be changed for backwards compatibility. Still, it would be useful to have a mechanism that sets the "require well-formed" parameter to true, i.e., one that throws if the node cannot be serialized as well-formed XML.

Background: I'm trying to use the technique in this blog post to render HTML to an image by creating an SVG with a <foreignObject> containing the HTML. As the post notes, because SVG is XML, the contents of <foreignObject> must be well-formed XML. Using serializeToString for this, as the post suggests, works for most documents, but not for certain malformed HTML documents that the browser nonetheless parses successfully. The specific case I ran into was an attribute with unescaped quotation marks in its value:

<meta property="og:description" content="I forgot to "escape" this value">

which the HTML parser interprets as

<meta property="og:description" content="I forgot to " escape"="" this="" value"="">

i.e., the parser creates extra attributes, two of whose names contain a quotation mark. (You can see this by setting an element's innerHTML to the first string and then reading innerHTML back.) Such names can't be represented in XML, yet serializeToString still returns an "XML" string containing this syntax, which the browser then cannot parse as XML (e.g., when used as SVG source for an <img>, or with new DOMParser().parseFromString(xml, "text/xml")).
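To make the failure concrete: as I read the DOM Parsing spec, when "require well-formed" is true the attribute serialization step throws if an attribute's local name contains ":", does not match the XML Name production, or is "xmlns" with a null namespace. Below is a minimal sketch of that check for null-namespace attributes only, with the character classes transcribed from the XML 1.0 (Fifth Edition) Name production. The helper name isSerializableAttrName is hypothetical, not part of any spec or browser API.

```javascript
// Character classes transcribed from the Name production of XML 1.0
// (Fifth Edition). The "u" regex flag is needed for the U+10000-U+EFFFF range.
const NAME_START_CHARS =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";
const NAME_CHARS =
  NAME_START_CHARS + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";
const XML_NAME = new RegExp(`^[${NAME_START_CHARS}][${NAME_CHARS}]*$`, "u");

// Hypothetical helper: for an attribute in the null namespace, returns false
// when (as I read the spec) the "require well-formed" check would throw:
// the local name contains ":", is not an XML Name, or is "xmlns".
function isSerializableAttrName(localName) {
  return (
    !localName.includes(":") &&
    XML_NAME.test(localName) &&
    localName !== "xmlns"
  );
}

// Attribute names the HTML parser produced for the <meta> example above.
const metaAttrNames = ["property", "content", 'escape"', "this", 'value"'];
console.log(metaAttrNames.filter((n) => !isSerializableAttrName(n)));
```

Applied to the <meta> example, only the names escape" and value" are rejected, because `"` is not an XML name character; property, content and this pass.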

I could re-parse the output with DOMParser purely as a check (discarding the parsed document if it succeeds), or catch the error event from the <img>, but it would be cleanest if serializeToString failed in the first place. Would it be possible to add an optional boolean parameter, serializeToString(document, requireWellFormed), that defaults to false, or a property on XMLSerializer, or some similar mechanism?
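The DOMParser workaround mentioned above can be sketched roughly as follows. This is a hypothetical helper, not a standardized or proposed API. It assumes that a failed parse is reported the way current browsers do it, by returning a document that contains a <parsererror> element rather than throwing, and it only approximates the spec's "require well-formed" checks. It would also misfire if the serialized node legitimately contained an element named parsererror. The serializer and parser are parameters only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: serialize, then re-parse the output as XML and throw
// if the parser reports an error. Defaults to the browser's built-in
// XMLSerializer and DOMParser (default parameters are only evaluated when
// the argument is omitted).
function serializeToWellFormedXML(
  node,
  serializer = new XMLSerializer(),
  parser = new DOMParser()
) {
  const xml = serializer.serializeToString(node);
  const doc = parser.parseFromString(xml, "text/xml");
  // DOMParser does not throw on malformed XML; browsers instead return a
  // document containing a <parsererror> element.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("serializeToString produced XML that is not well-formed");
  }
  return xml;
}

// In a browser:
//   const markup = serializeToWellFormedXML(document.documentElement);
```

The cost is a full second parse of the output, which is exactly the overhead a built-in "require well-formed" option would avoid.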

(Originally reported as https://bugzilla.mozilla.org/1914813 because I didn't realize the spec requires this, but it does, and the behavior is the same in Firefox, Safari, and Chrome. See also mdn/content#35585.)