Closed mfreed7 closed 8 months ago
innerHTML
is unfortunate in that it changes behavior depending on the HTML vs. XML bit of the owning document. Having this API that says "HTML" always serialize as HTML (and the "set" counterpart parse as HTML), then having XML counterparts, and having these on all nodes regardless of the HTML vs. XML bit of the owning document would probably be a less problematic design than the old innerHTML
behavior that varies with the HTML vs. XML bit of the owner doc, since a given method would always do the same thing regardless of owner-document-level state.
Oops, the previous comment went to the wrong issue. For this issue, being able to serialize what can be parsed makes sense for symmetry.
Oops, the previous comment went to the wrong issue. For this issue, being able to serialize what can be parsed makes sense for symmetry.
Thanks for the (positive) feedback! Any thoughts on the proposed shape of the API?
Now that the DSD patches have all landed, can we come back to this one? Any objections to adding:
const html = element.getInnerHTML({
includeShadowRoots: true,
closedRoots: [shadowRoot1, shadowRoot2, ...]
});
?
So the new parser entry points are called setHTMLUnsafe()
and setHTML()
. For symmetry I'd expect this to be called getHTML()
, assuming we don't need the safe/unsafe distinction.
As for the shape of the API, I don't like that it makes a distinction between closed and open shadow roots. I thought that many years ago we reached agreement that what you need for closed shadow roots we'd also use for open.
So the new parser entry points are called
setHTMLUnsafe()
andsetHTML()
. For symmetry I'd expect this to be calledgetHTML()
, assuming we don't need the safe/unsafe distinction.
Yep, makes sense to me. I don't believe (?) there's any need for an "unsafe" version, at least that I can see.
As for the shape of the API, I don't like that it makes a distinction between closed and open shadow roots. I thought that many years ago we reached agreement that what you need for closed shadow roots we'd also use for open.
The problem is that providing a list of the contained roots is fairly cumbersome. You have to walk the entire tree, before getHTML
then walks it again. I can see this performance penalty alone, let alone the ergonomics, being a developer complaint. How about:
const html = element.getHTML({
includeShadowRoots: true,
});
which just serializes them all? That certainly treats them equally.
I thought that many years ago we reached agreement that what you need for closed shadow roots we'd also use for open.
Incidentally, can you link me to that agreement? I see it cited often, but I don't quite understand the rationale. I'd love to read the back-story. (I was not part of the "we" in that sentence.)
Oops, the previous comment went to the wrong issue. For this issue, being able to serialize what can be parsed makes sense for symmetry.
Thanks for the (positive) feedback! Any thoughts on the proposed shape of the API?
The shape looks OK. I think it would be worthwhile to write down that if a way to obtain the XML serialization was added, it should be a second method and not a flag on the option bag (for easy feature detectability and for not having something as big as selection of the serialization syntax as an option bag item). We don't need to add an XML counterpart right now absent evidence of Web developer demand, but I think we should think through and write down how we'd go about introducing an XML counterpart if demand shows up.
The shape looks OK.
I should clarify, that I meant the shape on the level of 1) new (relative to innerHTML
) HTML serialization getter, 2) it being about the HTML serialization (potential XML serialization as a distinct method), and 3) tunables within the HTML serialization as an option bag.
I didn't review the specifics of the tunable options.
The shape looks OK. I think it would be worthwhile to write down that if a way to obtain the XML serialization was added, it should be a second method and not a flag on the option bag (for easy feature detectability and for not having something as big as selection of the serialization syntax as an option bag item). We don't need to add an XML counterpart right now absent evidence of Web developer demand, but I think we should think through and write down how we'd go about introducing an XML counterpart if demand shows up.
Thanks! I agree with your points about XML. Where do you think we should write this down, just as a Note
in the spec?
Coming back to the shape, what about this (small) tweak?
const html = element.getHTML({
shadowRoots: [shadowRoot1, shadowRoot2, ...]
includeAllShadowRoots: true,
});
The idea would be:
el.getHTML({shadowRoots: [root1, root2, ...]}) // Only serializes into `root1`, `root2`, ... and skips other roots. Roots can be open or closed here.
el.getHTML({includeAllShadowRoots:true}) // Serializes all open shadow roots, and no closed ones (because they're closed).
el.getHTML({includeAllShadowRoots:true, shadowRoots: [root1, ...]}) // Serializes all open shadow roots plus anything in the shadowRoots array.
I'm just trying to avoid the problem that this API should be usable without having to do full recursive DOM traversal first. I'm trying to incorporate the "open and closed roots should be the same" feedback, admittedly without completely understanding it.
So we discussed this at the https://github.com/whatwg/html/issues/9937 meeting, and came to rough consensus on a model like this:
attachShadow()
and <template shadowrootmode>
to allow the shadow root to say "I opt in to serialization of this shadow root without providing a reference to me".getHTML()
that says "please serialize opted-in shadow roots". The default for this option is false
.getHTML()
containing a sequence of shadowRoot
s that should be serialized, independent of whether they opted in via the flag above.Assuming the above is correct, I propose this set of names:
element.attachShadow({serializable: true}); // (1)
// Or <template shadowrootmode serializable> // (1)
const html = element.getHTML({
includeSerializable: true, // (2)
shadowRoots: [shadowRoot1, shadowRoot2, ...] // (3)
});
Comments? Suggestions for better names?
I think includeShadowRoots
(which defaults to true if you supply shadowRoots
and throws if you set it to false while also supplying shadowRoots
) makes more sense than includeSerializable
.
We probably want to have ShadowRoot.prototype.serializable
as well for introspection (similar to the clonable
suggestion).
I think
includeShadowRoots
(which defaults to true if you supplyshadowRoots
and throws if you set it to false while also supplyingshadowRoots
) makes more sense thanincludeSerializable
.
Yep - good suggestions, both.
We probably want to have
ShadowRoot.prototype.serializable
as well for introspection (similar to theclonable
suggestion).
I agree.
Quick update to include the last set of comments:
element.attachShadow({serializable: true}); // (1)
// Or <template shadowrootmode serializable> // (1)
const html = element.getHTML({
includeShadowRoots: true, // (2)
shadowRoots: [shadowRoot1, shadowRoot2, ...] // (3)
});
// Examples
element.getHTML(); // does not serialize shadow roots
element.getHTML({includeShadowRoots: true}); // returns roots whose root.serializable is true
element.getHTML({shadowRoots: roots}); // also returns roots whose root.serializable is true
element.getHTML({includeShadowRoots: false, shadowRoots: roots}); // throws an exception
element.shadowRoot.serializable; // Returns boolean
If no objections, I'll put up a spec PR to add this.
I'd like to clarify two things that came up as part of the Chromium review of the changes listed above.
First is that in this example:
element.getHTML({includeShadowRoots: true}); // returns roots whose root.serializable is true
I assumed, but would like to confirm, that closed roots, even with serializable
set to true
, will not be returned by the code above. The only way to serialize a closed shadow root is to specify it explicitly using shadowRoots:[root]
.
The second question is about the includeShadowRoots
argument. I had assumed it was some kind of tri-state:
element.getHTML(); // A) does not serialize *any* shadow roots
element.getHTML({includeShadowRoots: true}); // B) serializes all roots marked as serializable
element.getHTML({shadowRoots: roots}); // C) serializes the specific `roots` provided
element.getHTML({includeShadowRoots: false, shadowRoots: roots}); // D) throws an exception
The question is related to example C). Does providing shadowRoots
effectively set includeShadowRoots
to true or false? As I wrote the code, this behavior felt like it would be weird and surprising:
includeShadowRoots
is clearly false
, from A).shadowRoots
(as in C) silently sets includeShadowRoots
to true
which will result in additional (non-specified) roots being serialized.From a developer point of view, the least surprising behavior would seem to be that the default for includeShadowRoots
is false, and providing shadowRoots
doesn't change that. So in C), only the provided roots
are serialized, and not others that are marked as serializable
. To get those, use element.getHTML({includeShadowRoots: true, shadowRoots: roots})
. The only weird inconsistency would then be D) which throws an exception if includeShadowRoots
is explicitly set to false while also providing shadowRoots
. That makes includeShadowRoots
kind of tri-state. That's fine with me, but I wanted to confirm.
I see, your assumption is that when you specify particular shadow roots (with shadowRoots
), you're not necessarily also interested in serializable shadow roots (includeShadowRoots
)? I feel ambivalent to that reframing, but I think we should rename includeShadowRoots
to serializableShadowRoots
if we go that way.
I see, your assumption is that when you specify particular shadow roots (with
shadowRoots
), you're not necessarily also interested in serializable shadow roots (includeShadowRoots
)? I feel ambivalent to that reframing, but I think we should renameincludeShadowRoots
toserializableShadowRoots
if we go that way.
I'm kind of ambivalent also. It just feels a little odd that getHTML()
returns none, and getHTML({shadowRoots: list})
returns list
plus others. Sounds like you're ok with my proposed behavior, as long as there's a rename? @smaug---- any thoughts?
I'm kind of ambivalent also. It just feels a little odd that
getHTML()
returns none, andgetHTML({shadowRoots: list})
returnslist
plus others. Sounds like you're ok with my proposed behavior, as long as there's a rename? @smaug---- any thoughts?
Alright, I've put up HTML and DOM spec PRs for this behavior. It's on the larger side of the spec PRs I've written, so please go easy on me. Suggestions for improvement are welcome.
One thing: having written it, with the rename to serializableShadowRoots
, it now feels weird to throw when serializableShadowRoots
is false and shadowRoots
is non-empty. It feels like perhaps this should be the behavior instead:
element.getHTML(); // A) does not serialize *any* shadow roots
element.getHTML({serializableShadowRoots: true}); // B) serializes all roots marked as serializable
element.getHTML({shadowRoots: roots}); // C) serializes the specific `roots` provided
element.getHTML({serializableShadowRoots: false, shadowRoots: roots}); // D) same as (C)
I.e. without (D), it'll be quite awkward to have to remove the serializableShadowRoots
member in order to just serialize specific roots. I also don't see the danger or need for an exception.
Yeah, that makes sense.
Thinking about this method more I have a concern. On the parsing side we get to choose what parser to use and therefore can enforce HTML. And therefore setHTML[Unsafe]()
are good names. However, I doubt that is the plan on the serialization side?
This will need some thought. If we just want to reuse the existing serialization algorithm maybe serialize()
is a reasonable name. Enforcing HTML serialization seems hard. (What if someone inserts a CDATASection, ProcessingInstruction, or non-HTML element?)
@zcorpan might have thoughts about this. Also curious to hear from @hsivonen and @smaug----.
I.e. without (D), it'll be quite awkward to have to remove the
serializableShadowRoots
member in order to just serialize specific roots. I also don't see the danger or need for an exception. Yeah, that makes sense.
Great, thanks, I've made that change to the PR.
Thinking about this method more I have a concern. On the parsing side we get to choose what parser to use and therefore can enforce HTML. And therefore setHTML[Unsafe]() are good names. However, I doubt that is the plan on the serialization side?
Well, in the PR I explicitly run the HTML fragment serialization algorithm, which should correctly serialize a contained foreign element, right? I suppose we could rename the method to serialize()
and instead call DOMParsing's fragment serializing algorithm. I kind of liked the parallelism to setHTML[Unsafe]()
here. But I don't have terribly strong opinions.
I'd assume this work similarly to innerHTML getter in HTML documents, if there are things like ProcessingInstructions
innerHTML
's getter has an "XML code path". Given that we explicitly opted to not give setHTML()
such a code path, it would be strange for getHTML()
to have one. And I think it would also be strange if it worked on non-HTML elements but didn't serialize things such as namespaces.
For what it's worth it seems totally fine to me, and in fact desirable, to have a method that specifically runs the HTML fragment serialization algorithm. There are plenty of non-serializable things possible in the DOM, no matter what serializer you're using. I don't think we should be worried about what happens if authors use old discouraged things like namespaces or processing instructions, in combination with this new method.
@annevk https://w3c.github.io/DOM-Parsing/#dom-innerhtml-innerhtml is exists on all elements, but switches on the HTMLness of the document, so any strangeness is not new. https://software.hixie.ch/utilities/js/live-dom-viewer/saved/12394
I agree that getHTML()
shouldn't have an XML code path, like @hsivonen said. Instead we can add getXML()
if fragment serialization as XML is needed.
innerHTML
's getter has an "XML code path". Given that we explicitly opted to not givesetHTML()
such a code path, it would be strange forgetHTML()
to have one. And I think it would also be strange if it worked on non-HTML elements but didn't serialize things such as namespaces.
Right - as I mentioned, the PR points getHTML()
to the HTML fragment serialization algorithm, so it'll be parallel to both setHTML()
and calling innerHTML
from within an HTML document.
Given the other comments, it sounds like maybe the PR is ok as-is then, modulo other issues?
Thanks everyone!
This issue is split out from the discussion that starts roughly here https://github.com/whatwg/html/pull/5465#discussion_r579593543. The original Declarative Shadow DOM spec PR included this functionality, which allowed a DOM sub-tree to be serialized in a way that doesn't throw away all of the contained shadow roots. To facilitate landing that PR, this functionality was removed. I'd like to use this issue to resume that discussion.
The problem statement is that DOM content containing developer shadow roots (i.e. not UA shadow roots) cannot be serialized. Doing so today requires a manual tree walk in JS land, manually grabbing shadow roots and serializing them into
<template shadowrootmode>
elements.The ideal solution would be for
const html = element.innerHTML;
to just start serializing shadow roots like it already serializes the rest of the DOM into HTML. However, this is almost surely not web compatible. One option is to provide an attribute onDocument
and/orShadowRoot
that opts-in to this serialization behavior. However, given the recent push to define new parser entrypoints that are functions instead of attributes, it seems to make more sense to also define a new functional getter that optionally (or always?) serializes shadow roots:To preserve the encapsulation of "closed" shadow roots, another option would be needed in the case that "closed" shadow roots want to be serialized:
Thoughts? Concerns?