whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.16k stars 2.69k forks source link

Should HTML parsers created for fragment parsing (i.e. innerHTML) or DOMParser spin the event loop in "stop parsing"? #8646

Open Lubrsi opened 1 year ago

Lubrsi commented 1 year ago

When the HTML parser stops parsing, it performs one conditional event loop spin on step 5.1 and two unconditional event loop spins on step 7 and 8. Spinning the event loop performs a microtask checkpoint on step 5.

However, this causes microtasks to execute at the wrong time. Take this example with custom elements:

<!DOCTYPE html>
<html>
<body>
    <custom-element>
    </custom-element>
    <script>
        class CustomElement extends HTMLElement {
            constructor() {
                super();

                // Remember that we are here whilst being immediately called from the upgrade reaction with the connectedCallback reaction still on the microtask queue.

                // Setting innerHTML or using DOMParser creates a new HTML parser. When it finishes parsing, it will unconditionally spin the event loop until all async scripts have executed.
                // However, this unconditional event loop spin causes a microtask checkpoint to occur.
                // In so doing, we invoke the microtask to invoke connectedCallback below _before_ we set `superImportantAttributeINeed` below.
                // Now jump to the comment in connectedCallback below.
                let html = "<div>Hello world</div>";
                if (Math.random() > 0.5) {
                    console.log("Setting innerHTML");
                    let element = document.createElement("div");
                    element.innerHTML = html;
                } else {
                    console.log("Using DOMParser");
                    new DOMParser().parseFromString(html, "text/html");
                }

                this.superImportantAttributeINeed = { superImportantValueINeed: 1337 };
            }

            connectedCallback() {
                // Since setting innerHTML or using DOMParser incorrectly caused a microtask checkpoint via spinning the event loop, we are here before `superImportantAttributeINeed` was set.
                // Since `superImportantAttributeINeed` is undefined, this throws with `undefined cannot be converted to an object`, when it should continue and print:
                // "Here's the super important value I needed: 1337"
                console.log("Here's the super important value I needed:", this.superImportantAttributeINeed.superImportantValueINeed);
            }
        }

        // When defining a custom element, it will (simplified for this case):
        // - Scan the document's DOM for any existing elements with the passed in custom element name.
        // - Find the existing `custom-element` element above.
        // - Enqueue an upgrade reaction microtask to upgrade that existing element to a custom element.
        // - Finished.
        customElements.define("custom-element", CustomElement);

        // After this script block ends, a microtask checkpoint happens, which causes the upgrade reaction to run.
        // In order, the upgrade will (simplified for this case):
        // - See the existing `custom-element` is connected and enqueue a microtask to call connectedCallback in CustomElement.
        // - Immediately (with no (micro)tasks) call the constructor in CustomElement.
        // - Finished.
        // Now jump to the comments in the constructor above.
    </script>
</body>
</html>

This affects YouTube, which sets innerHTML of a <template> element in some custom element constructors before setting up class attributes, causing unhandled exceptions in connectedCallback when it tries to use the attributes it assumes were setup in the constructor.

It appears WebKit and Blink skip most of the end for parsing fragments, roughly doing steps 2 and 4. It doesn't appear they skip any steps for DOMParser, as they don't use createFragmentForInnerOuterHTML or createFragmentForMarkup for DOMParser.

WebKit: https://github.com/WebKit/WebKit/blob/44374328f96d4c74b57ce233a54d72406492774b/Source/WebCore/html/parser/HTMLDocumentParser.cpp#L140-L142

https://github.com/WebKit/WebKit/blob/46837e56d6f57f7e52bf92c9534cc7683909eee5/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L3024

https://github.com/WebKit/WebKit/blob/a36f6efecc490ae267d0a2103e86f1f0db533c62/Source/WebCore/xml/DOMParser.cpp#L41

Blink: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/parser/html_document_parser.cc;l=553-555;drc=f9ebf0deab3efad7c0db6843af187bb36f0ccb4c?q=html_document_parser&ss=chromium%2Fchromium%2Fsrc

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/parser/html_tree_builder.cc;l=2950;drc=f9ebf0deab3efad7c0db6843af187bb36f0ccb4c?q=html_tree_bui&ss=chromium%2Fchromium%2Fsrc

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/xml/dom_parser.cc;l=32;drc=f9ebf0deab3efad7c0db6843af187bb36f0ccb4c?q=dom_parser.cc&ss=chromium%2Fchromium%2Fsrc

I have not looked into what Gecko does for this.

There are other issues with the end in these cases, such as unnecessary events like readystatechange, DOMContentLoaded and load which are not observable and the WebDriver steps assuming Document's browsing context is non-null. However, I focused on spinning the event loop as it is observable by user code.

bathos commented 1 year ago

Does this mean DOMParser currently permits forcing arbitrary queued microtasks to be flushed “synchronously” (from the POV of the flush-inducing code) at will? I’d have expected (naively, anyway — I may not be picturing this right) the microtask checkpoint algorithm to be a noop any time user code is already on the execution stack.

Kaiido commented 1 year ago

Just wondering, is it possible that there is something "that delays the load event" here? If I'm not mistaken, scripting is disabled by lack of a browsing context, so I guess the two first lists of scripts from step 5 and step 7 should always already be empty, but I'm not quite sure about this step 8, though it seems that most requests are also blocked because the Document is not "fully active".

Asking because this reminds me of https://github.com/whatwg/html/issues/6230 where it was already spotted that browsers apparently don't really spin the event loop when the condition is already met. I guess the same could be happening here (assuming in your tests you don't have the microtask-checkpoint). Note: I agree that a strict reading of the specs indeed would ask for it to be visited, and incur a microtask-checkpoint, and it also seems true that most of this "the end" seems off for these parsers.