whatwg / dom

DOM Standard
https://dom.spec.whatwg.org/
Other
1.58k stars 294 forks source link

Add NodeList.setAttributes #895

Open michaschwab opened 4 years ago

michaschwab commented 4 years ago

I propose the addition of a setAttributes function to the NodeList prototype to support bulk-setting attribute values on a collection of nodes: NodeList.prototype.setAttributes = function(this: NodeList, attribute: string, values: (string|number)[]) {}

Many popular JS frameworks, such as jQuery (link) and D3.js (link), come with such attr or similar functions that allow bulk-setting attribute values for many nodes at once. Internally, these functions have to loop over the selection of nodes and call the setAttribute function on individual nodes.

When hundreds of nodes are updated at once, the overhead from the repeated traversal from JS function calls down to browsers' internal attribute changes adds up to significant performance costs. See related crbug. With the addition of a setAttributes function, users and framework creators would not have to manually loop over the nodes, simplifying the implementation, while browser creators could optimize the implementation and performance for bulk-setting attributes.

mfreed7 commented 4 years ago

@michaschwab can you elaborate on how often you think this pattern gets used on the web? Are there other, similar, "bulk" operations that we should also consider? There would seem to be considerable perf improvements that could be made for these use cases, by removing (N-1) round trips through the bindings layer. So if there is some evidence that there are common use cases like this, Chromium would be supportive of this concept.

developit commented 4 years ago

I've noted it elsewhere, but I wanted to echo here for visibility: in order to be broadly useful, a "bulk set" feature should support properties rather than (or in addition to) attributes. It also should support bulk operations on a single node, which is extremely common and the primary bottleneck in most template and Virtual DOM implementations.

I have an proposal explainer here that tries to suggest an approach for addressing multiple property/attribute updates on a single Node. While a minimal implementation could ignore the suggested unification aspect, it would be immensely valuable addition to the DOM API since all current frameworks are required to either normalize internally or delegate that responsibility to the developer.

annevk commented 4 years ago

@developit invoking a number of setters is a very different operation and I'd suggest we tackle that idea in a separate issue.

developit commented 4 years ago

Sounds reasonable. Before I create an issue, I wanted to double check: if the goal is multi-set performance primitive, maybe both cases could be covered by a separate lock/commit style API? Something like requestAtomicUpdate(fn, [...affectedNodes])?

annevk commented 4 years ago

Maybe. If you want to avoid hitting the binding layer you probably do not want a callback though, but rather spell out the instructions to execute.

michaschwab commented 4 years ago

Thanks a lot for everyone's comments! @mfreed7 I'm excited that Chromium would be supportive of a "setAttributes" function if there is evidence that there are common use cases.

There are multiple pieces of evidence here. If we're interested in seeing simply how widely spread this pattern is, we can look no further than "THE" JS library - jQuery: the attr function does exactly this: attr calls the access function with the individual setAttribute function as callback, which then loops over the elements to call setAttribute once for each element. Since jQuery is basically everywhere, and attr() is a core function of it, I would say the pattern is pretty widely spread.

If we're interested in seeing that it's a pattern that is often used on a large number of elements, D3.js might be the best example. While D3.js is not as widespread as jQuery, it's still the primary framework used on the web for interactive graphics and is used by major news organizations such as NYT, and lots of other sources of graphics on the web. For D3.js, data binding is the key functionality (with enter, update and exit), and is used to represent many data points as many graphical elements. Every single example in the D3.js Gallery that shows data (so basically everything except bare-bones maps) uses d3.attr() to update all of its nodes. E.g. the force graph example updates all node and link positions in each frame using the attr function:

function ticked() {
    node.attr("cx", d => d.x)
        .attr("cy", d => d.y);

    link.attr("x1", d => d.source.x)
        .attr("y1", d => d.source.y)
        .attr("x2", d => d.target.x)
        .attr("y2", d => d.target.y);
}

In the source code of D3.js' attr function, you can see that similar to jQuery, it loops over all elements with its ".each" function, and passes the attrFunction, which is just a wrapper for element.setAttribute.

To summarize: Looping over elements to call setAttribute on each of them is extremely widespread as it is used in most JS libraries such as jQuery and D3.js. For D3.js, this pattern is perhaps the very core of the library itself, built around the enter-update-exit pattern that defines D3.js, and is often used to update hundreds of nodes in each frame, often resulting in multiple thousand .setAttribute calls in each frame due to multiple attributes.

Is this the kind of evidence you are looking for? If you have questions, I can try to find more specific evidence.

mfreed7 commented 4 years ago

Is this the kind of evidence you are looking for? If you have questions, I can try to find more specific evidence.

@michaschwab thanks for the terrific examples of how this might be used, and might be helpful. I do see the need here, and it does seem like a very common pattern on the web and in common libraries. The comments from @developit back this up. I guess I'm left with these questions:

  1. How much faster might a native "bulk" API be, as compared to the current practice of one-at-a-time? This would likely require some prototyping to measure. There would clearly be some benefit, as the round-trips through the JS bindings layer would be eliminated. But these are already highly optimized, so it's unclear how much time would be saved.
  2. Is this a single API that supports:
    • multiple elements?
    • properties?
    • attributes?
  3. Or are there separate APIs for all of the above?
  4. Is there some sort of ordering of assignments that is preserved? ... and most importantly ...
    1. How is this different than, or perhaps, could this be implemented using, the "Template Instantiation" proposal?

For #5, it seems like that would depend on how the APIs are typically used. If it is for just a single assignment, e.g. after creating a bunch of elements, then Template Instantiation likely wouldn't help. This is perhaps part of your jquery example. But for your D3.js example, that definitely seems right in the wheelhouse of Template Instantiation.

@yuzhe-han @josepharhar

yuzhe-han commented 4 years ago

@michaschwab These are great use cases for the batch commit API in template parts proposal.

One can associate attribute and property parts with a DOM tree, update their values, and batch commit them to the DOM. In addition, once parts are attached, they can be repeatedly updated and committed. This avoids the multiple round trips between JS and browser internals for each set.

LeaVerou commented 4 years ago

I think a separate static method would be better, as it can then operate on arrays of elements, not just nodelists. Many libraries and helper methods today return arrays for operations, and AFAIK there is no way to convert an array to a NodeList, since it is not constructible. A static method would also allow handling of both single elements and collections of elements in one fell swoop.

Many bulk operations are commonly needed:

Performance is only one aspect, readability and maintainability is far bigger, which is why this pattern is so popular and has been since 2006 (jQuery popularized it I believe) despite the lack of any performance improvement.

Consider this:

let audio = document.createElement("audio");
audio.setAttribute("src", "rain.mp3");
audio.setAttribute("autoplay", "");
audio.setAttribute("start", "00:00:00.00");
audio.setAttribute("loopstart", "00:00:00");
audio.setAttribute("end", "00:01:00");
audio.setAttribute("playcount", "100");
audio.setAttribute("controls", "");

and this:

let audio = document.createElement("audio");
Element.setAttributes(audio, {
    src: "rain.mp3",
    autoplay: "",
    start: "00:00:00.00",
    loopstart: "00:00:00",
    end: "00:01:00",
    playcount: "100",
    controls: ""
});

or even:

let audio = document.createElement("audio", {
    attributes: {
        src: "rain.mp3",
        autoplay: "",
        start: "00:00:00.00",
        loopstart: "00:00:00",
        end: "00:01:00",
        playcount: "100",
        controls: ""
    }
});

Which one is easier to read and maintain?

michaschwab commented 4 years ago

Thanks for everyone's contributions to this discussion!

How is this different than, or perhaps, could this be implemented using, the "Template Instantiation" proposal?

@mfreed7 @yuzhe-han Thanks, I hadn't seen that before! The template proposal looks very promising and useful. I think many optimizations for batch changes can and should be made with this approach, and so I agree that there is some common functionality. But the proposal assumes that developers switch to this templating approach of manipulating the DOM (via the "Part" and "Part" objects), which in my view is independent from this proposal. I don't think the ability to use batched changes should require developers to interact with the DOM via the templating and "Part" approach.

How much faster might a native "bulk" API be, as compared to the current practice of one-at-a-time? This would likely require some prototyping to measure. There would clearly be some benefit, as the round-trips through the JS bindings layer would be eliminated. But these are already highly optimized, so it's unclear how much time would be saved.

I don't know the answer to this question. I know that the loop over setAttribute is often the majority of the render time in each frame of a data visualization (often over 20ms in real world applications that update only a handful of attributes, but on thousands of nodes), and my intuition is that a low level loop setting the attribute values within the browser would not take that long. But I agree that this should be prototyped. I haven't worked on Chrome directly, so if you want me to do this prototyping, it will take me some time to ramp up. I'm willing to do this work, but am quite busy until the Christmas break. I'm also joining Google again for full time in January, so I will have more time for this work then.

Is this a single API that supports: multiple elements? properties? attributes? Or are there separate APIs for all of the above?

What I'm proposing is that the API supports multiple elements. In my view, this is clearly the major bottleneck: there is always a finite set of attributes that you are setting, but the number of elements is unbounded. I understand if you want to provide a more systematic solution that encompasses other requests, and I agree with @LeaVerou that the readability can improve if we could manipulate multiple attributes. But it's not the focus of this proposal.

Is there some sort of ordering of assignments that is preserved?

No, I'd say the order is irrelevant.

I also like @LeaVerou's comment on the benefits of the static method. Element.setAttributes(elements, attributeValues) would be wonderful.

yuzhe-han commented 3 years ago

@michaschwab, thanks for your feedback on the template parts proposal. I see your point about the differences between that proposal and what this issue is tackling.

I see two main pain points this proposal addresses, performance and ergonomics. Regarding performance, I'm not sure how much improvement can be achieved.

often over 20ms in real world applications that update only a handful of attributes, but on thousands of nodes

The example in a comment, https://github.com/whatwg/dom/issues/895#issuecomment-714582150, shows that each attribute's value is different for each node. So unless the API supports passing-in a set of attributes for each node, the script will still have to loop over each node, and bulk set its attributes. The performance benefit will be not needing to loop over each attribute for a given node.

Is this analysis correct, or have I misunderstood the API shape?

As for ergonomics, I like @LeaVerou's and your comment about a static method. Converting a Nodelist to an Array and vise versa is not straight forward. The static method solves that.

michaschwab commented 3 years ago

Thanks for following up!

The example in a comment, #895 (comment), shows that each attribute's value is different for each node. So unless the API supports passing-in a set of attributes for each node, the script will still have to loop over each node, and bulk set its attributes. The performance benefit will be not needing to loop over each attribute for a given node.

Is this analysis correct, or have I misunderstood the API shape?

I think you might have misunderstood slightly. Yes, the value is different for each element. But I propose that the API allows passing in a set of attribute values so that the JS no longer has to loop over each node for DOM access. Currently, the D3.js attr function has to loop over all the nodes, e.g. like this:

selection.attr = function(attrName, attrValFct) {
    for(const element of selection.elements) {
        const value = attrValFct(element['__data__']); // Fast: No DOM.
        element.setAttribute(attrName, value); // Slow: Traverses the bindings layer for each element.
    }
}

With the proposed change, the code could be updated to look like this:

selection.attr = function(attrName, attrValFct) {
    const values = selection.elements.map(element => attrValFct(element['__data__'])); // Fast: No DOM.
    const attributes = {[attrName]: values};
    Element.setAttributes(selection.elements, attributes); // Faster: Only traverses the bindings layer once.
}

The JS would still loop over the elements to get the different values. But the JS would no longer have to access the DOM, and traverse the JS bindings layer, for each element, which is the slow part.

michaschwab commented 3 years ago

To clarify further what I'm proposing after this discussion:

Element.setAttributes = function(elements: HTMLElement[], attributeValues: {[attrName: string]: (string|number)[]})

This allows setting a set of attribute values to a set of elements, and also allows the combined setting of multiple attributes.

LeaVerou commented 3 years ago

To clarify further what I'm proposing after this discussion:

Element.setAttributes = function(elements: HTMLElement[], attributeValues: {[attrName: string]: (string|number)[]})

This allows setting a set of attribute values to a set of elements, and also allows the combined setting of multiple attributes.

The function should accept both singular nodes and arrays. We shouldn't have to pointlessly wrap a single element with an array to set multiple attributes on it.

Similarly, we shouldn't need to wrap a single attribute-value pair with an object literal just to set it on multiple elements.

jQuery did this very well: You could provide the name and value as separate arguments, or as one object literal, with many pairs. You could also operate on 0, 1, or multiple elements.

As for values that differ per element, one solution would be to allow callbacks as the values with arguments the same as Array#map(). There is precedent in JS: String#replace() behaves differently with a string argument vs a function argument. This is also something that jQuery’s attr() method allowed.

yuzhe-han commented 3 years ago

Thanks for the clarification! I see three variations of bulk setting attributes API.

  1. Method for setting multiple attributes for a single element. Example in Lea's comment.

    interface Element {
    [CEReactions] undefined setAttributes( DOMStringMap attributes ); 
    }
  2. Method for setting the same set of attributes for a list of elements. It builds on top of 1 where the method loops internally and invokes setAttributes() for each element. I don't see any precedence for defining static methods in the DOM spec. Maybe it should become its own object, ElementSetter | BulkSetter ?

    interface ElementSetter {
    undefined setAttributes(sequence<Element>, DOMStringMap attributes);
    }
  3. Method for setting one attribute for a list of elements where the values are unique to each element in the list.

    interface ElementSetter {
    undefined setAttributes(sequence<Element>, DOMString qualifiedName,  sequence<DOMString> values);
    }

These APIs should simplify the implementations of jQuery (link) and D3.js (link) and improve performance by eliminating the needs to access DOM inside a loop.

@michaschwab @LeaVerou @developit Do these APIs meets your needs? At least for setting attributes?

@annevk mentioned above that invoking multiple setter should be tackled in a separate issue. Was that in reference to setting both attributes and properties? Do you have any suggestions on where to define these APIs, for 2 and 3?

annevk commented 3 years ago

That was in reference to running setters as opposed to setting attributes. It could probably still be defined in DOM. I'm not sure how that relates to 2/3 in your comment. (It's fine to add statics btw.)

LeaVerou commented 3 years ago

Thanks for the clarification! I see three variations of bulk setting attributes API.

I would prefer polymorphism over a different function for each use case with slightly different signatures that need to be learned separately. But I'm aware that often the preference has been towards multiple functions for performance reasons, so that's not a strong objection. Having a new ElementSetter interface just for this, when these methods would make perfect sense as static methods of existing classes seems like overfragmentation TBH.

  1. Method for setting one attribute for a list of elements where the values are unique to each element in the list.
interface ElementSetter {
  undefined setAttributes(sequence<Element>, DOMString qualifiedName,  sequence<DOMString> values);
}

These APIs should simplify the implementations of jQuery (link) and D3.js (link) and improve performance by eliminating the needs to access DOM inside a loop.

If I correctly understand what you're proposing, the ergonomics of this seem suboptimal. In general try to avoid disconnected sequences that the developer needs to ensure match up. It creates unnecessary error conditions (what happens when the sequences don't have the same length?), and is awkward to specify literally. Also, there are several cases that are not covered:

I wonder if it would be feasible to study usage of jQuery's attr() function in the wild and see how developers actually use it. It might be feasible with a custom HTTPArchive metric, though it would take a month to get results. I could look into it.