whatwg / dom

DOM Standard
https://dom.spec.whatwg.org/
Other
1.56k stars 289 forks source link

Proposal: DOMChangeList #270

Open wycats opened 8 years ago

wycats commented 8 years ago

DOMChangeList and DOMTreeConstruction

Motivation

There are three high-level motivations for this API:

  1. To make it clearer and less error prone to apply a sequence of DOM operations at once, that can support the full gamut of mutations to Elements and Nodes.
  2. For efficiency, to provide an API for constructing a sequence of DOM operations that can be:
    1. constructed in a worker and transferred to the UI thread
    2. constructed with a minimum of allocations
    3. applied in one shot without interleaved user code
  3. To support, in one API, the superset of trees that can be produced using the HTML parser and using the DOM API (the HTML parser supports more tag names, while the DOM API supports more trees, such as custom elements nested inside a table).

This is an initial, minimal version of this API, which could be expanded over time with more capabilities. It should always maintain its goals of predictable, allocation-lite performance, focusing on giving user-space abstractions the capabilities they need for maximum performance.

Concepts

NodeToken

A NodeToken is an opaque value that represents a Node. It can be efficiently transferred from one worker to another.

It serves two purposes:

The intent of this API is to have these performance characteristics:

If there is some reason that a concrete implementation of this design might not be able to accomplish these goals, it's almost certainly something we should discuss.

Details

DOMTreeConstruction

Note: Use of TypeScript generics notation in the details of this proposal does not intend to imply any additional type checking. It is simply a way to annotate which kinds of node a particular node tokens is intended to produce for clarity. This proposal expects checking at the point of application (e.g. appending a node to a non-element node doesn't make sense and would be an error at the time of application). At present, this proposal does not assume that errors in the change list cause previously applied operations to be rolled back; that is listed as an open question below.

// you can create a NodeToken in the UI thread and transfer it into
// a worker that constructs the DOMChangeList or DOMTreeConstruction.
partial interface Node {
  getToken(): NodeToken<this>;  
}

interface Bounds {
  firstNode: NodeToken<Node>;
  lastNode: NodeToken<Node>;
}

partial interface Element {
  // this has the same semantics as insertBefore, but takes a `DOMTreeConstruction`
  insertTreeBefore(element: Element, tree: DOMTreeConstruction, reference: Node): Promise<AppliedChanges>;
}

// The intent of this object is to allow engines to clean up any bookkeeping maps they
// need to track `NodeToken`s once the `AppliedChanges` is GCed.
interface AppliedChanges {
  // this allows code that created a ChangeList to get the actual
  // node they were creating so they can add listeners or change
  // the node later.
  getNode<T extends Node>(t: NodeToken<T>): T;
}

// DOMTreeConstruction is transferrable, and is backed by raw bytes. This
// allows trees to be constructed in workers.
interface DOMTreeConstruction {
  openElement(name: string, ns: string = "html"): NodeToken<Element>;
  closeElement();

  setAttribute(name: string, value: string = "", ns: string = ""): void;
  appendText(text: string): NodeToken<Text>;
  appendComment(text: string): NodeToken<Comment>;
  appendHTML(html: string): Bounds;
}

DOMChangeList

partial interface Document {
  applyChanges(changes: DOMChangeList): Promise<AppliedChanges>;
}

// DOMChangeList is transferrable, and is backed by raw bytes. This
// allows change lists to be constructed in workers.
interface DOMChangeList {
  /// traverse
  nextSibling(n: NodeToken<Node>): NodeToken<Node | null>;
  previousSibling(n: NodeToken<Node>): NodeToken<Node | null>;
  firstChild(n: NodeToken<Node>): NodeToken<Node | null>;
  lastChild(n: NodeToken<Node>): NodeToken<Node | null>;
  parentNode(n: NodeToken<Node>): NodeToken<Element | null>;

  /// create
  createElement(name: string, ns: string = 'html'): NodeToken<Element>;
  createText(text: string): NodeToken<Text>;
  createComment(text: string): NodeToken<Comment>;

  /// update
  setAttribute(el: NodeToken<Element>, name: string, value: string = '', ns: string = ''): void;
  updateTextData(n: NodeToken<Text>, text: string): void;
  updateCommentData(n: NodeToken<Comment>, text: string): void;
  remove(n: NodeToken<Node>): void;

  /// append
  appendChild(el: NodeToken<Element>, node: NodeToken<Node>): void;
  appendTree(el: NodeToken<Element>, tree: DOMTreeConstruction): void;

  // this unifies the various HTML-based APIs in the DOM; the return value is
  // a token corresponding to the start and a token corresponding to the end of
  // the inserted tree which can be reified after application.
  //
  // IMPORTANT NOTE: The HTML is parsed during application, not when
  // it is added to the change list.
  insertHTMLBefore(node: NodeToken<Node>, html: string): [NodeToken, NodeToken];
}

Example

Using the usual DOM API:

let article = document.getElementById('main-article');
let text = node.firstChild;
text.textValue = "Hello world";
node.setAttribute("data-active", "yes");
node.removeAttribute("data-pending");

Using the ChangeList API:

let articleToken = document.getToken(document.getElementById('main-article'));

let changes = new DOMChangeList();
let textToken = changes.firstChild(token);
changes.updateTextData(textToken, "Hello world");
changes.setAttribute(articleToken, "data-active", "yes");
changes.removeAttribute(articleToken, "data-pending");

document.applyChanges(changes);

FAQ

Is this actually faster?

  1. Don't you still have to create all of the DOM nodes anyway? If so, why is it cheaper?
  2. Isn't DOM pretty fast in modern browsers already?

The intent of this API is to create a low-level interface that is as close as possible to the underlying implementations. It attempts to avoid introducing new costs while reducing a number of paper-cuts that exist in today's usage.

  1. This API creates DOM nodes in the engine, but it does not need to create JavaScript wrappers. Experiments with deep cloneNode show that skipping those wrappers provides a performance benefit, but cloneNode() can't satisfy as many use-cases as this API.
  2. It allows the construction of the set of mutations to occur separately from the application (or even in a worker), keeping the sensitive work that limits 60fps to a minimum.
  3. Since applying changes is asynchronous, the full change list can be applied in batches that avoid blocking interaction (especially scroll). If the browser reaches its budget, it can interleave some work to keep the UI interactive and pick up the mutation process afterward. In short, the API should allow browsers to experiment with more scheduling strategies.
  4. It encourages good staging practices, eliminating some of the major causes of layout thrash (see the next section).

    Isn't the real issue that people are interleaving DOM manipulation and layout?

That is certainly a major issue, and this API puts developers on the path to success by encouraging them to stage DOM manipulation work separately from APIs that can trigger painting or layout.

Because the API guarantees that no user script can interleave during the application of changes, there is no way to "mess up" and trigger an immediate flush of any deferred work.

Unresolved Questions

  1. The current state of this API allows failures to occur during the processing of a change list, and does not require engines to roll back earlier changes (rolling back changes like "remove an iframe" may not be trivial to implement). Would engines prefer to roll back changes?
  2. Should we support APIs like ClassList and the style property through this API? It may be difficult to represent these kinds of changes with the operations already proposed (since this API does not allow direct imperative access to the DOM), and a few additional APIs probably wouldn't do damage to the constraints.
  3. Are there other optimizations that could be performed by engines? For example, Luke Wagner suggested that a parameterized version of this API could work like "prepared statements" in SQL, allowing engines to do up-front work to optimize the access and mutation patterns for application. What can we do to make this API more hospitable to hypothetical optimizations like those?
KrisSiegel commented 8 years ago

This introduces quite a few concepts for manipulating the DOM for something that's ultimately an optimization. This seems like an optimization that, with slightly an improved DOM API could accomplish the exact same thing with only minor additions to the API (perhaps ~a suspend and resume layout plus~ a way to mix elements so you can generate them external to the current thread and have the rendering engine apply them at once similar to your proposal but significantly less complex).

wycats commented 8 years ago

This introduces quite a few concepts for manipulating the DOM for something that's ultimately an optimization.

... perhaps a suspend and resume layout ...

What would the existing APIs that trigger a layout flush do while layout was paused? Would pausing layout force an immediate layout flush to update all layout caches? If not, how would they compute the "old" layout once mutations have begun?

One of my concerns with proposals to augment the existing DOM APIs with some sort of transaction is that you have to decide what happens to all of the DOM's APIs while the transaction is in flight. In contrast, the proposed API is a closed set of operations, disallowing any other interleaved operations, making it possible to understand the complexity of the entire API.

domenic commented 8 years ago

Great to see this! I'm coming at this in comparison to Dru's proposal at https://github.com/drufball/async-append/blob/master/EXPLAINER.md, so let me share some thoughts:

As far as I can tell this proposal focuses mostly on trying to minimize allocations (but see below) and on providing a proxy API that can be used in a worker, instead of focusing on allowing chunking and scheduling of parsing/style/layout/paint as Dru's proposal does. I can't tell at first glance whether it can serve both purposes, or whether its design prevents that. There's a brief paragraph under "Is this actually faster?" number 3 that indicates it is intended to be compatible, but I'll defer to the Blink engineers.

I think the proposal might have a misapprehension about how DOM nodes and wrappers are related. Remember that wrappers are not created until they are accessed. With this in mind it seems like the proposal just substitutes one set of wrapper allocations (NodeTokens and DOMTreeConstructions) for another (Nodes and DocumentFragments). Looking at the example code seems to bear this out; counting wrapper allocations it seems to have the same amount as normal DOM manipulation code. (And remember a wrapper is just a pointer; a wrapper to a NodeToken and a wrapper to a Node are both the same cost.) If you assume the DOM nodes will eventually exist anyway, as Yehuda discusses, Yehuda's proposal actually seems to have more allocations: NodeToken backing + NodeToken wrapper + Node backing, vs. Dru's proposal which has Node wrapper + Node backing.

One concern I have about the idea of running these things in a worker is that the IPC involved in getting back to the main thread (in the case of hundreds or thousands of nodes/nodetokens) would probably be a significant cost. I'm not sure whether it would be greater than or less than the cost of just doing the work on the main thread; similarly I am not sure whether the IPC would be significant enough to cause jank. But maybe all you have to IPC is a single memory location of the queue in which case it is presumably fine.

So, going through the three high-level motivations Yehuda lists:

  1. Both proposals make it clear and less error prone to apply a sequence of DOM operations at once, through similar queue-then-commit systems. Both support the full gamut of mutations to Elements and Nodes, with Dru's doing so by just allowing you to use those normal DOM APIs and Yehuda's by creating new versions of (most of) them that operate on analogous data structures.
  2. This bullet is weird since it doesn't list actual use cases, but instead specific ideas as to how good performance would be achieved, despite lack of data showing that these proposed solutions actually help. But anyway, going through each in turn:
    1. Yehuda's proposal allows the sequence of operations to be constructed in a worker and transferred; Dru's does not.
    2. Yehuda's proposal has slightly more allocations, as discussed aboved.
    3. Both have the queue-then-commit structure.
  3. Both support the union of the trees that can be produced using the HTML parser and using the DOM API, Dru's by using the HTML parser directly (innerHTML/outerHTML) and Yehuda's presumably by e.g. allowing openElement to take more element names than createElement does (although I couldn't find where this is specified). We could of course try to expand createElement again; nobody's had the motivation so far, but maybe this will be the last straw.

Also going through the "Applying a Change" bullets:

So it seems to me like the proposals are very similar with the main divergence being the desire to allow building the queue of mutations in a worker in Yehuda's proposal, which necessitates creating a parallel object hierarchy in order to build the queue over in the worker, before then transferring the queue to the main thread before (or during?) a commit.

To me it feels like an empirical question whether the gains from building the parallel DOM-like tree in the worker instead of building a DOM tree in the main thread outweigh the costs of transferring it. There's also the empirical question of whether building a DOM tree in the main thread actually takes long enough to cause jank; if it takes e.g. 1-2 ms for thousands of nodes, then it'd probably be best to just eat the cost and avoid the conceptual burden of introducing a parallel object hierarchy, or at least save it for a future extension once we have gotten rid of all the more low-hanging perf fruit and are grubbing for a few milliseconds.

KrisSiegel commented 8 years ago

What would the existing APIs that trigger a layout flush do while layout was paused? Would pausing layout force an immediate layout flush to update all layout caches? If not, how would they compute the "old" layout once mutations have begun?

No a pause wouldn't flush only a resume. This is how other frameworks operate. Regardless I struck the suspend / resume from my comment as it unnecessarily complicated my position.

One of my concerns with proposals to augment the existing DOM APIs with some sort of transaction is that you have to decide what happens to all of the DOM's APIs while the transaction is in flight. In contrast, the proposed API is a closed set of operations, disallowing any other interleaved operations, making it possible to understand the complexity of the entire API.

Transaction capability typically takes on a begin then rollback or commit pattern. This proposal doesn't do that. This proposal creates multiple new APIs to do what the existing DOM APIs can already do today except that it applies them in bulk when you're finished.

This proposal, in my opinion, could be significantly simplified by simply making it: document.appleChanges(source, destination);. Obviously you'd need to update how web workers work if you want to build a DOM tree in it but you'll have to do something similar anyway in your proposal for the Token pseudo elements.

The destination is the HTML node to apply all your changes to and the source is your separately built DOM. No reason the underlying engine couldn't handle all the complexity your introduced.

The DOM API is inelegant; I don't think it's a good idea to incorporate increased complexity for a theoretical optimization. If possible, re-use or work towards a better designed API that lends itself better to transactions. Transactions would be pretty cool.

annevk commented 8 years ago

@domenic most of the "allocations" in this proposal are not actually allocations. Integers are created which should be much more light-weight than actual objects and don't impact GC and such. (And also, I think it might be a good idea to define DOMChangeList and DOMTreeConstruction as ECMAScript-style objects. They really shouldn't perform much validation and the more browsers can inline the better.)

@KrisSiegel the API is not meant to be elegant. It's meant to be fast. Having discussed this with implementers it seems unlikely we'll get proper transactions. Doing that efficiently would be very hard and unlikely to materialize anytime soon.

KrisSiegel commented 8 years ago

the API is not meant to be elegant

@annevk I don't think the WHATWG should consider an API only for speed. The DOM APIs are public for web developers to use so usability of APIs shouldn't be forgotten or tossed away (this is how we ended up with jQuery and a new front end framework every two weeks). The DOM API is inelegant enough as is. I would like to see more re-usage of existing APIs where possible (which I see no reason why you couldn't do so while still accomplishing the goals of the proposal). I'm curious how efficient the passing of node objects to and from web workers would be. I could see that being useful in more scenarios than just this one.

Yeah I wouldn't be expecting transactions for any type of GUI state. It would be cool but yeah implementing it efficiently while maintaining generic capabilities is very difficult.

annevk commented 8 years ago

Transactions is pretty much impossible due to how <iframe> and such work. While we should make the DOM more elegant, we should also offer fast paths for advanced applications. Passing node objects around would not be efficient and getting node objects in workers is also something that's too hard (no implementation is anywhere near ready for that, if ever). This proposal, which avoids all that, is really our best bet to get something done in that area.

bfgeek commented 8 years ago

cc/ @drufball @esprehn @n8schloss

@domenic has summarized some of my thoughts up pretty well, but I'd just like to add my 2c. :)

This proposal has three parts that are potential performance benefits.

(1) Worker Construction of Tree. Having something that is transferrable sounds great. Moving more script logic off the critical path is something that authors have typically struggled with.

It may be worth thinking about this part of the problem as a separate API which integrates nicely with this proposal, and regular DOM APIs?

(2) Tree Construction We haven't seen in traces that we've performed that the bindings cost is the bottleneck here. For this part of the proposal we'd really love to see some data on this to be convinced that this is an issue worth solving.

@wycats , you mentioned some experiments with cloneNode you did? Do you have data on this?

It looks like that this is a pretty simple API to actually implement (if you are just worried about the bindings type cost); getting numbers from someone wouldn't be too hard?

If there is a significant performance difference: this should then be compared to a polyfill of the current DOM API on top of this API. I.e. when a sync DOM API is called, perform the mutation list synchronously.

Blink's implementation has the ability to create IDL APIs that live in javascript instead of cpp. If this is a huge performance win, then theoretically engines could change their implementations to mitigate this.

Again, we really need numbers to be convinced here.

(3) applyChanges It will soon be possible in blink for us to styleRecalc separately from layout and be able to "smear" work for a particular DOM mutation over multiple frames. This was the initial motivation behind asyncAppend; the fact that when you append DOM, it will typically jank that frame as more work is needed to be done than is possible in one frame.

We think it's really valuable for engines to be able to resolve applyChanges multiple frames after the initial request.

One thing that I think is missing in this API is the "DoItNow" button. I.e. calling this api would perform all the changes synchronously like the current DOM APIs.

As an example, say your UI can tolerate 100ms (~6frames) of the async DOM not appearing, but after that it would prefer to jank the main thread, as opposed to delaying further. A sync "commit" function would do this.

One way to do this with the current API would be:

changes = new DOMChangeList();
/* build up changes here */

changes.execute().then(() => {
  // changes aren't actually performed until you call "commit" here?
  // e.g. all the style/layout/etc work is performed. "commit" just makes it appear in the DOM tree.
  changes.commit();
});

requestAnimationFrame(() => {
  changes.commit(); // Promise above is now not resolved, as already done "commit".
});

We see really large benefits from this part. ^-^.

wycats commented 8 years ago

It looks like that this is a pretty simple API to actually implement

That was my intent. Getting many of the benefits people are after with a simple-to-implement API should be valuable.

We think it's really valuable for engines to be able to resolve applyChanges multiple frames after the initial request.

That is definitely the intent of this proposal. By making applyChanges async, engines can choose to smear the work across multiple frames, without janking user interactions like scroll. (making it async gives the engine's scheduler control over the timing.)

One thing that I think is missing in this API is the "DoItNow" button. I.e. calling this api would perform all the changes synchronously like the current DOM APIs.

Good request! I'd have no problem adding it if engines thought it wouldn't introduce new problems :smile:

We see really large benefits from this part. ^-^.

Great!

domenic commented 8 years ago

@annevk

most of the "allocations" in this proposal are not actually allocations. Integers are created which should be much more light-weight than actual objects and don't impact GC and such. (And also, I think it might be a good idea to define DOMChangeList and DOMTreeConstruction as ECMAScript-style objects. They really shouldn't perform much validation and the more browsers can inline the better.)

Hmm, can you expand on this? I didn't see any integers in this proposal. Going through the sample code:

// Usual DOM API:
let article = document.getElementById('main-article'); // Node wrapper (1)
let text = node.firstChild; // Node wrapper (2)
text.textValue = "Hello world"; // no allocations
node.setAttribute("data-active", "yes"); // no allocations
node.removeAttribute("data-pending"); // no allocations
// ChangeList API:

let articleToken = document.getToken(document.getElementById('main-article')); // Node wrapper + Token backing + Token wrapper (3)

let changes = new DOMChangeList(); // ChangeList backing + ChangeList wrapper (5)
let textToken = changes.firstChild(token); // Token backing + Token wrapper (7)
changes.updateTextData(textToken, "Hello world"); // no allocations
changes.setAttribute(articleToken, "data-active", "yes"); // no allocations
changes.removeAttribute(articleToken, "data-pending"); // no allocations

document.applyChanges(changes); // promise (8)

I see 2 allocations for normal DOM APIs (both simple wrapper allocations, which are "an integer" (pointer) size each) and 8 allocations for the ChangeList example. And that isn't including the transfer cost.

wycats commented 8 years ago

Great to see this! I'm coming at this in comparison to Dru's proposal at https://github.com/drufball/async-append/blob/master/EXPLAINER.md

I also reviewed that in detail before I began work on this proposal.

As far as I can tell this proposal focuses mostly on trying to minimize allocations

That is certainly an important characteristic of this proposal. It also has several other goals, both related to efficiency and expanding capabilities. (one example is supporting the superset of the HTML syntax and the DOM API, which I talk about in the proposal; another is allowing the end-developer to directly control the staging of work across threads by using workers).

(but see below)

I'll reply inline.

and on providing a proxy API that can be used in a worker

I don't think you should see ChangeList as a "proxy", but rather as a collection of instructions that can be freely transferred until it's applied.

instead of focusing on allowing chunking and scheduling of parsing/style/layout/paint as Dru's proposal does

Since the application of a ChangeList is asynchronous, the benefits to the browser's scheduling algorithm is more-or-less equivalent to Dru's proposal. ChangeList supports many more operations, and allows them to be performed in a batch, which should further increase the scope of the benefit to authors.

I can't tell at first glance whether it can serve both purposes, or whether its design prevents that.

This is something I spent a lot of time talking to @lukewagner, @smaug--- and @annevk about, and our intent was for this API to serve both purposes. As I said in the proposal: "If there is some reason that a concrete implementation of this design might not be able to accomplish these goals, it's almost certainly something we should discuss."

There's a brief paragraph under "Is this actually faster?" number 3 that indicates it is intended to be compatible, but I'll defer to the Blink engineers.

I'm very eager to hear from them about this. I'd be very happy to have a call with anyone interested in this area as well.

I think the proposal might have a misapprehension about how DOM nodes and wrappers are related. Remember that wrappers are not created until they are accessed.

This particular constraint was one of the most important aspects of this design, and I spent a lot of time thinking about it and discussing the details with others.

The intent of NodeToken is that it can be represented as a simple internal value (e.g. an representing the n'th object to be allocated in this transaction) but that the nature of that representation is not directly exposed to users. The only way in which these values can be reified into Nodes is through AppliedChanges. I've spoke with some engineers at Mozilla about how to structure the API to allow it to work as a simple value, and would love to discuss it further with engineers who work on Blink.

Looking at the example code seems to bear this out; counting wrapper allocations it seems to have the same amount as normal DOM manipulation code. (And remember a wrapper is just a pointer; a wrapper to a NodeToken and a wrapper to a Node are both the same cost.) If you assume the DOM nodes will eventually exist anyway, as Yehuda discusses, Yehuda's proposal actually seems to have more allocations: NodeToken backing + NodeToken wrapper + Node backing, vs. Dru's proposal which has Node wrapper + Node backing.

Both proposals make it clear and less error prone to apply a sequence of DOM operations at once, through similar queue-then-commit systems.

Both support the full gamut of mutations to Elements and Nodes, with Dru's doing so by just allowing you to use those normal DOM APIs and Yehuda's by creating new versions of (most of) them that operate on analogous data structures.

It looks like Dru's proposal creates several new async APIs for existing operations. It also postulates a "wrapper" that could be used with the regular APIs. I understood Dru's proposal as an early exploration with several different possible directions; can you flesh out which of those directions your comments are talking about?

Both support the union of the trees that can be produced using the HTML parser and using the DOM API, Dru's by using the HTML parser directly (innerHTML/outerHTML) and Yehuda's presumably by e.g. allowing openElement to take more element names than createElement does (although I couldn't find where this is specified). We could of course try to expand createElement again; nobody's had the motivation so far, but maybe this will be the last straw.

From my original statement about the union of trees: "the HTML parser supports more tag names, while the DOM API supports more trees, such as custom elements nested inside a table)."

DOMChangeList also supports using the HTML parser, but the HTML parser limits which nodes can be made children of certain elements. We could perhaps fix createElement to support the full set of element names supported by the HTML parser, and that would address this benefit of the ChangeList proposal.

To me it feels like an empirical question whether the gains from building the parallel DOM-like tree in the worker instead of building a DOM tree in the main thread outweigh the costs of transferring it. There's also the empirical question of whether building a DOM tree in the main thread actually takes long enough to cause jank;

I have heard this thinking before and I think it might misapprehend the reasons people want to do build DOM trees and calculate DOM mutations in a worker.

It's not so much shrinking the cost of the DOM work, but rather all of the JavaScript work that is necessary to calculate the needed mutations. In practice, a lot of application work is done on the UI thread simply because of the close proximity of the DOM APIs.

It is of course possible to implement some of this strategy as a user-space library, but not all of it. In user-space, applying a change set is inherently single-threaded, for example. The process of deserializing a transferred ChangeList would have to choose between interleaving the DOM operations or deserializing them into an intermediate data structure, both of which would introduce new costs that the native implementations would not need.

And that's all assuming that a user-space library has enough primitives to do an efficient job at the limit. The basic idea behind this API is to provide a low-level primitive that libraries can use to build up different useful libraries with different tradeoffs, and start lower level than the current set of primitives.


There are many small paper-cuts that the ChangeList proposal addresses in one, fairly contained proposal. It attempts to create an API that, when used, has noticeably fewer common gotchas for web developers, and does not require library, framework or application developers to reverse engineer which APIs produce hazards, and keep that knowledge up to date as implementations change.

I am especially interested in feedback from implementors that there is something about this proposal that is difficult to implement, or that would make reasonable initial implementations slower than the equivalent use of DOM APIs. Any suggested changes to this proposal that would improve

bfgeek commented 8 years ago

@wycats We really need numbers for point (2) - (bindings related costs are really the problem) before going too deeply into API design however.

Without numbers on if this will actually get us a significant performance benefit, it's impossible to tell if this is the right design or not.

I.e. we could get a simpler API if this wasn't a significant problem. For example just the asyncAppend APIs for (3) - (performing style/layout work async), and a "WorkerNode" API separately.

Are you able to get numbers for (2)? Or work with someone to get numbers for this?

domenic commented 8 years ago

The intent of NodeToken is that it can be represented as a simple internal value (e.g. an representing the n'th object to be allocated in this transaction) but that the nature of that representation is not directly exposed to users. The only way in which these values can be reified into Nodes is through AppliedChanges. I've spoke with some engineers at Mozilla about how to structure the API to allow it to work as a simple value, and would love to discuss it further with engineers who work on Blink.

Well, one issue with the above proposal is that it uses a type notation (TypeScript?) that implies type checking. This would necessitate creating Web IDL interfaces in order to get type checking so that you couldn't pass arbitrary values. It in fact uses a generics notation that implies we'd need (when converted to Web IDL) TextNodeToken, CommentNodeToken, ElementNodeToken, etc. So even if it's conceptually a simple internal value, it would need to be implemented as several wrappers + backing classes where the backing classes have those simple internal values.

It looks like Dru's proposal creates several new async APIs for existing operations. It also postulates a "wrapper" that could be used with the regular APIs. I understood Dru's proposal as an early exploration with several different possible directions; can you flesh out which of those directions your comments are talking about?

What I meant is that both of Dru's proposed APIs have minimal extra surface. For example they allow you to create Nodes and e.g. set their text data or attributes using the usual DOM APIs, whereas yours involves creating NodeTokens and using the parallel interface for such operations. In other words, instead of creating a parallel DOM API with a full parallel object hierarchy of NodeTokens and APIs that operate on them, Dru's proposal only duplicates the few APIs necessary for queue-building.

It's not so much shrinking the cost of the DOM work, but rather all of the JavaScript work that is necessary to calculate the needed mutations. In practice, a lot of application work is done on the UI thread simply because of the close proximity of the DOM APIs.

This is a great point; it's good to bring it up. It certainly makes the empirical question harder, but still worth investigating.

or that would make reasonable initial implementations slower than the equivalent use of DOM APIs

I've given some fairly concrete feedback about the increased allocation cost, which hopefully can make this clear. I'll defer to those more involved on the implementation side to say more.

wycats commented 8 years ago

Well, one issue with the above proposal is that it uses a type notation (TypeScript?) that implies type checking. This would necessitate creating Web IDL interfaces in order to get type checking so that you couldn't pass arbitrary values. It in fact uses a generics notation that implies we'd need (when converted to Web IDL) TextNodeToken, CommentNodeToken, ElementNodeToken, etc. So even if it's conceptually a simple internal value, it would need to be implemented as several wrappers + backing classes where the backing classes have those simple internal values.

The intent of the TypeScript notation was just to show clearly what kind of NodeToken I expect each part of the API to produce. The only time I expect any kind of checking to be mandatory is when applying the ChangeList, because for example it would be a bug to appendChild a Node to another Node or no node at all.

What I meant is that both of Dru's proposed APIs have minimal extra surface. For example they allow you to create Nodes and e.g. set their text data or attributes using the usual DOM APIs, whereas yours involves creating NodeTokens and using the parallel interface for such operations. In other words, instead of creating a parallel DOM API with a full parallel object hierarchy of NodeTokens and APIs that operate on them, Dru's proposal only duplicates the few APIs necessary for queue-building.

Thanks for the clarification. One of the problems I'm hoping to address is that not all operations are safe or reasonable to use inside a transaction, and certainly many operations are not safe if you want the resulting change list to be transferrable. By starting with a minimal API that nontheless covers mutations of the DOM as a data structure, we can build up an API that doesn't have surprising or problematic behavior.

I don't expect to need a much larger set of lowest-level APIs; Much of DOM1-4 is described in terms of a lower-level set of primitive DOM operations, and my intent is to flesh out the base layer of those APIs in a way that would both work across workers and support asynchronous application of the change list, smart scheduling by the engines, and avoid user-space interleaving of dangerous operations.

wycats commented 8 years ago

I.e. we could get a simpler API if this wasn't a significant problem. For example just the asyncAppend APIs for (3) - (performing style/layout work async), and a "WorkerNode" API separately. Are you able to get numbers for (2)? Or work with someone to get numbers for this?

I've spoken with Bill McCloskey at Mozilla, who has an intern looking into building an initial implementation of this proposal. Since I don't work at Mozilla, I obviously can't hold them to this, but hopefully we'll see some useful results from their work. But still, it doesn't seem like it should be necessary to implement a feature to discuss general performance implications of the design. Generally when we find ourselves getting close to a broad-strokes design that people find promising, that's when implementation work starts and we start getting more fine-grained performance feedback.

@wycats , you mentioned some experiments with cloneNode you did? Do you have data on this?

I can give more information later, but the TLDR is:

Instead of building up DOM and interleaving dynamic work, Ember has experimented with (and shipped, as part of Glimmer 1) an optimization that uses cloneNode for templates and fills in the dynamic content by doing an optimized walk to the dynamic nodes.

The entire benefit of this optimization is to avoid reifying static nodes in JavaScript, and we saw very significant improvements to templates with minimal dynamic content (and therefore shipped the optimization). We still need to walk through some static nodes to reach the dynamic ones, and the AppliedChanges API is intended to limit the number of reified nodes to precisely the dynamic ones that may need to be changed.

bfgeek commented 8 years ago

But still, it doesn't seem like it should be necessary to implement a feature to discuss general performance implications of the design.

Sorry have to disagree with you here. For most APIs proposing a broad-strokes design first is great. However when explicitly proposing a performance API, it should come with the burden of proof (or be obvious) that it will solve the performance problem.

Again if bindings isn't really the problem, we could get away with a simpler asyncAppend + WorkerNode design potentially, which is a significantly different API surface.

I can give more information later

Yes please!

wycats commented 8 years ago

Again if bindings isn't really the problem, we could get away with a simpler asyncAppend + WorkerNode design potentially, which is a significantly different API surface.

Something like this has been mentioned in very broad strokes a few times. I would be very interested to see what such an API would look like in more detail.

wycats commented 8 years ago

However when explicitly proposing a performance API, it should come with the burden of proof (or be obvious) that it will solve the performance problem.

I generally agree with the principle that there should be some reason to believe that the API will improve performance. In this case, there are multiple good reasons to believe that, in addition to the reduced cost of bindings:

These are all "paper cuts", but they're precisely the kind of paper-cuts that we see adding up in real applications, and have a lot of difficulty avoiding at scale in practice (indeed, I learned about the impact of many of them by watching great talks by Google engineers at Google I/O!).

For what it's worth, Ember applications (for example, LinkedIn's Mobile app, which is where I've been focusing much of my performance effort lately) can easily perform thousands or tens of thousands of discrete operations during initial render, so small costs (especially associated with garbage collection) can quickly add up.

esprehn commented 8 years ago

Wouldn't custom element callbacks run inside the operation making all the intermediate states visible? I don't think we can skip running the constructors or attached callbacks and do a layout, most elements are mutating their state in there, we'd be doing a wasted layout. This API doesn't handle the creation of ShadowRoots either.

I've been profiling many JS applications recently and the wrapper creation is rarely the place where costs add up, certainly not enough to justify this style of API. Specifically today I looked at Amazon, Facebook, Polymer iron-list, and several AMP pages. None of them would see much of a performance improvement from this API.

Note that we're moving to much wrapper heavier apis all over the platform as well, see for example the Typed OM in CSS, custom paint, layout, etc. I'd really hesitate to attempt this kind of optimization around Nodes when so much of the rest of the platform is moving to create more objects anyway. We need to just make wrapper creation faster if that's an issue you observe.

We certainly can think about what it means to construct nodes inside a worker, perhaps with a stream interface, or some kind of WorkerNode interface where you build a tree and then transfer it. Perhaps we need to start by untangling the various problems you're trying to solve here.

dherman commented 8 years ago

It seems like there's general agreement that some kind of transactionality is desirable, but with a key unresolved question being whether we should strive to avoid exposing intermediate states. I find the idea of exposing those states pretty concerning, and I think it's worth trying to achieve an API that avoids it, which AIUI is what this proposal is aiming at.

I also find the idea of transactional change records an appealing primitive for modeling the DOM -- it seems like a good "explain the platform" approach: an execution trace is a sequence of DOM changes. So where other APIs @esprehn mentions might use wrappers to talk about existing nodes, it makes sense for transactional change records to operate at a lower conceptual layer.

Also, I'd be pretty worried about arguments saying that absent empirical data, we should have our primitives do more things and bet on our optimizations getting smarter. I think our default orientation should be the reverse: our primitives should be simpler and try to do less.

drufball commented 8 years ago

I generally agree with the principle that there should be some reason to believe that the API will improve performance. In this case, there are multiple good reasons to believe that, in addition to the reduced cost of bindings:

I think Ian was saying that we need to see numbers supporting the idea that this specific incarnation of the API will lead to more performance benefits than a different incarnation. You're right that this proposal would have a lot of performance wins. But looking at the improvements you mentioned, it seems like they would also be received from any general WorkerNode + asyncAppend API. Can you point to any additional benefits this proposal might have?

  • a less error-prone API from the perspective of developers on the platform, via a guarantee that none of the operations in change list can interleave with JavaScript (making it impossible for code to rely on observable intermediate layouts or paints).

To make sure I am clear, the point here is that by proposing a separate API surface, we could exclude any operations that would be invalid coming from a Worker because they depend on a relayout on the UI thread? This reasoning is compelling to me, but is there a way we could introduce similar benefits without introducing an entirely new API surface?

  • a (significant) reduction in JavaScript allocations when constructing and applying the change list (assuming that NodeToken indeed can be designed to avoid allocations)

It sounds like this point is still a bit unclear. 1) this proposal may require more bindings? 2) it's unclear if reducing bindings would lead to significant performance improvements.

  • the ability for the engine to apply the change list in a "smeared" way with much more control over the scheduling of the work.

This benefit could be achieved through any asyncAppend style API.

  • the ability to locate more JavaScript code that handles computing the updates off the main thread with an efficient way to transfer the expected operations back to the UI thread.

This benefit could be achieved through any WorkerNode style API.

In short, I don't think there's any disagreement that we'll see performance wins by allowing DOM in a worker and asynchronous DOM appending. But we need to see compelling numbers that this version of the API leads to more wins than other versions before committing to it.

wycats commented 8 years ago

But looking at the improvements you mentioned, it seems like they would also be received from any general WorkerNode + asyncAppend API. Can you point to any additional benefits this proposal might have?

Before I do that, I'd be very interested to read a detailed WorkerNode proposal. Can you give me a pointer so I can do a proper analysis?

wycats commented 8 years ago

To make sure I am clear, the point here is that by proposing a separate API surface, we could exclude any operations that would be invalid coming from a Worker because they depend on a relayout on the UI thread?

More than that, the API gives developers a way to express a batch of DOM operations that by definition cannot accidentally interleave accesses that would force a layout flush. Because the batch is applied without interleaving JavaScript code, the API creates a natural staging between blocks of pure DOM mutation and blocks of CSS-dependent changes.

It sounds like this point is still a bit unclear. 1) this proposal may require more bindings? 2) it's unclear if reducing bindings would lead to significant performance improvements.

If it's impossible to implement the NodeToken type without introducing new heap allocated JavaScript object, I definitely want to know that. Is that what you're saying?

This benefit could be achieved through any asyncAppend style API.

As long as the only mutations we're talking about are various forms of append. Do you envision things like asyncNodeValue =, asyncSetAttribute, etc. Or is there a reason that only async versions of append operations would be necessary?

This benefit could be achieved through any WorkerNode style API.

I really want to see a detailed version of a WorkerNode-style proposal. I must have missed previous discussions fleshing it out; it's very hard for me to compare the impact of such a proposal against this proposal. I would really love a pointer to more details that could help me do a careful cost accounting of the different proposals.

annevk commented 8 years ago

The TypeScript might have been too confusing. To reiterate, it's just there to illustrate the conceptual model. In an implementation, all createElement() has to do is return a pointer and store that plus arguments in an underlying opaque buffer. That pointer can then be passed to setAttribute() that will similarly store an instruction in the opaque buffer. None of that is validated ahead of time. It's only "validated" once the buffer is run (i.e., applied).

So basically, we're just creating an ArrayBuffer consisting of a set of pointers and DOM mutation operations (agree that we should add shadow roots). That should make transfer between threads cheap, should be significantly better than object creation for larger number of objects, etc.

Also, the handwaving about WorkerNode while simultaneously saying this is adding too much API surface is not helping. Any kind of worker API would need at least this much API surface. And then folks likely want the same API in workers and documents so they can reuse library code at which point we're back to a new set of primitives, which is exactly what @wycats has delivered here.

(Furthermore, having new primitives helps address the mismatch between the HTML parser and DOM APIs in terms of name validation.)

domenic commented 8 years ago

The TypeScript might have been too confusing. To reiterate, it's just there to illustrate the conceptual model. In an implementation, all createElement() has to do is return a pointer and store that plus arguments in an underlying opaque buffer. That pointer can then be passed to setAttribute() that will similarly store an instruction in the opaque buffer. None of that is validated ahead of time. It's only "validated" once the buffer is run (i.e., applied).

Can you give more detail, perhaps IDL? Are you actually using unsigned long? If so, how do you deal with spoofing where I pass arbitrary "pointers" by choosing a likely-seeming integer, e.g. most-recently-returned integer minus 1 or similar?

Remember that the in-JS allocation cost of a wrapper is the same as the allocation cost of an integer (on a 64-bit system); both are 64-bit integers.

annevk commented 8 years ago
interface DOMTreeConstruction {
  unsigned long openElement(DOMString name, NamespacePrefix = "html");
  void closeElement();

  void setAttribute(DOMString name, optional DOMString value = "", optional NamespacePrefix = "");
  unsigned long appendText(DOMString text);
  unsigned long appendComment(DOMString text);
  Bounds appendHTML(DOMString html);
};
enum NamespacePrefix { "", "html", ... };

(Maybe unsigned long long?)

Spoofing does not matter. That would simply result in an error when the "C++ side" uses the instruction set to create a tree. And since we cannot support rollback and proper transactions anyway due to <iframe>, that doesn't seem like a problem. Or at least not one we can solve.

I'm not entirely sure what you're saying. The cost to create a promise or a node is equal to creating a number?

domenic commented 8 years ago

Not a promise, but for a Node, the JavaScript-side cost is equal, yes. (It is just a pointer to the C++ backing object.)

annevk commented 8 years ago

Sure, but you have to allocate the backing object too. Are you not counting that? This approach doesn't require a backing object since it's just an instruction set.

domenic commented 8 years ago

You have to allocate the backing Node in both approaches, as Yehuda pointed out in his OP.

annevk commented 8 years ago

No you don't and I don't think he did. All we're doing is building up an instruction set. Actual node creation happens during commit.

domenic commented 8 years ago

Yes, that's exactly what he said. You create the nodes anyway in both cases, either spread out over time while building the tree, or during the commit. ("This API creates DOM nodes in the engine, but it does not need to create JavaScript wrappers.")

annevk commented 8 years ago

Ah, great, I think we're on the same line then. The difference being of course that with this proposal the browser can schedule when creation happens and we can support workers (since DOM isn't thread-safe and is unlikely going to be anytime soon across implementations).

esprehn commented 8 years ago

That seems like an API you can build yourself. Write the instruction stream to an array in the worker, then read it back out in the main thread and execute the DOM operations. You can schedule the work with requestIdleCallback, setTimeout and requestAnimationFrame.

Also to go back to @dherman's comment, I'm not convinced we need DOM mutation transaction semantics like this. I do think we need a way to spread the style, layout and paint steps inside the browser out over several frames. I'm not convinced that doing that requires building an integer based abstraction over the DOM, that certainly doesn't match any existing implementation and doesn't feel like a primitive.

dherman commented 8 years ago

I'm not convinced we need DOM mutation transaction semantics like this. I do think we need a way to spread the style, layout and paint steps inside the browser out over several frames.

I don't see how a programming model can get off the ground without some kind of tool for expressing atomicity in the primitives.

wycats commented 8 years ago

That seems like an API you can build yourself. Write the instruction stream to an array in the worker, then read it back out in the main thread and execute the DOM operations. You can schedule the work with requestIdleCallback, setTimeout and requestAnimationFrame.

Which of those APIs allows code running on the main thread to know that it is starting to exceed its budget, and therefore should continue its work later, once the latency-critical work has finished?

bfgeek commented 8 years ago

Which of those APIs allows code running on the main thread to know that it is starting to exceed its budget, and therefore should continue its work later, once the latency-critical work has finished?

requestIdleCallback does this, it gives you a high-res timestamp for how much time you should take inside the callback.

wycats commented 8 years ago

@bfgeek Great!

I'll try to put together my best-guess describing my goals and how I would accomplish them in user-space with the available APIs so we can see precisely where the gaps are, and what APIs we would need to close the gaps.

drufball commented 8 years ago

Before I do that, I'd be very interested to read a detailed WorkerNode proposal. Can you give me a pointer so I can do a proper analysis?

Sorry! We've passed ideas in this area back and forth so many times I didn't realize that we've never actually written any of our thoughts down in one place. I threw a rough summary of our thoughts for a WorkerNode proposal into a repo so you can take a look:

https://github.com/drufball/worker-node

I think we all agree that there's value in allowing DOM mutation on workers and in allowing a 'smeared' application of layout. One concern we have is that each of these actions is valuable independent (and in sometimes unrelated ways) from each other. Because of this, we're hesitant to bundle both problems into a single, interleaved solution. Right now it feels like we're solving a specific use case (e.g. the Virtual DOM diff + patch model), which may prove unwieldy when people aren't building that precise functionality. For example, when developers just want to smear the layout for a DOM tree they're working with.

Can we separate these into two distinct proposals - one for async application of layout and another for DOM mutations on worker threads? It would be nice to solve each of these problems with their own primitive, making sure to design them such that they can be used in tandem where it makes sense. For example, we can make sure to keep this concept of a DOMChangeList in mind while developing the async append proposal - which is poorly named since it's about general async DOM mutations, not just append :)

Happy to continue the Worker DOM discussion in a repo that's focused on just that problem. We could use the repo I linked above, or proceed somewhere else!

drufball commented 8 years ago

I'll try to put together my best-guess describing my goals and how I would accomplish them in user-space with the available APIs so we can see precisely where the gaps are, and what APIs we would need to close the gaps.

That sounds awesome!

wycats commented 8 years ago

Right now it feels like we're solving a specific use case (e.g. the Virtual DOM diff + patch model), which may prove unwieldy when people aren't building that precise functionality.

Just to make sure we're all on the same page, Glimmer does not use a VirtualDOM diff + patch model. It doesn't maintain its own representation of an intermediate not-real DOM, and does not diff two not-real DOMs in order to determine what operations it needs to perform.

I know of at least three models (Google's Incremental DOM, React's Virtual DOM, and Ember's Glimmer VM approach) that provide write-only abstractions that can identify a list of changes to apply, and then apply them in a batch.

That doesn't necessarily mean that the requirements of those systems are general purpose (and I strongly agree with your concern about unnecessarily coupling solutions together), but the solution is not tightly coupled to the diff/patch approach used by React and similar approaches.

wycats commented 8 years ago

Can we separate these into two distinct proposals - one for async application of layout and another for DOM mutations on worker threads?

How would you feel about an effort to identify the lowest layer of DOM operations that is sufficient to describe the higher layering of just creating and mutating the DOM data structure that is used in DOM1-4? (Some of the work that Chrome has done in its extension's support for Isolated Worlds feels like it overlaps with the constraints here.)

That subset would then be a natural starting point for tree construction and mutation operations that could be transferred across a worker boundary (and could be decoupled from the more controversial effort to reduce the need to use garbage-collected JavaScript objects to represent the intermediate states).

annevk commented 8 years ago

Can we separate these into two distinct proposals - one for async application of layout and another for DOM mutations on worker threads?

I think I mentioned earlier there is value in having these combined, since it means that the API and programming model you end up with is shared, meaning abstractions can also be reused, etc.

I do think we need a way to spread the style, layout and paint steps inside the browser out over several frames.

But should it be up to the browser to decide which nodes get rendered when? I think developers would want control over what is included and what is not. The tight coupling of various disciplines is what worries me most about the "async append" proposal. It seems counter to the extensible web lessons.

I'm not convinced that doing that requires building an integer based abstraction over the DOM, that certainly doesn't match any existing implementation and doesn't feel like a primitive.

I would argue it's more of a primitive than tightly coupling DOM mutations and layout, but also:

Creating a patch is also a pattern that all major frameworks seem to have converged on. Some as the result of diffing, others through slightly different techniques. It definitely feels like a primitive worth having to me.

(@drufball, you might want to open separate issues for the other proposals. That should at least inform everyone following the development of the DOM Standard where it might go next.)

bmaurer commented 8 years ago

Hey guys -- @drufball pointed us at this proposal and the async dom proposal and asked for some thoughts from FB on the api. I think there's a lot of interest in both APIs. Facebook incrementally renders a large part of our site and being able to do things like do that rendering in a worker thread and send a transaction or to do async layout are really valuable.

With that in mind I wanted to take a step back and talk a bit about how this might benefit a site like FB. At a very high level the goal of the API is to keep the UI responsive in the face of manipulations to the UI. Ideally this API would make it possible to parallelize layout with work the application wants to do.   So to take a concrete example, let’s say that you’re using FB newsfeed and that the page is downloading a new feed story to display. At the same time the user is writing a comment.  

Jxck commented 7 years ago

This Proposal seems so nice for me. I think if DOMChangeList could serialize into string will let us ..

aronallen commented 7 years ago

Hi,

I have been working on something similar. I created a format called vDOMSON that is a subset of JSON. My dream is to standardize how vDOM nodes are represented, so we can completely omit h and just use the Array constructor. Hopefully vDOM libs would start adapting vDOMSON as input format, instead of creating their own JS-objects.

My first attempt is to convince the snabbdom project to adapt vDOMSON.

https://github.com/snabbdom/snabbdom/issues/186

What is cool about this approach is that the Web Workers do not need to import whatever vDOM lib that is used, since vDOMSON just is a subset of JSON. Furthermore the vDOM representation is fully decoupled from the patching algorithm.

I am transmitting these vDOMSON trees through the postMessage API, depending on the browser and message size stringify or structured copy is fastest.

What I would like to do to improve performance further. Instead of messaging the entire vDOMSON tree I would run the patch in the web worker and only transmit the necessary changes to the main thread where the actual DOM is touched.

I don't know if transferring objects from a sub-thread to the main thread could beat a really fast structured copy, the structured copy is a simpler approach and I haven't hit any noticeable performance bottlenecks.

samccone commented 6 years ago

@domenic

One concern I have about the idea of running these things in a worker is that the IPC involved in getting back to the main thread (in the case of hundreds or thousands of nodes/nodetokens) would probably be a significant cost. I'm not sure whether it would be greater than or less than the cost of just doing the work on the main thread; similarly I am not sure whether the IPC would be significant enough to cause jank. But maybe all you have to IPC is a single memory location of the queue in which case it is presumably fine.

With the arrival of shared array buffers and atomics, this IPC overhead should be little to none now since the instruction backing buffer can be transfered via a SAB.

domenic commented 6 years ago

If they actually use numbers though, we run into the forgeability problem pointed out in the first part of https://github.com/whatwg/dom/issues/270#issuecomment-227424135

wycats commented 6 years ago

@domenic How much can existing implementations create new value types internally? I think it's important that these values aren't real pointers, but any kind of unforgeable (but copyable) value would be fine.

wycats commented 6 years ago

For those following along, I've started an implementation of the tree construction part of this proposal for use in Glimmer: https://github.com/glimmerjs/glimmer-vm/tree/master/packages/%40glimmer/dom-change-list.

If there's interest in helping to work on this outside of Glimmer, I'd be happy to extract it into its own repository. I'll also be publishing regular npm snapshots pretty soon.

Gozala commented 6 years ago

@wycats I have being implementing library for encoding and applying changeLists https://github.com/gozala/dominion which is largely inspired by this proposal. It might be worth meeting discussing our efforts.