Long Animation Frame API (LoAF)

Hello TAG!

I'm requesting a TAG review of Long Animation Frame API.

LoAF is an API that lets a webpage measure long periods of time during which the main thread was busy or congested, resulting in sluggishness. It also adds information that helps explain what caused that busy period so that authors can act on it.

Explainer: https://github.com/w3c/longtasks/blob/main/loaf-explainer.md
Specification: https://w3c.github.io/longtasks/
User research: [url to public summary/results of research]
Security and Privacy self-review: https://docs.google.com/document/d/1rsaghOOGlTKyzFHR9d_cIRLpZoik3O_S2yYh_oq_jAM/edit?usp=sharing
GitHub repo (if you prefer feedback filed there): https://github.com/w3c/longtasks
Primary contacts (and their relationship to the specification):
- Noam Rosenthal (@noamr), Google
- Yoav Weiss (@yoavweiss), Google, Group chair
- Michal Mocny (@mmocny), Google
Organization/project driving the design: Google
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/6118675067699200

Further details:

[x] I have reviewed the TAG's Web Platform Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WebPerfWG
The group where standardization of this work is intended to be done ("unknown" if not known): WebPerfWG
Existing major pieces of multi-stakeholder review or discussion of this design: TBD
Major unresolved issues with or opposition to this design: Not yet...
This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

💬 leave review feedback as a comment in this issue and @-notify @noamr

Hi @noamr, thank you for proposing this idea and running it through TAG early on. Helping authors remove as much "sluggishness" as possible is always great to see.

Given the deep level of browser integration I am curious to hear your thoughts on having this interoperable across UAs. If I read the proposal correctly, you're proposing to expose lots of lower-level details: stages of the rendering pipeline, the threading model of the browser, and additional data that might help in correlating all of this with a user action, e.g. that a style recalc was triggered by a getBoundingClientRect call.
These details differ a lot between implementations and I'm concerned that micro-optimizations for one browser will tip other browsers in the opposite direction. Have you discussed this with other browser vendors?

In order to make such a feature robust I am assuming you envision the ability to construct call graphs. If so, how do you envision event trigger correlation - e.g. getBoundingClientRect >> style load >> style parse >> style recalc etc.

Reading the S&P statements in the document and those in the template there seem to be some inconsistencies. For example, in the explainer you say

LoAF might expose rendering information for a particular document tree that may be cross-origin (same-agent).

while the S&P doc says

A cross-origin iframe only receives timing information for itself.

To me this reads as a bit of a contradiction, i.e. a top-level document (an iframe) can receive cross-origin information from another document in its tree.

Further, this information is to be exposed to all windows of the UA making it more of a global exposure. I'm probably misreading your points and if not, can you help me alleviate this concern?

Hi @noamr, thank you for proposing this idea and running it through TAG early on. Helping authors remove as much "sluggishness" as possible is always great to see.

Thanks for taking a look!

Given the deep level of browser integration I am curious to hear your thoughts on having this interoperable across UAs. If I read the proposal correctly, you're proposing to expose lots of lower-level details: stages of the rendering pipeline, the threading model of the browser, and additional data that might help in correlating all of this with a user action, e.g. that a style recalc was triggered by a getBoundingClientRect call. These details differ a lot between implementations and I'm concerned that micro-optimizations for one browser will tip other browsers in the opposite direction. Have you discussed this with other browser vendors?

We have discussed some of this with other browser vendors, and the conversation goes on. The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts. Other things that might not be interoperable can be optional and in any case I'll be happy to discuss them with the other vendors.

In order to make such a feature robust I am assuming you envision the ability to construct call graphs. If so, how do you envision event trigger correlation - e.g. getBoundingClientRect >> style load >> style parse >> style recalc etc.

We don't expose things to that level - only the total forced style and layout time of a whole script runtime. Still, we'll have to work out interoperability and I see this particular attribute as optional in the spec.
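As a rough illustration of the granularity described above, the following sketch adds up the forced style and layout time reported per script in one frame. It assumes the attribute names used in the explainer (`scripts`, `forcedStyleAndLayoutDuration`); `totalForcedStyleAndLayout` and `sampleFrame` are hypothetical names, and the entry is a plain object standing in for a real one.

```javascript
// Sketch only: attribute names are assumed from the explainer, and
// `totalForcedStyleAndLayout` is a hypothetical helper.
function totalForcedStyleAndLayout(frameEntry) {
  const scripts = frameEntry.scripts ?? [];
  return scripts.reduce(
    // The attribute is treated as optional, so a missing value counts as 0.
    (sum, script) => sum + (script.forcedStyleAndLayoutDuration ?? 0),
    0
  );
}

// Plain object standing in for a long-animation-frame entry (made-up values).
const sampleFrame = {
  duration: 120,
  scripts: [
    { sourceURL: 'https://example.com/a.js', forcedStyleAndLayoutDuration: 5 },
    { sourceURL: 'https://example.com/b.js', forcedStyleAndLayoutDuration: 7.5 },
    { sourceURL: 'https://example.com/c.js' }, // optional attribute absent
  ],
};

console.log(totalForcedStyleAndLayout(sampleFrame)); // 12.5
```

Note that only a per-script total is available here; nothing in the entry says which call forced the layout.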

Reading the S&P statements in the document and those in the template there seem to be some inconsistencies. For example, in the explainer you say

LoAF might expose rendering information for a particular document tree that may be cross-origin (same-agent).

while the S&P doc says

A cross-origin iframe only receives timing information for itself.

To me this reads as a bit of a contradiction, i.e. a top-level document (an iframe) can receive cross-origin information from another document in its tree.

Further, this information is to be exposed to all windows of the UA making it more of a global exposure. I'm probably misreading your points and if not, can you help me alleviate this concern?

What those lines mean, and I will clarify this in the spec, is that the timing itself exposes cross-origin same-agent information by design, even without this API. You can add timeouts and rAFs to a page and see whether you get delays, which are likely due to other same-agent pages doing janky stuff. The API doesn't expose anything other than those delays, and you don't get visibility into whether they come from other same-agent documents, from general browser slowness, or from something else.

We send the info to frames that participate in the LoAF in one of the following ways:

- The frame generated a long task that didn't end up rendering; it gets its own non-rendering LoAF.
- The frame's rendering was updated in this LoAF, meaning that it could sample the delays itself with rAF/ResizeObserver etc.

We never send script or blocking-duration information to cross-origin frames, only delays that they could otherwise sample themselves. Does this answer your question?

Hi @noamr, thanks for the additional context.

We have discussed some of this with other browser vendors, and the conversation goes on.

I can't tell from this statement how well supported the feature is by other browser vendors. Can you please elaborate or point me to observable discussions?

The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts.

I am concerned that LoAF is attempting to expose something observable today, and make it easier to obtain, cross-origin and cross-frame without justifying if it is good for users or not. In particular, this type of exposure appears as ancillary user data - is it?

Couple of additional points:

- Interop (similar to my first question above), I still can't tell how implementable the feature is across engines (great to see you're working with Mozilla and WebKit).
- The overall extensibility model - for me, the major benefit of the feature compared to what is possible today (despite not being too easy) is the ability to create causality - "what is causing a chain of events that is resulting in a long running animation?", and I can't tell what that would look like from the current work. Perhaps you can point me to additional examples, or work in progress I can learn from?

Hi @noamr, thanks for the additional context.

We have discussed some of this with other browser vendors, and the conversation goes on.

I can't tell from this statement how well supported the feature is by other browser vendors. Can you please elaborate or point me to observable discussions?

Of course! Minutes of the discussion at TPAC: https://w3c.github.io/web-performance/meetings/2023/2023-09-TPAC/index.html

Search for "LoAF". According to WebKit folks this is more implementable than long tasks, and according to Firefox this is more in line with benchmarks like Speedometer 3. We filed standards position requests with both (https://github.com/mozilla/standards-positions/issues/929, https://github.com/WebKit/standards-positions/issues/283).

The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts.

I am concerned that LoAF is attempting to expose something observable today, and make it easier to obtain, cross-origin and cross-frame without justifying if it is good for users or not. In particular, this type of exposure appears as ancillary user data - is it?

The data observed in LoAF is how long rendering in your own origin was delayed. This is information you should probably know, and have access to anyway. This API makes that information only marginally easier to obtain than measuring it yourself; the main thing it makes easier is understanding the root cause of delays caused by your own origin. An equivalent would be resource timing, where other origins clogging your network bandwidth would be observable as slowdowns for your own resources.

I think it follows the principles in https://w3ctag.github.io/privacy-principles/#information: "New APIs which add new ways of getting information must be guarded at least as strongly as the existing ways". This holds here. Information about how much rendering was delayed is not currently guarded in any way (and cannot be guarded, except by means of process isolation).

Couple of additional points:

Interop (similar to my first question above), I still can't tell how implementable the feature is across engines (great to see you're working with Mozilla and WebKit).

We are waiting for them to respond to the open standards position issues.

The overall extensibility model - for me, the major benefit of the feature compared to what is possible today (despite not being too easy) is the ability to create causality - "what is causing a chain of events that is resulting in a long running animation?", and I can't tell what that would look like from the current work. Perhaps you can point me to additional examples, or work in progress I can learn from?

Not only "long running animation", but also "slow responsiveness". See the TPAC minutes linked above for a use case from Microsoft, who have been successfully using it to reduce scroll jankiness, and for success stories from RUMVision, who actively use it (as part of the origin trial) to pinpoint the scripts responsible for sluggish websites for their customers.
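To give a sense of what "pinpointing scripts" could look like in practice, here is a minimal sketch that totals script time per source URL across several frames. It is not taken from any of the tools mentioned; `rankScriptsByDuration` is a hypothetical helper, the `scripts`, `sourceURL` and `duration` attribute names are assumed from the explainer, and the frames are plain objects with made-up values.

```javascript
// Hypothetical helper: sums script time per source URL across frames
// and returns the totals sorted from slowest to fastest.
function rankScriptsByDuration(frameEntries) {
  const totals = new Map();
  for (const frame of frameEntries) {
    for (const script of frame.scripts ?? []) {
      // Scripts without a known source are grouped under one placeholder key.
      const key = script.sourceURL || '(unknown)';
      totals.set(key, (totals.get(key) ?? 0) + script.duration);
    }
  }
  return [...totals.entries()]
    .map(([sourceURL, duration]) => ({ sourceURL, duration }))
    .sort((a, b) => b.duration - a.duration);
}

// Plain objects standing in for collected long-animation-frame entries.
const collectedFrames = [
  {
    scripts: [
      { sourceURL: 'https://example.com/a.js', duration: 30 },
      { sourceURL: 'https://example.com/b.js', duration: 80 },
    ],
  },
  { scripts: [{ sourceURL: 'https://example.com/a.js', duration: 60 }] },
];

const ranked = rankScriptsByDuration(collectedFrames);
console.log(ranked[0].sourceURL); // https://example.com/a.js
```

A real-user-monitoring script would typically send such a ranking to a reporting endpoint rather than log it.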

To demonstrate how this doesn't expose new ancillary data, consider the following. You want to know if other frames in the process are blocking. To do that today, you can run:

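// Record the time, then see how late a zero-delay timer actually fires.
const before = performance.now();
setTimeout(() => {
  // Any sizeable delay here means the main thread was busy with other work.
  const delay = performance.now() - before;
  console.log(`timer delayed by ${delay.toFixed(1)}ms`);
}, 0);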

Same with a requestAnimationFrame if you want to measure rendering delay in particular.
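The requestAnimationFrame variant might look like the sketch below. `measureDelay` is a hypothetical helper that takes the scheduler and the clock as parameters, so the same function covers both the timer and the rAF case; note that a rAF callback always waits for the next frame, so the measured value includes the normal frame interval.

```javascript
// Hypothetical helper: measures how long a scheduled callback takes to run.
// `schedule` queues the callback, `now` returns a timestamp in milliseconds.
function measureDelay(schedule, now, onDelay) {
  const before = now();
  schedule(() => onDelay(now() - before));
}

// Browser usage, guarded so the sketch also loads outside a browser.
if (typeof requestAnimationFrame === 'function') {
  measureDelay(
    (cb) => requestAnimationFrame(cb),
    () => performance.now(),
    (delay) => console.log(`next frame arrived after ${delay.toFixed(1)}ms`)
  );
}
```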

To do that with LoAF, you'd have to register an observer and make sure that the delay amounts to more than 50ms. This makes LoAF a very blunt instrument for measuring things that can already be measured precisely today...
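For comparison, registering such an observer could be sketched as follows. This assumes the `long-animation-frame` entry type and the `blockingDuration` attribute from the explainer; `summarizeFrame` and `observeLongFrames` are hypothetical helper names, and the function returns `null` where the entry type is not supported.

```javascript
// Hypothetical helper: keeps only the timing fields of a frame entry.
function summarizeFrame(entry) {
  return {
    start: entry.startTime,
    duration: entry.duration,
    blockingDuration: entry.blockingDuration ?? 0,
  };
}

// Hypothetical helper: reports each long animation frame to `onFrame`.
// Frames below the reporting threshold never produce an entry at all.
function observeLongFrames(onFrame) {
  const supported =
    typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes ?? []).includes('long-animation-frame');
  if (!supported) return null;

  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) onFrame(summarizeFrame(entry));
  });
  // `buffered: true` also delivers entries recorded before registration.
  observer.observe({ type: 'long-animation-frame', buffered: true });
  return observer;
}

const longFrameObserver = observeLongFrames((frame) => {
  console.log(`long frame: ${frame.duration}ms, blocking ${frame.blockingDuration}ms`);
});
```

Unlike the timer approach, nothing is reported for short delays, which is what makes the observer the coarser of the two tools.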

@plinss and I looked at this today and it seems broadly acceptable. We have a few concerns here, but none of these really change our overall positive disposition.

We observe that the spec claims that thresholding durations is an effective mitigation strategy for timing attacks. This is not correct. Thresholding only limits the rate at which information can be extracted. The specification rightly points out that these measurements are already possible, but claims this does not make things worse. This is also incorrect. Being able to measure multiple timing sources at the same time makes the rate of information extraction much higher. This is still probably a worthwhile trade-off overall, but please do not pretend like the risk has been eliminated.

We also noted the monkeypatch of WebIDL. Hopefully you're talking to the WebIDL folks to get those changes folded in and will be removing the monkeypatch. See our guidance in this area.

@plinss and I looked at this today and it seems broadly acceptable. We have a few concerns here, but none of these really change our overall positive disposition.

We observe that the spec claims that thresholding durations is an effective mitigation strategy for timing attacks. This is not correct. Thresholding only limits the rate at which information can be extracted. The specification rightly points out that these measurements are already possible, but claims this does not make things worse. This is also incorrect. Being able to measure multiple timing sources at the same time makes the rate of information extraction much higher. This is still probably a worthwhile trade-off overall, but please do not pretend like the risk has been eliminated.

We also noted the monkeypatch of WebIDL. Hopefully you're talking to the WebIDL folks to get those changes folded in and will be removing the monkeypatch. See our guidance in this area.

Thanks for the review! Indeed, the remaining WebIDL monkey patches are in the process of being upstreamed (see https://github.com/whatwg/webidl/pull/1400). I will take your comments into account and make the S&P section of the spec more accurate on those points.

w3ctag / design-reviews

Long Animation Frame API (LoAF) #911