w3c / long-animation-frames

Long Animation Frame API specification repository

Long Animation Frames API

For the editor's draft, see https://w3c.github.io/long-animation-frames/

See also the developer-facing article.

Overview

"Sluggishness", the feeling that "this page is not responsive to interactions", is a common problem for users on the web today. By introducing INP (Interaction to Next Paint) into Core Web Vitals, we hope to give authors a better indication of how their pages are doing in that regard. However, INP shows the effect of sluggishness, not its cause.

Long animation frames (LoAF), a revamp of long tasks, aim to help with that. A LoAF indicates that at a particular point in time the browser was congested, such that it took a long time from the beginning of a task until the rendering was updated (or until it was clear that no render was necessary).

Busy ("LoAF-heavy") sequences can delay responses to interactions. LoAF entries also contain information about what was blocking, e.g. long scripts or layout. Together, this can make LoAF a powerful tool for using real-user monitoring (RUM) to diagnose this type of performance issue.

Value to End Users & Publishers

The value to end users is that web pages can measure and fix sluggishness, making the web more responsive for everyone.

See this success story from a participant in the origin trial. They ran LoAF (using RUMVision) in user sessions and identified that a particular third-party script was causing sluggishness (bad INP). They refactored their page to stop using that third party, and their responsiveness improved dramatically.

Business Value

Several case studies, such as The Economic Times and RedBus, demonstrate that improving page responsiveness, as measured by the INP metric, directly contributes to user delight and has a positive business impact.

Actionability

Main-thread congestion is one of the main causes of high INP, but INP itself doesn't help diagnose the root cause. A powerful diagnostic API like LoAF gives developers a way to monitor what makes their pages unresponsive in the field. They can then act on that data to make their pages more responsive.

History

For a long time, long tasks have been a way to diagnose and track a lack of responsiveness, or "sluggishness". Sluggishness eventually affects Core Web Vitals metrics like INP, or metrics like Total Blocking Time (TBT). Developers have used long tasks with varying degrees of success, and we can now learn from that experience and see what can be improved going forward.

Where long tasks fall short

Long tasks rely on the underlying notion of a task. This is a somewhat well-specified term, but we found that it has a few shortcomings:

  1. A task does not include the "update the rendering" phase. That phase includes requestAnimationFrame callbacks, ResizeObserver callbacks, scroll events and so on. As a result, a lot of the busy time that blocks feedback or animation is not counted as part of the "long task". Developers can even game their long task timing by moving long operations into a requestAnimationFrame callback.

  2. Some operations that should be tasks are not specified or implemented to use tasks. For example, it is not specified how UI events are integrated into the event loop and the use of tasks there is implementation-specific.

  3. In implementations, tasks are used for internal scheduling, and not just as an implementation of the spec concept of a task. This means that changes to scheduling-related implementation details affect the measurement of long tasks, sometimes in unexpected, incompatible or arbitrary ways. A big implementation change in Chrome silently changed the meaning of long tasks when we started updating the rendering as part of a new task.

  4. A task may contain multiple callbacks, e.g. dispatch several events. This can make it confusing to decipher the root cause of a long task.

All of the above are part of the same issue: a task is an incomplete and inaccurate cadence for measuring main-thread blocking. It's either too granular, because several tasks together may be the cause of blocking. Or it's too coarse, because one task may batch together several event handlers. In addition, callbacks such as requestAnimationFrame are not tasks in themselves.
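The following worked example makes the first shortcoming concrete. It assumes a hypothetical frame in which a 30ms task is followed by 40ms of rendering work. The segment format and the helper names are invented for illustration. Measuring task by task finds nothing over the 50ms threshold, while measuring the whole frame finds 70ms of busy time.

```javascript
// Hypothetical illustration: one frame's main-thread work, in milliseconds.
// "task" segments are tasks; "render" is the update-the-rendering phase,
// which the long task definition does not count.
const LONG_TASK_THRESHOLD_MS = 50;

const frameSegments = [
  { kind: "task", start: 0, duration: 30 },    // e.g. a click handler's task
  { kind: "render", start: 30, duration: 40 }, // e.g. rAF callbacks, style and layout
];

// Long tasks: each "task" segment is measured on its own; rendering is ignored.
function findLongTasks(segments) {
  return segments.filter(
    (s) => s.kind === "task" && s.duration > LONG_TASK_THRESHOLD_MS
  );
}

// Frame-based measurement: from the start of the first segment to the end of the last.
function frameDuration(segments) {
  const start = Math.min(...segments.map((s) => s.start));
  const end = Math.max(...segments.map((s) => s.start + s.duration));
  return end - start;
}

console.log(findLongTasks(frameSegments).length); // 0
console.log(frameDuration(frameSegments)); // 70
```

Moving the 30ms of work into the render segment would not change the frame duration. This is why a frame-based measurement cannot be gamed the same way.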

The Current Situation

The HTML event loop processing model can be roughly described as follows:

while (true) {
    const taskStartTime = performance.now();
    // It's unspecified where UI events fit in. Should each have their own task?
    const task = eventQueue.pop();
    if (task)
        task.run();
    if (performance.now() - taskStartTime > 50)
        reportLongTask();

    if (!hasRenderingOpportunity())
        continue;

    invokeAnimationFrameCallbacks();
    while (needsStyleAndLayout()) {
        styleAndLayout();
        invokeResizeObservers();
    }
    markPaintTiming();
    render();
}

However, the Chromium implementation is more like this:

while (true) {
    const startTime = performance.now();
    const task = eventQueue.pop();
    if (task)
        task.run();
    uiEventQueue.processEvents({rafAligned: false});
    if (performance.now() - startTime > 50)
        reportLongTask();

    if (!hasRenderingOpportunity())
        continue;

    eventQueue.push(() => {
        // A new task! so this would report a separate longtask.
        uiEventQueue.processEvents({rafAligned: true});
        invokeAnimationFrameCallbacks();
        while (needsStyleAndLayout()) {
            styleAndLayout();
            invokeResizeObservers();
        }
        markPaintTiming();
        render();
    });
}

This means that in Chromium, several implementation details affect how long tasks are measured:

  1. Rendering gets its own task, which may be long.
  2. Event handlers sometimes execute in their own task, sometimes as part of the work task, sometimes as part of the rendering task.

This demonstrates how relying on tasks is brittle.
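Here is a hypothetical illustration of that brittleness. The same three pieces of work are grouped into tasks in the two ways the Chromium sketch allows: a 20ms task, a 40ms input handler and 15ms of rendering work (all invented numbers). The reported long task changes in size and, more importantly, in which work it appears to belong to.

```javascript
// Hypothetical illustration: the same work grouped into tasks two different ways.
// Each inner array is one task; each item is a piece of work with a duration in ms.
const LONG_TASK_MS = 50;

// Returns the total duration of every task that exceeds the long task threshold.
function reportLongTasks(tasks) {
  return tasks
    .map((parts) => parts.reduce((sum, part) => sum + part.ms, 0))
    .filter((total) => total > LONG_TASK_MS);
}

const work = { name: "task", ms: 20 };
const handler = { name: "input handler", ms: 40 };
const rendering = { name: "rAF + style/layout", ms: 15 };

// Handler processed right after the work task (rafAligned: false).
console.log(reportLongTasks([[work, handler], [rendering]])); // one 60ms long task
// Handler processed inside the rendering task (rafAligned: true).
console.log(reportLongTasks([[work], [handler, rendering]])); // one 55ms long task
```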

Introducing LoAF

LoAF (long animation frame) is a new proposed performance entry type, meant to be a progression of the long task concept.

A LoAF is the time from when the main thread started doing any work (see startTime below) until it is either ready to paint or idle (has nothing to do). It may include more than one task, though usually no more than two. Because it ends at the paint-mark time, it includes all the rendering observer callbacks (requestAnimationFrame, ResizeObserver etc.). It may or may not include presentation time ("pixels on screen" time), as that is implementation-specific.
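The following minimal sketch shows how page script might consume these entries with a standard PerformanceObserver. It assumes a browser that supports the proposed "long-animation-frame" entry type. It also assumes the entry fields described below: startTime, duration, blockingDuration, scripts[].sourceURL and scripts[].duration. The helper names observeLongAnimationFrames and scriptTimeBySource are hypothetical. The feature check means the code does nothing where the entry type isn't supported.

```javascript
// Minimal sketch: observe long animation frames where the entry type is supported.
function observeLongAnimationFrames(onEntry) {
  if (
    typeof PerformanceObserver === "undefined" ||
    !Array.isArray(PerformanceObserver.supportedEntryTypes) ||
    !PerformanceObserver.supportedEntryTypes.includes("long-animation-frame")
  ) {
    return null; // Not supported here: observe nothing rather than throw.
  }
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      onEntry(entry);
    }
  });
  // buffered: true also delivers entries recorded before observe() was called.
  observer.observe({ type: "long-animation-frame", buffered: true });
  return observer;
}

// Hypothetical helper: sums script time per source URL within one entry.
function scriptTimeBySource(entry) {
  const totals = new Map();
  for (const script of entry.scripts ?? []) {
    const url = script.sourceURL || "(unknown)";
    totals.set(url, (totals.get(url) ?? 0) + script.duration);
  }
  return totals;
}

observeLongAnimationFrames((entry) => {
  console.log(entry.startTime, entry.duration, entry.blockingDuration,
              scriptTimeBySource(entry));
});
```

Grouping by sourceURL is one simple way to surface a single expensive script, such as a third-party one. A real RUM setup would batch and send this data rather than log it.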

In addition to making the cadence fit better with what it measures, the entry could include extra information to help understand what made it long, and what kind of consequences it had.

Processing model

The new proposal:


let frameTiming = null;

while (true) {
    if (frameTiming === null) {
        frameTiming = new AnimationFrameTiming();
        frameTiming.startTime = performance.now();
    }

    const task = eventQueue.pop();
    if (task)
        task.run();

    if (!hasDocumentThatNeedsRender()) {
        frameTiming.renderEnd = performance.now();
        if (frameTiming.renderEnd - frameTiming.startTime > 50)
            reportLongAnimationFrame();
        frameTiming = null;
        continue;
    }

    if (!hasRenderingOpportunity())
        continue;

    frameTiming.renderStart = performance.now();
    invokeAnimationFrameCallbacks();
    frameTiming.styleAndLayoutStart = performance.now();
    for (const document of documentsInThisEventLoop) {
        while (document.needsStyleOrLayout()) {
            document.calculateStyleAndLayout();
            invokeResizeObserverCallbacks();
        }
    }
    frameTiming.renderEnd = performance.now();
    markPaintTiming();
    if (frameTiming.renderEnd - frameTiming.startTime > 50)
        reportLongAnimationFrame();

    frameTiming = null;
    render();
}
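To see how the proposed loop groups work, here is a small simulation of it driven by a fake clock instead of performance.now(). It is a sketch under stated assumptions. Each input item is one loop iteration with invented durations. The hypothetical flags needsRender and renderingOpportunity stand in for hasDocumentThatNeedsRender() and hasRenderingOpportunity(). The example shows two tasks folding into a single 70ms frame because the first task had no rendering opportunity.

```javascript
// Hypothetical simulation of the proposed processing model with a fake clock.
// Each item is one event-loop iteration; all durations are invented, in ms.
const LOAF_THRESHOLD_MS = 50;

function simulateFrames(iterations) {
  let now = 0; // fake clock standing in for performance.now()
  let frameTiming = null;
  const reported = [];

  // Ends the current frame and keeps it only if it is long.
  const finishFrame = () => {
    const duration = frameTiming.renderEnd - frameTiming.startTime;
    if (duration > LOAF_THRESHOLD_MS) reported.push({ ...frameTiming, duration });
    frameTiming = null;
  };

  for (const it of iterations) {
    if (frameTiming === null) frameTiming = { startTime: now };
    now += it.taskMs; // task.run()

    if (!it.needsRender) {
      // No document needs rendering: the frame ends with the task.
      frameTiming.renderEnd = now;
      finishFrame();
      continue;
    }
    // No rendering opportunity yet: keep accumulating tasks into this frame.
    if (!it.renderingOpportunity) continue;

    frameTiming.renderStart = now;
    now += it.rafMs; // requestAnimationFrame callbacks
    frameTiming.styleAndLayoutStart = now;
    now += it.layoutMs; // style, layout and ResizeObserver callbacks
    frameTiming.renderEnd = now;
    finishFrame(); // paint and compositing would follow, outside the measurement
  }
  return reported;
}

const frames = simulateFrames([
  { taskMs: 20, needsRender: true, renderingOpportunity: false },
  { taskMs: 25, needsRender: true, renderingOpportunity: true, rafMs: 10, layoutMs: 15 },
  { taskMs: 5, needsRender: false },
]);
console.log(frames);
// one frame: startTime 0, renderStart 45, styleAndLayoutStart 55, duration 70
```

The third iteration does not need a render, so its frame ends right after the task. At 5ms it is not reported.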

What a LoAF entry looks like

const someLongAnimationFrameEntry = {
    entryType: "long-animation-frame",

    // The start time of the first task that initiated the long animation frame.
    startTime,

    // https://html.spec.whatwg.org/#event-loop-processing-model (17)
    // This is a well-specified and interoperable time, but doesn't include presentation time.
    // It's the time after all the animations and observers are done, style and layout are done,
    // and all that's left is painting & compositing. In the case of a task that didn't end up
    // updating the rendering, this would be the long task duration.
    duration,

    // https://html.spec.whatwg.org/multipage/webappapis.html#update-the-rendering
    // The time where the rendering cycle has started. The rendering cycle includes
    // requestAnimationFrame callbacks, style and layout calculation, resize observer and
    // intersection observer callbacks. In Chromium it may also include some event listeners,
    // particularly for animation-aligned events such as mouse/touch events.
    // Equivalent to BeginMainFrame in Chromium
    renderStart,

    // https://html.spec.whatwg.org/multipage/webappapis.html#update-the-rendering (#14)
    // Beginning of the time period spent in style and layout calculations. This includes
    // ResizeObserver callbacks
    styleAndLayoutStart,

    // Time of the first UI event (mouse/keyboard etc.) to be handled during the course of this
    // frame. The timestamp is the event's
    // [timestamp](https://dom.spec.whatwg.org/#dom-event-timestamp), i.e. the time it was queued
    // which could be long before it was processed.
    firstUIEventTimestamp,

    // The duration in milliseconds that the animation frame was being blocked in practice.
    // Given that LoAFs can contain multiple tasks, we consider the following as blocking durations:
    // * Long tasks
    // * The longest task + the rendering time, if their sum exceeds the Long Task threshold of 50ms.
    // The blockingDuration would be the sum of those long task durations, with 50ms subtracted from each.
    blockingDuration,

    // A list of long scripts that were executed over the course of the long frame. Scripts reported
    // here must be at least 5ms in duration, and were executed in windows of the same origin as the
    // current window (e.g. the same window, iframes, popups of the same origin).
    // Note that these scripts are entry points to JS: the place where the platform calls a script.
    scripts: [
        {
            // These are always "script"
            name,
            entryType,

            // The different script invoker types help us understand the scenario from which the long script
            // was invoked
            invokerType:
                // A known callback registered from a web platform API, e.g. setTimeout,
                // requestAnimationFrame.
                "user-callback" |

                // A listener to a platform event, e.g. click, load, keyup, etc.
                "event-listener" |

                // Handler of a platform promise, e.g. fetch(). Note that in the case of promises,
                // all the handlers of the same promise are mixed together as one "script".
                "resolve-promise" | "reject-promise" |

                // Script evaluation (e.g. <script> or import())
                "classic-script" |
                "module-script",

            // The invoker field tries to give as much information as possible about the *invoker* of the script.
            // For callbacks: Object.functionName of the invoker, e.g. Window.setTimeout
            // For element event listeners: TAGNAME#id.onevent, or TAGNAME[src=src].onevent
            // For script blocks: the script source URL
            // For promises: The invoker of the promise, e.g. Window.fetch.then
            // Note that for promise resolvers, all of the handlers of the promise are mixed
            // together as one long script.
            invoker: "IMG#id.onload" | "Window.requestAnimationFrame" |
                  "Response.json.then",

            // When the function was invoked. Note that this is the startTime of the script, not
            // the startTime of the frame (each entry in the performance timeline has a startTime)
            startTime,

            // If this script was parsed/compiled, this would be the time after compilation.
            // Otherwise it would be equal to startTime
            executionStart,

            // the duration between startTime and when the subsequent microtask queue has finished
            // processing
            duration,

            // Total time spent in forced layout/style inside this function
            forcedStyleAndLayoutDuration,

            // Total time spent in "pausing" synchronous operations (alert, synchronous XHR)
            pauseDuration,

            // In the case of a promise resolver, this would be the invoker's source location.
            // Note that we expose character position rather than line/column to avoid overhead of line splitting.
            sourceURL: "https://example.com/big.js",
            sourceFunctionName: "do_something_long",
            sourceCharPosition: 10,

            // Relationship between the (same-origin) window where this script was executed and
            // this window.
            windowAttribution: "self" | "descendant" | "ancestor" | "same-page" | "other",

            // A reference to the same-origin window that originated the script, if it's still
            // alive.
            window,
        }
    ]
}
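The blockingDuration rule in the comments above can be read as follows. First, add the rendering time to the longest task. Then, for every duration that crosses 50ms, sum the excess over 50ms. The following minimal sketch implements that reading. It takes a hypothetical list of task durations and a rendering duration, both in milliseconds. The editor's draft holds the normative definition.

```javascript
// Sketch of one reading of the blockingDuration rule; the input shape is hypothetical.
const BLOCKING_THRESHOLD_MS = 50;

function computeBlockingDuration(taskDurations, renderDuration = 0) {
  // Treat a frame with no recorded tasks as a single zero-length task.
  const durations = taskDurations.length > 0 ? [...taskDurations] : [0];
  // The rendering phase is attributed to the longest task in the frame.
  let longest = 0;
  for (let i = 1; i < durations.length; i++) {
    if (durations[i] > durations[longest]) longest = i;
  }
  durations[longest] += renderDuration;
  // Each duration above the threshold contributes only its excess.
  return durations
    .filter((d) => d > BLOCKING_THRESHOLD_MS)
    .reduce((sum, d) => sum + (d - BLOCKING_THRESHOLD_MS), 0);
}

console.log(computeBlockingDuration([30, 20], 40)); // 20
console.log(computeBlockingDuration([80, 60], 10)); // 50
```

In the first call, the frame has 90ms of main-thread work. Only the longest task plus rendering (70ms) crosses the threshold, so it counts as 20ms of blocking.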

Security & Privacy Considerations

For the most part, LoAF only exposes information across same-origin windows. Information about scripts within a window is already observable, e.g. using resource timing or a service worker.

However, LoAF might expose rendering information for a particular document tree that may be cross-origin (same-agent). The details about rendering the frame, such as styleAndLayoutStart, are proposed to be visible to all the same-agent windows that are rendered serially. That's because this information is already observable by using requestAnimationFrame and ResizeObserver and measuring the delay between them. The premise is that global "update the rendering" timing information is already observable across same-agent windows, so exposing it directly does not leak new cross-origin information.

On top of that, LoAF only exposes this timing when the animation frame is long, whereas existing techniques can measure it for short animation frames as well.

To conclude, this new API exposes cross-origin, same-agent information that is already available and unprotected, and at a lower fidelity than existing APIs.

Notes, complexity, doubts, future ideas, TODOs

  1. One complexity inherited from long tasks is the fact that the event loop is shared across windows of the same agent (or process). The solution here is a bit different but relies on similar principles:

    1. Only frames in visible pages report long frames.

    2. An observer fires only if its rendering was blocked by the long frame in practice, or if the long task (that didn't cause a render) belonged to that page.

    3. Breakdown to scripts is only available to the frame where they were invoked. Other frames receive an "opaque" breakdown: attribution of a blocking task to a different window, similar to the existing long task attribution.

  2. To avoid the magic 50ms number, consider making the threshold configurable, or rely on "discarded rendering opportunities" as the qualifier for sluggishness alongside (or instead of) millisecond duration.

  3. Exposing source locations might be tricky or implementation-defined. This could be an optional field, but in any case it requires some research.

Relationship with TBT

TBT (Total Blocking Time) is a metric for measuring how responsive or sluggish the experience is during page load. It's mostly considered a lab metric, e.g. in Lighthouse, but it's also measurable in the field.
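The following sketch shows how TBT might be computed from LoAF entries' blockingDuration, following the plan in this section. It assumes a measurement window given as two timestamps, classically from First Contentful Paint to Time to Interactive. As a simplification, it counts whole entries that start inside the window and does not clip entries that straddle its edges. The function name and sample entries are hypothetical.

```javascript
// Hypothetical sketch: TBT as the sum of LoAF blockingDuration values for frames
// that start inside [windowStart, windowEnd). Straddling entries are not clipped.
function totalBlockingTime(entries, windowStart, windowEnd) {
  return entries
    .filter((e) => e.startTime >= windowStart && e.startTime < windowEnd)
    .reduce((sum, e) => sum + e.blockingDuration, 0);
}

const sampleEntries = [
  { startTime: 100, blockingDuration: 20 },
  { startTime: 400, blockingDuration: 0 },
  { startTime: 900, blockingDuration: 35 },
  { startTime: 5000, blockingDuration: 80 }, // after the window: ignored
];
console.log(totalBlockingTime(sampleEntries, 0, 3000)); // 55
```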

Once the definition & implementation of long animation frames is stable, the current (potential) plan is to compute TBT based on the LoAF entries' blockingDuration. This would have the following benefits:

Overlap with Event Timing

With all the new data that LoAFs expose, their overlap with Event Timing grows. That overlap is only a problem if we look at them as separate APIs.

The dozen-or-so different entry types that go into a performance timeline are not separate APIs per-se, but rather queries into the same dataset - the chain of events that helps us understand sluggishness, jank, and instability.

As such, event-timing and LoAF query that dataset differently: