w3c / event-timing

A proposal for an Event Timing specification.
https://w3c.github.io/event-timing/
Other
44 stars 13 forks source link

Default buffering of Event Timing Entries is based on durationThreshold: 104 -- even for Interactions. #124

Open mmocny opened 1 year ago

mmocny commented 1 year ago

Buffering of EventTiming entries, which buffers from page load until a PerformanceObserver is registered, is always based on a durationThreshold: 104.

When you register a new PerformanceObserver that asks for events with durationThreshold lower than that (i.e. durationThreshold: 40 to options), there cannot already be any such Entries stored in the buffer. It may be surprising to receive buffered entries >= 104, but none in the [40, 104) range.

A few options:

yoavweiss commented 1 year ago

Dynamic buffering is interesting! The main concern with buffering "too much" is around memory constraints, and can be a good solution to that. It'd require a bit more work on the UA side of things though, and would require that to happen even on pages where there will never be an EventTiming listener.

I'd be curious to hear from RUM collectors if this is a real issue they are bumping up against when collecting EventTiming entries.

mmocny commented 1 year ago

True that buffing, and therefore dynamic buffering, is a cost that is paid on every page even if it ends up not used (by design). Filtering a fixed sized list on simple criteria, only when its full feels like a small constant amount of work, but its not free.

There were some proposals in the past to support a policy via header/meta tags I think to control overriding this behaviour... Anyway, I don't think dynamic buffering is ideal solution here anyway, because Event Timing reports on so many different event types, and the event counts are not evenly distributed (see below).

I'd be curious to hear from RUM collectors if this is a real issue they are bumping up against when collecting EventTiming entries.

@philipwalton the author of web-vitals.js library reported this as a real issue. Users of that library noticed it when trying to use it to measure INP. (The web-vitals library uses 40ms as the default durationThreshold and this issue means there results of measurement are dependant on the timing of when the library loads-- which PO buffering is meant to prevent.)

But yes, I would love more input here to see if this is a real issue, given that we still do have eventCounts for things under threshold.


Try this to show total eventCounts for any page:

console.table([...performance.eventCounts.entries()])

Inject early in page load to measure events >=40ms only:

let c = 0;
new PerformanceObserver(list => {
  for (let entry of list.getEntries()) {
    console.log(c++, entry.duration)
  }
}).observe({ type: 'event', durationThreshold: 40 });

Anecdote: On my desktop using a mouse pointer, after a mere few seconds on a popular news site I saw reported already 492 events which had duration over 40ms (due to loading main thread contention). All those events were continuous i.e. pointerenter/leave events or similar.

Dynamic buffering would help, but maybe having different default threshold for events with interactionId is better?

philipwalton commented 1 year ago

@philipwalton the author of web-vitals.js library reported this as a real issue.

My biggest gripe with the current behavior is that it's unexpected.

From an API design perspective, it's not great to have a configuration object where two of the options don't work together and the only way to discover that is to read the spec :)

I understand that the buffering concerns are important, and I'm not suggesting that browsers should buffer all entries as a solution to this. However, I do think there's a real issue here. And what makes matters worse is the issue is not immediately obvious, so a developer could spend a lot of time building and deploying some monitoring code only to realize much later that they have gaps when looking at the data they collect.

Some potential options to improve this:

On that last point, do we know of any sites / RUM providers who are using Event Timing and looking at mouse/pointer events? I wonder if issues related to lots of continuous events would be better handled by a separate API (e.g. something like frame timing).

To comment on some of Michal's suggestions

A few options:

  • Change spec to buffer certain types of Events with a lower default threshold (i.e. all discrete events, or only those with interactionId)
  • Change spec to support "dynamic buffering". Start by buffering entries with any duration until buffer is full. Once buffer fills, filter out the fastest entries and bump the threshold. Keep doing so until threshold reaches 104ms, after which buffer just stays full.

I think both of these would do a lot to decrease the chances that anyone notices (or is negatively impacted by) this issue, but I don't think they address the root problem.

I wonder if we could use one of these suggestions along with something that makes the API less prone to misuse.

nhelfman commented 1 year ago

Just a small input on this. My feeling is that the "dynamic buffering" proposal can be confusing for developers, even if spec changes have very concrete and detailed definition of the behavior.

yoavweiss commented 1 year ago

Alternatively, we currently buffer up to 150 entries before an observer is registered. If we lower the duration for these entries, we're likely to reach that limit sooner, but maybe that's enough for perf observers to be registered?

@mmocny - would you be able to add use counters for cases where perf observers with a buffered flag were registered after the event timing buffer was full? And then maybe experiment with lowering the threshold and seeing if that count increases for those users?

mmocny commented 1 year ago

Sure, will file an issue.


I will say: my assumption is that this will depend entirely on the hardware used, and the number of continuous events fired.

You can try this on any page you use:

function getTotalEventCounts() {
    return Array.from(performance.eventCounts.values()).reduce((a,b) => a+b);
}
setInterval(() => console.log(getTotalEventCounts()), 1000)