[LoAF] Allow for annotations for long animation frame to further help debugging

jarmit commented 1 year ago

It can be difficult to track down the source of a particular function if that function is a common function that is used multiple times within the code base. I would like the ability to annotate a function and long animation frame would use that name instead of something generic like ResizeObserverCallback. In the example below, I would like the name to be either SourceOfCode or ResizeObserverCallback(SourceOfCode) or something like this.

const func = ([entry]: ResizeObserverEntry[]) => {
    // do work
};
func.loafName = 'SourceOfCode'
const resizeObserver = new ResizeObserver(func);
resizeObserver.observe(target.current);

I do not care about the mechanism or the syntax to achieve this.

noamr commented 1 year ago

Thanks for posting the issue! Note that scripts have a sourceLocation property, which includes the function name. Their syntax is function@scriptSourceURL:characterIndex.

So the way to do what you specifically asked for is:

new ResizeObserverEntry(function SourceOfCode([entry]: ResizeObserverEntry[]) {
  // ...
}).obeserve(target.current);

And this would appear in the entry's sourceLocation as SourceOfCode@myScriptURL.com:123 or some such.

jarmit commented 1 year ago

This makes sense. I can make this work. For my specific case I think I would want something like

const cb = ([entry]: ResizeObserverEntry[]) => {
  // ...
}
Object.defineProperty(cb, 'name', { value: 'SourceOfCode' });
new ResizeObserver(cb).obeserve(target.current);

noamr commented 1 year ago

Reopening, there are use cases where this can really be useful. CC @mmocny

mmocny commented 1 year ago

Reading the original request, I see reference to:

function is a common function that is used multiple times within the code base

But later I only see reference to just making sure a nice name value is exposed. Is it actually important to somehow differentiate the "context" from which a particular callback is invoked? I'm not sure how often that would come up...

Second, if you really want to control labelling, and do this in a way where the developer is explicitly provide a hint-- I wonder if just regular User Timings aren't both sufficient and more flexible?

Just performance.measure() the time spent in specific sections of code (function boundary or whatever you want) and then use LoAF script attribution timestamps to intersect time ranges of your user timings.

That way, you can measure all the time ranges for that code, but also filter to cases where it contributed to an overall long-running script within a LoAF.

That said, I think it may be valuable if LoAF could do more attribution splitting automatically, on finer boundaries beyond just "tasks". For example, in cases where multiple continuations are registered on a shared promise. Or where a script calls back into previously registered handlers, but in ways where there isn't a 'hop' though some web platform api (like setTimeout) etc.

One (far fetched) idea that comes to mind is perhaps any time an AsyncContext is explicitly switched, if that api ever does take off?

Of course, a last option is to just rely on yield points, and use that as additional motivation for developers to just yield!

noamr commented 1 year ago

Reading the original request, I see reference to:

function is a common function that is used multiple times within the code base

But later I only see reference to just making sure a nice name value is exposed. Is it actually important to somehow differentiate the "context" from which a particular callback is invoked? I'm not sure how often that would come up...

Second, if you really want to control labelling, and do this in a way where the developer is explicitly provide a hint-- I wonder if just regular User Timings aren't both sufficient and more flexible?

It is, but it can also add a lot of noise. The difference here is that this is "measure but discard if it's not a LoAF".

That said, I think it may be valuable if LoAF could do more attribution splitting automatically, on finer boundaries beyond just "tasks". For example, in cases where multiple continuations are registered on a shared promise. Or where a script calls back into previously registered handlers, but in ways where there isn't a 'hop' though some web platform api (like setTimeout) etc.

One (far fetched) idea that comes to mind is perhaps any time an AsyncContext is explicitly switched, if that api ever does take off?

Of course, a last option is to just rely on yield points, and use that as additional motivation for developers to just yield!

Those are interesting ideas, I still don't see anything actionable there but let's continue the conversation.

gilbertococchi commented 1 year ago

+1 to consider improving this use case.

There are several RUM providers wrapping Callback/functions that would be flagged by LoAF as Long LoAFs triggers hiding the real Script sourceLocation information...

patrickhousley commented 7 months ago

New Relic RUM needs some way of informing developers that hook into the LoAF API directly that our wrapper is not the cause of their long animation frames. I was going to look at maybe just overriding the name on the function given the wrapped function has a name property but this is probably going to cause more issues, especially around stack trace cleansing, even if it does work.

Ultimately, we need a way to direct the developer back to the offending code and I don't think having just a name is sufficient. Even if we could update the name, if the line numbers and file name still point to the RUM provider, developers will still be confused.

noamr commented 3 months ago

Coming back to this, I think a potential way to address this is an additional attribute in {mark|measure}Options that helps filter user timing entries, with one of the options being to include those entries in the buffer only if they contribute to a LoAF (or a LoAF long-script).

So a library like NewRelic as mentioned in https://github.com/w3c/long-animation-frames/issues/3#issuecomment-2025930993 could wrap the user's function with performance.measure("user-function", {filter: "long-animation-frame"}) (strawman) and then it would be a much better insight than just the entry point.

mmocny commented 3 months ago

with one of the options being to include those entries in the buffer only if they contribute to a LoAF (or a LoAF long-script).

It seems to me like you would have to buffer these at first, then clean up these marks and measures afterwards.

Today, there is no limit to the number of marks/measures in the buffer, so overflow is not really a problem (unless so many are issued that we get into overall memory usage worries).

I think that means that: just adding marks/measures, leaving them there, then filtering manually, might be sufficient?

What about just a getEntriesBy* variant that accepts both a type and also a time range (and optionally that this time range can be specified by passing another entry)?

The usage would be something like:

Create a PerformanceObserver for LoAF
From each LoAF entry observed, ask for the marks/measures that overlap its timing

mmocny commented 3 months ago

As per your original proposal, Related questions:

It seems not so common to me to have active PerformanceObservers for marks/measures, but if someone did have one, would we want to report these LoAF-only entries?
If we did, would we delay adding these entries into the timeline until after we know if a LoAF happened (i.e. a separate buffer)?
If we did that, would we wait only until we know a LoAF is being measured, or wait for the LoAF is done measuring (i.e. LoAF always reported first)?

mmocny commented 3 months ago

Also also, there have been requests to add groups/categories/labels to User Timings multiple times for multiple purposes. I wonder if this really needs to be LoAF specific or if it could just be a generic feature like that? Then library authors could use as they see fit and might be more creative that we are here?

I guess the question I would have is: is it likely that a library like NewRelic would want to consume marks/measures that were created by a different app/library or just their own? Would NewRelic want to share annotations that would be useful to consume by others? (Such that a common standard us useful?)

rviscomi commented 3 months ago

cc @and-oli (maybe some overlap with DevTools extensibility)

noamr commented 3 months ago

with one of the options being to include those entries in the buffer only if they contribute to a LoAF (or a LoAF long-script).

It seems to me like you would have to buffer these at first, then clean up these marks and measures afterwards.

Right

Today, there is no limit to the number of marks/measures in the buffer, so overflow is not really a problem (unless so many are issued that we get into overall memory usage worries).

I think that means that: just adding marks/measures, leaving them there, then filtering manually, might be sufficient?

It's suboptimal in terms of both efficiency and ergonomics. By doing this we're encouraging people to potentially add a massive amount of events to the buffer and these memory (and lookup efficiency) problems can potentially explode. It also adds a lot of noise to the performance timeline and to traces.

What about just a getEntriesBy* variant that accepts both a type and also a time range (and optionally that this time range can be specified by passing another entry)?

The usage would be something like:

Create a PerformanceObserver for LoAF

From each LoAF entry observed, ask for the marks/measures that overlap its timing

See above

As per your original proposal, Related questions:

It seems not so common to me to have active PerformanceObservers for marks/measures, but if someone did have one, would we want to report these LoAF-only entries?

I think not, perhaps attach them to the loaf.scripts entry?

If we did, would we delay adding these entries into the timeline until after we know if a LoAF happened (i.e. a separate buffer)?

Yea definitely.

If we did that, would we wait only until we know a LoAF is being measured, or wait for the LoAF is done measuring (i.e. LoAF always reported first)?

Perhaps in the same callback, as in if you observe both types you'd get them concurrently? Otherwise in PerformanceScriptEntry which is perhaps more suitable.

Also also, there have been requests to add groups/categories/labels to User Timings multiple times for multiple purposes. I wonder if this really needs to be LoAF specific or if it could just be a generic feature like that? Then library authors could use as they see fit and might be more creative that we are here?

I don't think this is related. One is arbitrary metadata and the other is a capturing condition.

I guess the question I would have is: is it likely that a library like NewRelic would want to consume marks/measures that were created by a different app/library or just their own? Would NewRelic want to share annotations that would be useful to consume by others? (Such that a common standard us useful?)

I feel that those are parallel question. NewRelic can use mark/measure names or new metadata we define to filter out what they care about, but that won't help with exploding the buffer with LoAF-attribution-tracing.

An additional thing I would propose for this type of mark/measure is that you can't mark/measure it before/after the fact, as in the timestamp has to be now or at least within the current script entry.

mmocny commented 3 months ago

All of that makes sense in isolation, it just feels needlessly coupled to LoAF data.

What if I wanted to review these marks when they overlap with an Event Timing (but didn't trigger LoAF specifically because delays are from after-main-presentation)

noamr commented 3 months ago

All of that makes sense in isolation, it just feels needlessly coupled to LoAF data.

What if I wanted to review these marks when they overlap with an Event Timing (but didn't trigger LoAF specifically because delays are from after-main-presentation)

I agree, I think the API should be attachable to any platform entry type where it makes sense, not coupled with LoAF specificially, or at least written in a way that's extendable to do that.

mmocny commented 3 months ago

Can we evolve your proposal slightly to something like this, then?

performance.mark / measure merely supports buffered:false, or something like it, rather than explicitly listing the type of entry to pair with.
- This has also been independently requested.
- it seems some of the performance issues with User Timings is around the need to clone various data (like details) in order to persist on the timeline. But several use cases don't need that feature and only conditionally observe.
PerformanceObserver supports { observeUserTimings: true } or even { type: 'entry-type', observeNestedType: 'other-entry-type' }, or something like it.
- This becomes like a temporary nested observer which is only observes the nested type when the main entry type is actively being observed.
- Perhaps instead of list.getEntries() you would need to list.getNestedEntries()

(Wonder if this would be useful to e.g. measure all Event Timings and then only nested LoAF entries, and then only nested user timings of those...)

noamr commented 3 months ago

Can we evolve your proposal slightly to something like this, then?

performance.mark / measure merely supports buffered:false, or something like it, rather than explicitly listing the type of entry to pair with.

This has also been independently requested.

it seems some of the performance issues with User Timings is around the need to clone various data (like details) in order to persist on the timeline. But several use cases don't need that feature and only conditionally observe.

PerformanceObserver supports { observeUserTimings: true } or even { type: 'entry-type', observeNestedType: 'other-entry-type' }, or something like it.

This becomes like a temporary nested observer which is only observes the nested type when the main entry type is actively being observed.

Perhaps instead of list.getEntries() you would need to list.getNestedEntries()

(Wonder if this would be useful to e.g. measure all Event Timings and then only nested LoAF entries, and then only nested user timings of those...)

Riffing on that, perhaps if we add { buffered: false }, we can do the trick where if you have an observer that's registered to both LoAF (or event-timing) and user-timing, the user timing entries would be buffered to the LoAF, and the LoAF would be buffered to the event-timing entry if applicable, and the entries object in the callback would have some utility functions to correlate, and call it a day?

// in your function
performance.mark(functionName, { buffered: false });

// the observer
const observer = new Performance Observer(entries => {
  const [loaf] = entries.getEntriesByType("long-animation-frame");
  const [script] = loaf.scripts;
  const marks = entries.getOverlappingEntries(script, {type: "mark" });
});
observer.observe({type: "long-animation-frame"});
observer.observe({type: "mark" });

mmocny commented 3 months ago

I like that!

Regarding { buffered: false } I think that might have been a bad suggestion in hindsight.

You would still pay the performance cost of creating the entry for this use case, even if lifetime is short, and even if it's a convenient way to manage cleanup
By not buffering at all, it motivates developers to load PO's eagerly, which we don't want to motivate
A normal PO for just "mark" types I guess would see all the marks in real time. I know I said PO's for marks aren't as common, but that might still be undesired here.

I think we need to go back to the idea of this mark type being detached from the typical perf timeline, yet still allow limited buffering (Perhaps more constrained than typical user timings).

noamr commented 3 months ago

Reciting internal conversation with @mmocny: I feel that turning this into a generic user-timing function has enough limitations to make it suboptimal:

It's hard to do efficiently without clobbering existing receivers of user-timing or creating new GCed objects
It loses information about the function we want to trace, such as its source location.

So instead, I propose to go back to the original more narrow-scoped proposal, to annotate functions for use in LoAF:

function addEventListenerWithWrapper(event_type, internal_function) {
   const wrapped_internal_function = performance.bind(internal_function);
   addEventListener(event_type, wrapped_internal_function);
}

Where performance.bind will have the exact signature of Function.prototype.bind but would wrap the internal function in measuring monotonic timestamps (performance.now()), and if the result is >5ms this function would appear in LoAF as a long script.

mmocny commented 3 months ago

Will there be a mechanism to provide a user-defined name, or will it use function.name?

noamr commented 3 months ago

Will there be a mechanism to provide a user-defined name, or will it use function.name?

I think we can do that but then we can't use additional arguments for bind. Perhaps that's ok

noamr commented 3 months ago

Summarizing WG discussion:

There is a big ask for this, some people avoid using LoAF because of this issue.
The issue is split into two: (1) attribution: with wrapper functions it seems like "it's the wrapper's fault that this script is long" (2) diagnostics: a script entry point is not always granular enough.
In addition, people were complaining about lack of support for source-maps in LoAF and that that constraints the usefulness of source location.

For (1), we need something reliable like function wrapping, as custom marks/measures don't give source location. For (2), something like a low-overhead mark/measure that ties to particular entries is perhaps preferable as not everything is a "long function".

noamr commented 3 months ago

A shape that came to mind:

// entryTypes can be LoAF, event, measure, script
const tracing = new PerformanceTracing({threshold, entryTypes, detail})

// This adds a mark/measure that gets applied to overlapping entries
// from the given list
tracing.mark(label);
tracing.measure(label);

// This creates a bound function that reports the labeled trace,
// and also appears as a `PerformanceScriptTiming` entry in LoAF
tracing.bind(wrapped_function, label)

Performance{User|LongAnimationFrame|Script|Event}Timing.traces
    PerformanceTraceEntry (start, duration, name, detail)

The advantage of creating a PerformanceTracing object or some such in advance is that we don't incur the overhead of creating options dictionaries every time we trace.

w3c / long-animation-frames

[LoAF] Allow for annotations for long animation frame to further help debugging #3