Problematic cases with cross-origin iframes & aborted navigations

noamr commented 1 year ago

In most cases a single RT entry corresponds to a single completed fetch (successful or non-aborted network error). However, in the case of iframes this is a bit murky, especially when the iframes are cross-origin.

Imagine the following scenario (as represented in this test):

1. embedder.com embeds crossorigin.com/iframe.html.
2. Before iframe.html's body fully loads, the iframe navigates away to other.com/other-iframe.html, either through user interaction or by, e.g., setting location.href in the head (there are many ways in which this can happen).
3. The iframe successfully loads other.com/other-iframe.html.

What should the resource timing entry correspond to?

Currently the resource timing entry (at least in Chrome & Safari) would be for crossorigin.com, representing whatever state the response was in when it was aborted. The entry would only be reported when the body of other.com is complete. According to the spec, perhaps it shouldn't be reported at all, since it's an aborted fetch.
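To see which entry an embedder gets today, one can filter the resource timing entries by `initiatorType`. The following is a minimal sketch, assuming a reduced entry shape (`ResourceEntryLike`) and made-up sample values; in a browser the input would come from `performance.getEntriesByType("resource")`.

```typescript
// Reduced, hypothetical shape of the fields used here; real entries are
// PerformanceResourceTiming objects with many more attributes.
interface ResourceEntryLike {
  name: string;          // URL of the fetched resource
  initiatorType: string; // "iframe" for fetches initiated by an iframe element
  startTime: number;
  duration: number;
}

// Returns only the entries that were initiated by iframe elements.
function iframeResourceEntries(entries: ResourceEntryLike[]): ResourceEntryLike[] {
  return entries.filter((entry) => entry.initiatorType === "iframe");
}

// Illustrative data only; the numbers are invented.
// In a browser: iframeResourceEntries(performance.getEntriesByType("resource") as PerformanceResourceTiming[])
const sampleEntries: ResourceEntryLike[] = [
  { name: "https://embedder.com/app.js", initiatorType: "script", startTime: 12, duration: 40 },
  { name: "https://crossorigin.com/iframe.html", initiatorType: "iframe", startTime: 20, duration: 310 },
];
const observedIframes = iframeResourceEntries(sampleEntries);
```

In the scenario above, the embedder would see a single entry named after crossorigin.com even though the frame ended up showing other.com.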

Both cases could lead to confusion and would be a cross-origin violation. embedder.com is not supposed to know that the iframe has aborted and navigated somewhere else, and either not reporting the entry at all or reporting only the first navigation would expose that.

One way to solve it is to align the timing with the timing of the iframe's load event, but that would be misleading - the time between navigating the iframe and the iframe's load event could span multiple fetches plus JavaScript execution in the middle. Also, if we do this only for cross-origin iframes, their timing values would diverge from what those values mean for same-origin iframes.

My proposal is to not report iframe navigations to resource timing at all, and to rely on navigation timing alone for this, letting the iframe explicitly send its information to the parent with postMessage if it so wishes. But I'm not sure about the impact on current users of these APIs, so I would love to hear thoughts.
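A rough sketch of what the opt-in postMessage route could look like. The message shape (`IframeTimingMessage`), its `kind` string, and both helper functions are hypothetical and not part of any spec; the embedder is assumed to keep its own list of origins it trusts.

```typescript
// Hypothetical message format an iframe could choose to send to its parent.
interface IframeTimingMessage {
  kind: "iframe-navigation-timing";
  url: string;
  startTime: number;
  responseEnd: number;
  loadEventEnd: number;
}

// Reduced shape of a PerformanceNavigationTiming entry (only the fields used).
interface NavigationEntryLike {
  name: string;
  startTime: number;
  responseEnd: number;
  loadEventEnd: number;
}

// Runs inside the iframe: picks the fields it is willing to share.
function buildTimingMessage(nav: NavigationEntryLike): IframeTimingMessage {
  return {
    kind: "iframe-navigation-timing",
    url: nav.name,
    startTime: nav.startTime,
    responseEnd: nav.responseEnd,
    loadEventEnd: nav.loadEventEnd,
  };
}

// Runs in the embedder: accepts the message only from an allowed origin and
// only if it has the expected shape; returns null otherwise.
function acceptTimingMessage(
  eventOrigin: string,
  data: unknown,
  allowedOrigins: string[],
): IframeTimingMessage | null {
  if (allowedOrigins.indexOf(eventOrigin) === -1) return null;
  if (typeof data !== "object" || data === null) return null;
  const { kind, url, startTime, responseEnd, loadEventEnd } = data as Record<string, unknown>;
  if (
    kind !== "iframe-navigation-timing" ||
    typeof url !== "string" ||
    typeof startTime !== "number" ||
    typeof responseEnd !== "number" ||
    typeof loadEventEnd !== "number"
  ) {
    return null;
  }
  return { kind: "iframe-navigation-timing", url, startTime, responseEnd, loadEventEnd };
}
```

In a browser, the iframe would call `window.parent.postMessage(buildTimingMessage(nav), "https://embedder.com")` with `nav` taken from `performance.getEntriesByType("navigation")[0]`, and the embedder would call `acceptTimingMessage(event.origin, event.data, ...)` from a `message` event listener.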

noamr commented 1 year ago

/cc @yoavweiss @nicjansma @clelland @domenic @annevk @sefeng211 @mikewest

clelland commented 1 year ago

My instincts here are that we should treat the initial fetch as much as possible like any other resource. I don't think we should wait for the iframe's load event to fire, or for a manual redirect to completely load, but just report the timing of the initial main document.

Assuming that crossorigin.com/iframe.html sets TAO such that embedder.com can read it (otherwise all of this is probably unnecessary), then I would expect that we could report the start and end times like other aborted requests ("Resources for which the fetch was initiated, but was later aborted (e.g. due to a network error) are included as PerformanceResourceTiming objects in the Performance Timeline, with their start and end timing.")
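For readers less familiar with TAO, here is a simplified sketch of the check being referred to. It assumes a single response with no redirects and ignores other details of the real algorithm in the Fetch spec; `passesTaoCheck` is a hypothetical helper name.

```typescript
// Simplified Timing-Allow-Origin check (assumption: one response, no redirects).
// taoHeader is the raw header value, or null when the header is absent.
function passesTaoCheck(
  taoHeader: string | null,
  embedderOrigin: string,
  resourceOrigin: string,
): boolean {
  // Same-origin resources are always allowed to expose detailed timing.
  if (embedderOrigin === resourceOrigin) return true;
  if (taoHeader === null) return false;
  // The header is a comma-separated list of origins, or "*".
  const values = taoHeader.split(",").map((value) => value.trim());
  return values.indexOf("*") !== -1 || values.indexOf(embedderOrigin) !== -1;
}
```

With this sketch, `passesTaoCheck("*", "https://embedder.com", "https://crossorigin.com")` passes, while a missing header on a cross-origin response fails.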

You're right that that can expose some user behaviour... are there any other cases where the platforms allow a subresource request to be stopped by the user, that would expose similar timing info?

noamr commented 1 year ago

> My instincts here are that we should treat the initial fetch as much as possible like any other resource. I don't think we should wait for the iframe's load event to fire, or for a manual redirect to completely load, but just report the timing of the initial main document.
>
> Assuming that crossorigin.com/iframe.html sets TAO such that embedder.com can read it (otherwise all of this is probably unnecessary), then I would expect that we could report the start and end times like other aborted requests ("Resources for which the fetch was initiated, but was later aborted (e.g. due to a network error) are included as PerformanceResourceTiming objects in the Performance Timeline, with their start and end timing.")

But in this case it would be an aborted/terminated fetch, which would not become a RT entry. Only non-abort errors are reported. See abort a Document: all tasks queued from that fetch are to be discarded.

And what about iframes without TAO?

Perhaps a solution would be to report only iframes with TAO (+ same-origin), and only the first navigation if it's completed, and to report nothing for iframes without TAO.

> You're right that that can expose some user behaviour... are there any other cases where the platforms allow a subresource request to be stopped by the user, that would expose similar timing info?

AFAIK only iframes allow autonomous user interaction in a cross-origin context.

domenic commented 1 year ago

I think if the user presses the stop button in the middle of a slow-loading cross-origin image load, then that image will get aborted? I'm not sure what event fires or what happens with resource timing in that case.

noamr commented 1 year ago

> I think if the user presses the stop button in the middle of a slow-loading cross-origin image load, then that image will get aborted? I'm not sure what event fires or what happens with resource timing in that case.

According to spec there shouldn't be an RT entry. But in any case, stopping the load is an action in the embedding document, unlike an iframe abort due to internal navigation before body complete.

sefeng211 commented 1 year ago

So... what if we just do what the current spec expects? That is, we don't expose the iframe request if it's aborted, and if it isn't aborted, we expose it subject to the TAO checks.

noamr commented 1 year ago

> So... what if we just do what the current spec expects? That is, we don't expose the iframe request if it's aborted, and if it isn't aborted, we expose it subject to the TAO checks.

Then we expose that the abort happened, which exposes something about how the user interacted with a cross-origin iframe or about its content.

noamr commented 1 year ago

A possible way forward that doesn't involve shutting down iframe reporting, a take on what @clelland had suggested:

- For same-origin or TAO-pass iframes, do what the spec currently says. TAO-pass iframes don't report early aborts, which exposes the fact that there was indeed an abort. Change implementations to match the spec.
- For cross-origin iframes with TAO-fail, report the time between navigating and the first load event. This would at least show that there was an iframe, but the timing would only match a fetch in the simple case, which is probably the most common.
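The two rules can be sketched as one decision function. This is only an illustration of the proposal as worded above, not spec text; `IframeNavigationFacts`, `ReportedTiming`, and `reportedIframeTiming` are hypothetical names, and all times are assumed to share the embedder's time origin.

```typescript
// Hypothetical summary of what the browser knows about one iframe navigation.
interface IframeNavigationFacts {
  sameOrigin: boolean;
  taoPass: boolean;
  aborted: boolean;           // the initial fetch was aborted before completing
  navigationStart: number;    // when the container started the navigation
  responseEnd: number | null; // end of the initial fetch, null if it never completed
  firstLoadEvent: number;     // time of the iframe's first load event
}

interface ReportedTiming {
  startTime: number;
  duration: number;
}

// Returns the timing the embedder would see, or null when nothing is reported.
function reportedIframeTiming(facts: IframeNavigationFacts): ReportedTiming | null {
  if (facts.sameOrigin || facts.taoPass) {
    // Rule 1: follow the spec. Aborted fetches produce no entry.
    if (facts.aborted || facts.responseEnd === null) return null;
    return {
      startTime: facts.navigationStart,
      duration: facts.responseEnd - facts.navigationStart,
    };
  }
  // Rule 2: cross-origin with TAO-fail. Report navigation start to first load
  // event, regardless of aborts or intermediate navigations inside the frame.
  return {
    startTime: facts.navigationStart,
    duration: facts.firstLoadEvent - facts.navigationStart,
  };
}
```

For a TAO-fail iframe the result is the same whether or not the frame navigated internally, which is the property that avoids leaking the abort.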

yoavweiss commented 1 year ago

This seems like a reasonable way forward, with minimal compat implications. I like it!

noamr commented 1 year ago

An additional idea I had while on leave:

- All subframe navigations are navigation timing entries, but perhaps with a different "type" - e.g. a PerformanceSubframeTiming : PerformanceNavigationTiming with type="subframe"; you'd have to observe/get them explicitly.
- The main difference between subframe and navigation entries would be the meaning of startTime (and thus duration). startTime for subframe entries would be the time the container initiated the navigation - which, unlike normal navigation timing, can be before redirectStart. This would capture the client-side redirect time as the gap between startTime and redirectStart, and align with how this would work for cross-origin TAO-fail iframes.
- Iframes/objects with TAO enabled/same origin would expose the whole PerformanceNavigationTiming set of values.
- Cross-origin iframes/objects without TAO would expose only startTime and duration, which would be equivalent to the iframe load event.
- The entries would only be queued upon full load.
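A sketch of how such an entry could be redacted. PerformanceSubframeTiming exists only as an idea in this thread, so every name below is hypothetical, the field set is cut down to a few attributes, and the `entryType` field stands in for the "type" mentioned above.

```typescript
// Everything the browser would know internally (reduced, hypothetical field set).
interface FullSubframeTiming {
  startTime: number;     // time the container initiated the navigation
  redirectStart: number;
  responseEnd: number;
  loadEventEnd: number;  // assumed to share the time origin of startTime
}

// What the embedder would be allowed to read.
interface ExposedSubframeTiming {
  entryType: "subframe";
  startTime: number;
  duration: number;       // always measured up to the frame's load event
  redirectStart?: number; // present only for same-origin or TAO-pass frames
  responseEnd?: number;   // present only for same-origin or TAO-pass frames
}

// fullAccess is true for same-origin or TAO-pass frames.
function exposeSubframeTiming(full: FullSubframeTiming, fullAccess: boolean): ExposedSubframeTiming {
  const base: ExposedSubframeTiming = {
    entryType: "subframe",
    startTime: full.startTime,
    duration: full.loadEventEnd - full.startTime,
  };
  if (!fullAccess) return base;
  return { ...base, redirectStart: full.redirectStart, responseEnd: full.responseEnd };
}

// Client-side redirect time: the gap between startTime and redirectStart,
// or null when redirectStart is hidden.
function clientSideRedirectTime(entry: ExposedSubframeTiming): number | null {
  return entry.redirectStart === undefined ? null : entry.redirectStart - entry.startTime;
}
```

Both variants carry the same startTime and duration, so the meaning of those two attributes does not depend on whether the frame is cross-origin.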

Pros:

- Would not silently change the meaning of duration for a PerformanceResourceTiming entry.
- Would be consistent across cross/same origin - same meanings, but some attributes hidden.
- Requires very little special-casing in implementation - goes via the navigation-timing code paths.

Cons:

- Existing code that expects iframes as resource timing entries would have to be modified.

WDYT? @yoavweiss @clelland @nicjansma

noamr commented 1 year ago

@clelland? Would love to see how this jibes with the new frame-reporting thing.

nicjansma commented 1 year ago

If I'm understanding the proposal(s) correctly, I think I'd prefer the Aug 10 one, which is to change the behavior of the RT entries, over the Oct 11 one, which would remove XO-TAO-fail IFRAMEs from RT in favor of introducing the new entry type.

If we continue to use RT entries, none of the existing RUM scripts have to adjust their RT gathering logic (e.g. crawling frames, calling getEntriesByType or a PO w/ buffered:true).
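For context, the frame-crawling part of such gathering logic looks roughly like the sketch below. It is not taken from any particular RUM library; `FrameLike` and `collectResourceEntries` are hypothetical names that mirror the parts of `window` being used.

```typescript
// Reduced shapes mirroring the parts of window / performance that are used.
interface RtEntryLike {
  name: string;
  initiatorType: string;
}

interface FrameLike {
  performance: { getEntriesByType(type: string): RtEntryLike[] };
  frames: ArrayLike<FrameLike>;
}

// Walks the frame tree and collects resource timing entries from every frame
// that can be accessed. Cross-origin frames throw on property access in
// browsers, so each frame is visited inside its own try/catch.
function collectResourceEntries(root: FrameLike): RtEntryLike[] {
  const collected: RtEntryLike[] = [];
  const visit = (frame: FrameLike): void => {
    try {
      frame.performance.getEntriesByType("resource").forEach((entry) => collected.push(entry));
      for (let i = 0; i < frame.frames.length; i++) {
        const child = frame.frames[i];
        if (child) visit(child);
      }
    } catch (_err) {
      // Inaccessible (cross-origin) frame: skip it and keep crawling.
    }
  };
  visit(root);
  return collected;
}
```

In a browser this would be called as `collectResourceEntries(window as unknown as FrameLike)`. The iframe's own RT entry lives in the parent's timeline, which is why a cross-origin iframe stays visible to this kind of script even though its inner entries are not.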

If we change to a new PerformanceSubframeTiming type, all RUM scripts would have to adjust, or their "visibility" into the IFRAME existing will break.

It seems reasonable to me to stop reporting RT entries for IFRAMEs that abort, and for X-O IFRAMEs to report the time from navigation to the first load event.

annevk commented 1 year ago

> My instincts here are that we should treat the initial fetch as much as possible like any other resource. I don't think we should wait for the iframe's load event to fire, or for a manual redirect to completely load, but just report the timing of the initial main document.

Isn't this a new timing channel?

noamr commented 1 year ago

> > My instincts here are that we should treat the initial fetch as much as possible like any other resource. I don't think we should wait for the iframe's load event to fire, or for a manual redirect to completely load, but just report the timing of the initial main document.
>
> Isn't this a new timing channel?

Yes, hence the proposals here to make navigation responseEnd TAO protected, and in the TAO-fail cases fall back to frame load time as the duration, which is already exposed.

noamr commented 1 year ago

I drafted PRs to the HTML and Fetch specs that implement this proposal:

https://github.com/whatwg/fetch/pull/1579
https://github.com/whatwg/html/pull/8643

bdekoz commented 1 year ago

I'd like to see this reviewed in committee again; I'm not quite sure there is consensus here.

yoavweiss commented 1 year ago

This was discussed at TPAC, and there was strong preference towards the option that @noamr is now pushing. Are there any particular reasons to re-open that decision?

w3c / resource-timing

Problematic cases with cross-origin iframes & aborted navigations #340