whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Exposing back/forward cache blocking reasons to sites #7094

Open rubberyuzu opened 3 years ago

rubberyuzu commented 3 years ago

Currently, developers can tell whether BFCache is being used in the wild, but they cannot tell what is blocking it or what actions would improve their hit rate.

We (Chromium) would like to make it possible for sites to collect information on why back/forward cache was not used on a history navigation.

One possibility would be to implement a reporting mechanism in Reporting API that sends the items that blocked back/forward cache.

Another possibility would be to make it available through a JavaScript API, e.g. when pageshow is not persisted but is a history navigation, it could contain information about why it was not persisted. Or it could be available from some other API. This would explicitly expose the fact that this was a history navigation.
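A minimal sketch of the signal the JavaScript option would build on, using only `pageshow`'s `event.persisted` and the navigation timing entry's `type`, both of which exist today. The helper name `classifyPageshow` and its return labels are made up for illustration; the proposal would attach the actual reasons to the middle case.

```javascript
// Hypothetical helper: classify a pageshow using two existing signals.
// `persisted` is PageTransitionEvent.persisted; `navigationType` is the
// `type` of the current document's PerformanceNavigationTiming entry.
function classifyPageshow(persisted, navigationType) {
  if (persisted) {
    return "restored-from-bfcache";
  }
  if (navigationType === "back_forward") {
    // A history navigation that had to load the document again: the case
    // where sites currently get no explanation.
    return "history-navigation-not-restored";
  }
  return "not-a-history-navigation";
}

// Browser wiring (skipped when there is no window object).
if (typeof window !== "undefined") {
  window.addEventListener("pageshow", (event) => {
    const [entry] = performance.getEntriesByType("navigation");
    console.log(classifyPageshow(event.persisted, entry ? entry.type : undefined));
  });
}
```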

For both of these we would probably want to standardise some of the reasons (where they are common and part of the spec) but also allow vendor-specific reasons for cases where blocking was not required by spec but happened anyway.

One side benefit we could potentially get from this is that we would be able to write web platform tests for why back/forward cache is blocked.

cc @clelland @annevk @smaug---- @mystor @cdumez @beidson @hober @altimin @xharaken @fergald @domenic

smaug---- commented 3 years ago

Exposing the information to JS sounds reasonable (for example as a part of pagehide event). I wonder what the API could look like, so that it could capture various reasons. And need to be careful to not expose cross-origin information.

fergald commented 3 years ago

> And need to be careful to not expose cross-origin information.

Yes, @clelland pointed out that we could leak the fact that some subframe is using an unload handler or GPS or whatever. You wouldn't know which frame, just that some frame has it, but I wonder:

  1. When would this be information that was not already known to the site's developers? E.g. if a Google Ads iframe uses an unload handler, you don't need a new API to find that out.
  2. What could someone do with this info? Of course, people find unanticipated ways to exploit all kinds of stuff.

A few other ideas

annevk commented 3 years ago

If a user navigates a child frame you might learn things about the user's navigation habits on other origins. I don't think we can expose anything that goes across the origin boundary.

fergald commented 3 years ago

Yeah, fair enough. So I think the most info we could possibly offer would be to have per-frame a struct with

and then make this struct available to the top-level frame for a history navigation that is not cached.

rubberyuzu commented 3 years ago

Created an explainer here.

rubberyuzu commented 3 years ago

Do you think this proposal clears the bar of not exposing cross-origin information? I wanted to make sure.

@annevk @altimin

annevk commented 3 years ago

@rubberyuzu I agree with the general sentiment of that document (thanks for writing it!), but what happens in these scenarios:

  1. A1 embeds B and that embeds A2 and C?
  2. A1 embeds B and the user navigates B to A2?
  3. A1 embeds B and the user navigates B to C?

rubberyuzu commented 3 years ago

That's a good point. I think the most conservative way would be to...

So, in the following scenario:

  1. A1 embeds B and that embeds A2 and C?

Report reasons of A1, and mask B's subtree (only report if B's subtree is blocking BFCache or not)

{
  URL: "a.com", /* A1 */
  Id: "x",
  blocked: false,
  reasons: [],
  children: [
    {src: "b.com", id: "y", blocked: false, reasons: [], children: []}, /* B and its subtree */
  ]
}

  2. A1 embeds B and the user navigates B to A2?

Report reasons of A1 and A2.

{
  URL: "a.com", /* A1 */
  Id: "x",
  blocked: false,
  reasons: [],
  children: [
    {URL: "a.com", Id: "x", blocked: false, reasons: []}, /* (B->)A2 */
  ]
}

  3. A1 embeds B and the user navigates B to C?

Report reasons of A1, and treat the subframe as cross-origin. For cross-origin iframes, we only report "src" instead of the current URL (report B instead of C as src URL).

{
  URL: "a.com", /* A1 */
  Id: "x",
  blocked: false,
  reasons: [],
  children: [
    {src: "b.com", id: "y", blocked: false, reasons: [], children: []}, /* (B->)C */
  ]
}

Please let me know what you think!
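The three scenarios above follow one masking rule, which can be sketched as below. This is an illustration only: the input shape (`origin`, `url`, `src`, `id`, `reasons`, `children`), the function names, and the lower-case output field names are assumptions, not something the explainer defines.

```javascript
// Hypothetical input: each frame is
// { origin, url, src, id, reasons: string[], children: frame[] }.

// True if the frame or anything beneath it contributed a reason.
function subtreeBlocks(frame) {
  return frame.reasons.length > 0 || frame.children.some(subtreeBlocks);
}

// Build the report tree as seen from the top-level origin.
function buildReport(frame, topOrigin) {
  if (frame.origin !== topOrigin) {
    // Cross-origin: expose only the iframe's src and id attributes plus a
    // single flag for the whole subtree. No current URL, no reasons, and no
    // children, even if a descendant is same-origin with the top-level page.
    return {
      src: frame.src,
      id: frame.id,
      blocked: subtreeBlocks(frame),
      reasons: [],
      children: [],
    };
  }
  // Same-origin: report the current URL and the frame's own reasons, then recurse.
  return {
    url: frame.url,
    id: frame.id,
    blocked: frame.reasons.length > 0,
    reasons: [...frame.reasons],
    children: frame.children.map((child) => buildReport(child, topOrigin)),
  };
}
```

Under this rule scenario 2 reports A2 in full because its current origin matches the top-level page, while scenarios 1 and 3 collapse the B subtree into one masked entry.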

annevk commented 3 years ago

I wonder if we should separate "src" and "location" into separate fields where "location" ends up blank for everything that is cross-origin. That might be more useful for developers and I think it would end up exposing the same amount of information.

Otherwise that looks reasonable to me.

Are we using "same origin" or "same origin-domain" by the way? I assume the former given the plan to remove document.domain at some point in the future?

fergald commented 3 years ago

The explainer needs a bit of an update. Instead of using a dictionary, it should present the result as a tree of JS objects. The fields would be the same for cross- and same-origin frames, but some would be null or empty for cross-origin ones. So src and url (or location might be a better name) should be separate fields, and url would only be populated for same-site.

My assumption was that if the parent frame can access the child frame and script it, then it should be allowed to know what blocked it. I'm not sure what is the best way to express that. If document.domain is being removed then not including it in this meaning would be fine if that's how new features are doing it.

annevk commented 3 years ago

That would be "same origin-domain", but in general we try not to use it for new features. Note that it's also stricter than that due to excluding the subtree of cross-origin documents (which are currently visible to some extent and especially if anything in that subtree is same origin). To be clear, I think that is the correct decision.

camillelamy commented 2 years ago

We looked at the proposal in Chrome Security, and we were wondering whether any kind of reporting for cross-origin iframes would itself be a cross-site leak, even if we do not send a reason. For example, consider an iframe which has an unload handler if the user is signed in, and doesn't have it if they are not. Just knowing that the iframe blocked the page from going into bfcache might be enough to know whether the user was signed-in in the cross-origin iframe or not.

To take a similar example, in the COOP reporting API, where iframe actions would be of interest to the top-level page, we have chosen not to report any information from the cross-origin iframe. So we are wondering whether the safe choice here is not to report anything at all for cross-origin iframes.

domenic commented 2 years ago

> Just knowing that the iframe blocked the page from going into bfcache might be enough to know whether the user was signed-in in the cross-origin iframe or not.

I think this information is already available via various channels, albeit in a noisy way. For example:

I'm not sure how this impacts the overall security analysis.

fergald commented 2 years ago

@camillelamy I think it's possible to extract exactly the same signal right now. Assume a.com is trying to find out if b.com blocks BFCache.

  1. create a.com/attack with no BFCache blockers and a subframe of b.com
  2. navigate forward
  3. navigate back to
  4. if BFCache was blocked then a.com/attack knows it was b.com (there are transient reasons that can also block that make this signal unreliable but they apply with and without the new API)

The only difference is that the new API makes it possible to extract the information while including more subframes from other origins and/or using BFCache-blocking features, but I'm not sure that is a material difference. An attacker could create a simple a.com/attack, quickly navigate away and back, collect the information, and then present a more complex page on a.com/attack with the information in hand.

camillelamy commented 2 years ago

I see. Yeah that makes sense. And to confirm, the only cross-origin URL the page is going to see is the one it initially asked to load in the iframe (ie pre server redirects and pre subsequent navigations)?

fergald commented 2 years ago

@camillelamy Thanks. It will see the value of the iframe's src and id attributes at the time the page goes into BFCache. It will have no access to whatever actual URL is in the iframe.

camillelamy commented 2 years ago

Ok that sounds good.

domenic commented 2 years ago

I tagged this for the upcoming triage meeting, but will not be attending. I will leave the agenda+ label because I was mostly going to say "we're starting to get serious about implementing this". It'd be great if other implementers took a look at the explainer at https://github.com/rubberyuzu/bfcache-not-retored-reason/blob/main/NotRestoredReason.md. In particular I am curious for people's thoughts on https://github.com/rubberyuzu/bfcache-not-retored-reason/issues/2.

smaug---- commented 2 years ago

The proposed API doesn't seem to work too well in cases where a page is evicted from bfcache because of its use of some API. For example, Firefox lets a page have an open BroadcastChannel and still be bfcached, but if the channel is then used, the page is evicted. Chrome seems to have similar cases, for example with service workers' claim() (at least based on the proposed tests in https://github.com/web-platform-tests/wpt/pull/31082#discussion_r863186013).

fergald commented 2 years ago

What's the problem with the API in that case? If the user returns to the page that eviction reason will be listed in the reasons. To be clear, the API does not tell you what is preventing the current page from being cached. It only tells you after a history navigation why the previous page was not cached.

smaug---- commented 2 years ago

How would reporting API work in that case? And using the word "blocked" in the dictionary is a bit confusing in that case, since the page wasn't blocked from bfcache; it was just evicted later.

fergald commented 2 years ago

> How would reporting API work in that case?

We haven't put a lot of work into the Reporting API case. I guess we have a choice there:

  1. only report when the user navigates in history and the page is not restored
  2. report as soon as we know the page will not be restored

While 2. gives more information, it's unclear that it's useful information as it could lead devs to focus on pages which are not cached but also not navigated back to.

Do you see a problem with 1.?
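Option 1 could look roughly like the following: reasons are remembered as they occur, but a report is only produced if the user later navigates back and the page is not restored. The class, its method names, and the report shape are all hypothetical; the Reporting API integration was not designed in this thread.

```javascript
// Hypothetical sketch of option 1: defer reporting until a history
// navigation actually fails to restore the page.
class DeferredNotRestoredReporter {
  constructor() {
    // pageKey -> list of reasons recorded while the page was blocked or evicted
    this.pendingReasons = new Map();
  }

  // Remember a blocking or eviction reason without reporting anything yet.
  noteReason(pageKey, reason) {
    const reasons = this.pendingReasons.get(pageKey) || [];
    reasons.push(reason);
    this.pendingReasons.set(pageKey, reasons);
  }

  // Called when the user navigates in history to `pageKey`.
  // Returns a report object only if the page was not restored.
  onHistoryNavigation(pageKey, restored) {
    const reasons = this.pendingReasons.get(pageKey) || [];
    this.pendingReasons.delete(pageKey);
    if (restored) {
      return null;
    }
    return { type: "not-restored", page: pageKey, reasons };
  }
}
```

Pages that record a reason but are never navigated back to produce no report, which is the property that separates option 1 from option 2.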

> And using word "blocked" in the dictionary in that case is a bit confusing, when the page wasn't blocked from bfcache, it was just later evicted.

Fair point. We are basically exposing Chrome's internal telemetry which covers blocking reasons and also evicting-later reasons, so I agree "blocked" is not the best choice. "not-restored-reasons" is a bit of a handful. Suggestions welcome and then we can update the doc.

smaug---- commented 2 years ago

It is unclear to me what the goal is with the reporting API use here. Should (a) the server know all the cases a page is blocked from entering bfcache or being evicted from bfcache because of use of some other API or (b) just tell that page couldn't be restored when user tried to get back to it?

In (1), since implementations evict pages from bfcache because of memory pressure or timeout or whatever internal heuristics, should such case be reported?

"blocked-or-evicted" as the term might work, assuming "block" and "evict" ends up to other specs too.

rubberyuzu commented 2 years ago

I noticed something that could be a privacy risk on this API.

I wrote it up in detail here. Basically, the concern is that we could expose 1) that extensions are installed and active on the page and 2) possibly which extensions.

One solution is that we hide all this information, masking all extension related reasons as "internal error".
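A sketch of that masking step, assuming the browser can tell which of its internal reasons are extension-related. The function name and every reason string passed to it are made up; only the "internal error" label comes from the comment above. Collapsing duplicates is an extra choice made here so that the number of extension reasons is not exposed either.

```javascript
// Hypothetical: replace every extension-related reason with one generic label.
function maskExtensionReasons(reasons, extensionReasons) {
  const masked = reasons.map((reason) =>
    extensionReasons.has(reason) ? "internal error" : reason
  );
  // A Set keeps first-seen order, so this only removes repeated labels.
  return [...new Set(masked)];
}
```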

rubberyuzu commented 2 years ago

> It is unclear to me what the goal is with the reporting API use here. Should (a) the server know all the cases a page is blocked from entering bfcache or being evicted from bfcache because of use of some other API or (b) just tell that page couldn't be restored when user tried to get back to it?
>
> In (1), since implementations evict pages from bfcache because of memory pressure or timeout or whatever internal heuristics, should such case be reported?
>
> "blocked-or-evicted" as the term might work, assuming "block" and "evict" ends up to other specs too.

Sorry I missed this! It's (b) with reasons why a page was not restored from BFCache.

As for the terminology: we currently have a "blocked" boolean, meaning whether that frame is to blame for the page not being restored. Maybe it is not accurate, because the frame might have caused eviction rather than blocking the page from entering BFCache. It's hard to rename this boolean (ideas are welcome). Instead we could remove this field and always populate reasons to indicate that the frame caused blocking/eviction. If we need to mask the reasons because the frame is cross-origin, we can add "unspecified" as a reason. WDYT?

rubberyuzu commented 2 years ago

Update on the API: we spotted a potential privacy leak in the API here: https://github.com/rubberyuzu/bfcache-not-retored-reason/blob/main/NotRestoredReason.md#single-cross-origin-iframe-vs-many-cross-origin-iframes

We decided to fix this by randomly selecting one cross-origin iframe to report.

smaug---- commented 1 year ago

We discussed the proposal some more (at Mozilla) and we're not happy to expose any browser-internal reasons. Only reasons that the page author can somehow affect (like use of some API) should be exposed. And https://github.com/rubberyuzu/bfcache-not-retored-reason/issues/2 looks reasonable.

Also, the explainer has "But as per WICG discussion, Performance Navigation Timing API was more preferred, and we are not going to implement this as Pageshow API." Could you expand on that reasoning a bit? Which WICG? And perhaps a link to the meeting notes?

rubberyuzu commented 1 year ago

Thanks for the comment.

There were two discussions where we talked about which API we should extend to include NotRestoredReasons: TPAC minutes and WebPerf minutes.

We decided on the NavigationTiming API instead of Pageshow because the NavigationTiming API already provides information about the navigation, such as the navigation type being "back-forward", and adding NotRestoredReasons seemed to be a natural extension of that. I'd be happy to hear your opinion about this though.

zcorpan commented 1 year ago

From https://github.com/mozilla/standards-positions/issues/766

rubberyuzu commented 1 year ago

I would like to ask your opinions about the naming of this API.

Currently the explainer calls the API "NotRestoredReasons" and PerformanceNavigationTiming's field "notRestoredReasons". Domenic pointed out that the spec already defines document reactivation and the Web Platform API refers to it as "pageshow with persisted = true". Thus introducing the term "restored" could lead to confusion.

Should we call this NotReactivatedReasons? Or do you have some other suggestions?

@domenic @fergald @rakina @smaug---- @annevk

rakina commented 1 year ago

I think since we refer to the "fully active" concept a lot in BFCache-related spec and documentation (e.g. https://w3ctag.github.io/bfcache-guide/), "Reactivated" seems like a more consistent choice.

domenic commented 1 year ago

I'm mostly concerned with web developer API consistency; we don't necessarily need that to be consistent with the spec language. (Although it's always nice when they can match.)

From that perspective I'm most worried about persisted. Should it be notPersistedReasons? It kind of makes sense to me: you get a pageshow event with event.persisted === false, and you check the notPersistedReasons navigation timing entry to find out why.
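The flow described here could be sketched as follows. The field name is a parameter because the thread has not settled between `notRestoredReasons` (the explainer's current name) and `notPersistedReasons`; the helper name is made up, and nothing is assumed about the shape of the value beyond it being a property of the navigation timing entry.

```javascript
// Hypothetical helper: after a pageshow, look up why the page was not
// restored. Returns null when there is nothing to explain or the field
// is not available.
function reasonsForPageshow(persisted, navigationEntry, fieldName = "notRestoredReasons") {
  if (persisted) {
    return null; // the page came out of bfcache
  }
  if (!navigationEntry || navigationEntry.type !== "back_forward") {
    return null; // not a history navigation
  }
  const value = navigationEntry[fieldName];
  return value === undefined ? null : value;
}

// Browser wiring (skipped when there is no window object).
if (typeof window !== "undefined") {
  window.addEventListener("pageshow", (event) => {
    const [entry] = performance.getEntriesByType("navigation");
    const reasons = reasonsForPageshow(event.persisted, entry);
    if (reasons !== null) {
      console.log(reasons);
    }
  });
}
```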