web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/
Other
4.87k stars 3.04k forks source link

Policy for "Back/Forward Cache" (aka "bfcache" or "Page Cache") #16359

Open jugglinmike opened 5 years ago

jugglinmike commented 5 years ago

Firefox and Safari implement a cache which drastically alters the behavior of history.back and history.forward. Chrome is experimenting with a similar feature.

The optimized behavior is at odds with at least one test in WPT: websockets/unload-a-document/002.html (latest results from wpt.fyi). It's difficult to say how many more tests are affected by this. There appears to be just 80 references to history.back and history.forward, and that may be a fair upper bound.

Judging only from the issues where "bfcache" has been mentioned, it seems as though we're willing to accept tests concerning this behavior as long as they do not fail in non-supporting browsers. I can't find any discussion about the policy itself, hence this issue.

It doesn't seem like we can support this optionally since contributors need to know what will happen if they invoke history.back in a test.

If we assume it is present, we'll need to update the relevant tests and instruct authors to circumvent the behavior when it's unwanted. If we assume it is absent, we should include instructions/automation on disabling it whenever possible. In either case, we should make mention of this in the documentation on the infrastructure's assumptions.

jugglinmike commented 5 years ago

I'd prefer to assume absence because the feature isn't standardized, but I don't know if/how it can be disabled in the browsers we support.

gsnedders commented 5 years ago

In principle, the bfcache is just an optimisation that's only sorta observable. I think @annevk was planning on updating how HTML sorta acknowledges the existence of bfcaches in plenty of browsers to actually be more accurate (because with Blink implementing a bfcache we'll then have every major browser having one).

We'll want to run some tests with the bfcache enabled (though it's a bit awkward, like testing any caching behaviour, given eviction is implementation defined). Are there any we want to run with it disabled? Testing behaviour that isn't actually in the shipped configuration isn't that useful.

annevk commented 5 years ago

It is standardized, no? It's the pagehide / pageshow events, the salvageable state on documents, session history entries being able to point directly to Document objects, the fully active concept, etc. It might not be great, but it's better than a lot of things.

jugglinmike commented 5 years ago

In principle, the bfcache is just an optimisation that's only sorta observable.

It's hard to discuss observability in relative terms. More concretely: the feature directly undermines the expectations of the test I referenced above. That test sets window.location to trigger navigation to another document. The new document invokes history.back() and expects the following:

These may not happen if the UA is using a bfcache. Safari 12.0 fails for this reason.

My first suggestion is that we document an assumption: "history.back() and history.forward() do not trigger a navigation" and update tests that assume otherwise to use alternate methods (e.g. setting window.location).

My second suggestion is that we document as assumption: "The browser does not implement a so-called 'bfcache'" and take steps to ensure it is disabled in each browser.

Although I initially favored the second suggestion, @annevk's input that there is spec prose governing the behavior makes the first suggestion seem more appropriate.

annevk commented 5 years ago

I don't understand what documenting an assumption means. history.back() may trigger a navigation if there's no Document object on the prior session history entry, as per https://html.spec.whatwg.org/multipage/browsing-the-web.html#traverse-the-history.

jugglinmike commented 5 years ago

@annevk I'm referring to the "Assumptions" document in the project's guide for writing tests. For the purposes of consistency, I think we have to instruct test authors to assume that navigation will not occur because the presence of the Document in the session history entry (and thus the possibility of a navigation) is at the discretion of the UA https://html.spec.whatwg.org/multipage/history.html#the-session-history-of-browsing-contexts

annevk commented 5 years ago

That document reads more to me like "expected configuration", not "assumptions". Whether back() navigates or swaps with an existing Document object is quite different from that. It's much more of a cache thing which we shouldn't expect anything of in general, but might be okay within a set of delineated tests.

jugglinmike commented 5 years ago

@annevk Essentially, I want to update the referenced test to set window.location instead of invoking history.back. That alone will make the test behave consistently in Safari, but future tests could just as easily make the same mistake.

The reason I've raised this in terms of WPT policy is that the flaky test seems like a symptom of a larger problem: the bfcache behavior can subvert the expectations of test authors. I agree that authors should not assume a cache is in use. They also shouldn't assume a cache isn't in use. The recommendation I want to make is "don't use history.back or history.forward for navigation because we can make no guarantees about how the browser will respond."

Maybe the "Assumptions" document isn't the best place to put this, but I still feel that this needs to be said somewhere. Simply fixing the test seems inconsistent--we'd be acknowledging the use of those APIs as a bad practice without telling other folks to avoid it.

annevk commented 5 years ago

So looking at the test I think it's actually testing bfcache behavior (it's affected by WebSocket), so I'd be opposed to changing it. This might show a bug in Safari.

jugglinmike commented 5 years ago

Can you elaborate on what text you feel the test is verifying?

annevk commented 5 years ago

Well, that WebSocket causes unsalvageable to be set and that therefore going back cannot ever go to a cached ddocument.

jugglinmike commented 5 years ago

Ahah, the unloading document cleanup steps:

  1. For each WebSocket object webSocket whose relevant global object is window, make disappear webSocket.

    If this affected any WebSocket objects, then set document's salvageable state to false.

However, in that test, the document is being unloaded from the WebSocket's close event handler, which is nested in the open handler. In other words, the WebSocket connection has been established and the closing handshake has been started, so "make disappear a WebSocket" does nothing.

Since the salvageable state is not set to false, the presence of the Document in the session history remains at the discretion of the UA. If that's right, then Safari's flaky behavior is permissible.

annevk commented 5 years ago

Interesting, though that would mean it's showing a Firefox bug as for some reason Firefox does seem to set salvageable to false, right?

jugglinmike commented 5 years ago

Well, Firefox does seem to discard the document, but I don't know if that's because it has incorrectly set salvageable to false. Isn't it also possible that it has removed the document from the session history (or informally, "cleared the bf cache")? Would it be violating any spec text for doing that?

annevk commented 5 years ago

Perhaps, but it's suspect that it always does so and never fails the test. That makes it much more deterministic than it ought to be, no?

jugglinmike commented 5 years ago

Definitely hinky. I've reported it on Mozilla's bug tracker, but the validity of this test is still questionable. As long as the BF cache might be active, I don't think we can assert any specific behavior.

annevk commented 5 years ago

Well, we want to test salvageable given it's going to be implemented everywhere and given that I suspect we actually want to assume that it works within WPT. And I wouldn't really want to rewrite existing tests that might be testing it so they're no longer testing it (and might not be testing anything relevant anymore).

jugglinmike commented 5 years ago

Although I'd be reluctant to maintain tests that are expected to be flaky in conforming runtimes, that seems like a theoretical risk at this point. Safari behaves inconsistently on this test, but the reason is out of scope of this test and possibly of the bfcache. The problem is that sessionStorage occasionally resolves to null following history.back(), even when there is no WebSocket in play (other global APIs are also affected).

I've created a generic test case for this and submitted via gh-16478. Would you mind taking a look?

zcorpan commented 5 years ago

These may not happen if the UA is using a bfcache.

Then the implementation is not conforming to the spec. The spec requires to set the salvageable state to false on the document on navigation if it has any open WebSockets.

https://html.spec.whatwg.org/multipage/browsing-the-web.html#unloading-documents:make-disappear

This is exactly what the test is intending to test.

(Edit: I see now that this was already pointed out earlier.)

zcorpan commented 5 years ago

No, wait,

I'm not sure how to make it not flaky, short of running the same test several times and asserting that it got the "bfcache" behavior at least once.

jugglinmike commented 5 years ago

See also https://github.com/whatwg/html/issues/1931

fergald commented 3 years ago

In chromium we would like to start writing more WPT tests for bfcache. We had to fix a few of our own internal web tests. It looks like none that needed fixing were WPT. We also expect some of these tests to pass/fail depending on whether bfcache is enabled or not. I don't see a way to reliably test without a way to specify that a test needs

I can still imagine a situation where because of implementation differences, "aggressively use bfcache" could still cause problems since it has no definition but would still be better than the current state of not being able to test at all.

The alternative of writing tests to detect whether bfcaching has happened and pass in either case is bad. While it makes everything green, you have no guarantee that you tested anything.

Are there any other alternatives?

jgraham commented 3 years ago

So we could add some mechaism for wpt to opt in to different bfcache behaviours. The most obvious way would be to provide a testdriver API to change the bfcache behaviour for a particular browsing context (or, less ideally, for the whole browser). That depends on this being something we can opt into at runtime. An alternative would be something like a meta element that could map down onto a pref set at browser start time. If there's something that sounds good to implementors here I can help write an RFC and get it implemented in wpt.

smaug---- commented 3 years ago

Explicitly opt-in to always enable bfcache wouldn't work in cases where one wants to test whether use of some feature disables bfcache. (implementation might not just work correctly if certain feature is used while the page is in bfcache). And those features unfortunately vary between browsers, and I expect also between browser versions. (Some new API implementation might initially require that bfcache gets disabled, but then later that particular implementation is improved and can be used with bfcache.)

But maybe we could start with something simple. A way to reliably disable bfcache - that would be the first step. And all the implementations supporting bfcache should have some common cases when bfcache is enabled (don't use *unload event listeners, don't use no-store, use noopener, be the only top level browsing context in the browsing context group)

fergald commented 3 years ago

@smaug----

Explicitly opt-in to always enable bfcache wouldn't work in cases where one wants to test whether use of some feature disables bfcache. (implementation might not just work correctly if certain feature is used while the page is in bfcache). And those features unfortunately vary between browsers, and I expect also between browser versions.

I'm imagining e.g. a test that ensures that e.g. we don't expose location sensors while in BFCache. A browser could just disable BFCache if location sensors are used and so the test would never be able to trigger bfcaching. It seems like PRECONDITION_FAILED would be the right outcome for that test.

Being able to reliably disable would definitely be a start. https://github.com/whatwg/html/issues/5744 already exists but maybe using a header is not suitable and a test-only JS API would be better.

jugglinmike commented 3 years ago

I understand how an opt-out could be useful for web application developers. They can't make any assumptions about the contents of their users' history, so the history API is the only way they can reliably revisit previously-viewed documents.

Is that relevant in WPT, though? WPT test authors have total knowledge of the navigation history, so they can always trigger the desired navigation with another mechanism (e.g. the location API). If the bfcache would interfere with their test, can't they circumvent it by choosing some other mechanism?

If so, it seems like we could address the problem in WPT with documentation only. For example:

Do not use the history API unless explicitly testing that feature. The user-agent's behavior for the history API is intentionally underspecified in ways that may interfere with other conformance tests.

fergald commented 3 years ago

There are tests that use the history API that will pass or fail depending on whether BFCache is enabled. E.g.

https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/web_tests/http/tests/history/cross-origin-redirect-on-back.html

(and more here

I don't know if this is not tested in WPT for a good reason or if it's just something that we should migrate from Chromium to WPT but haven't done yet.

These cases are rare but they do exist, they mostly seem to involve the interaction between redirects/posts and the history API.

Is there another way around the problem that I'm not seeing?

gsnedders commented 3 years ago

If I'm not being silly, any test of the history API involving navigation could potentially be affected by the b/f cache, right?

It feels like there's two separate issues being discussed here:

FWIW, WebKit's tests are run with the b/f cache disabled, except for those which opt-in to it, and this at least currently includes WPT, which at least addresses the first.

As for testing b/f cache itself, there's some concern about over-testing and about ending up encoding Chrome behaviour in WPT. At least in principle, it should be conforming to always to do a full-load in a case where another browser has a b/f cache hit, even though that is in some ways an observable behavioural difference.

fergald commented 3 years ago

As for testing b/f cache itself, there's some concern about over-testing and about ending up encoding Chrome behaviour in WPT.

I don't see why that's a danger from Chrome vs any other browser (unless it's just because we are going to write the tests) or for BFCache in particular. We should only make it a WPT if it's specced behaviour rather than just Chrome behaviour. If we land tests that fail on other platforms, that's signal of some kind and it's easy to take it back if the test is wrong.

gsnedders commented 3 years ago

I don't see why that's a danger from Chrome vs any other browser (unless it's just because we are going to write the tests) or for BFCache in particular.

Mostly just the fact that the Chrome team historically have been the people upstreaming tests for largely-undefined behaviour like this, rather than anything else!

fergald commented 3 years ago

OK concrete proposal:

tests should check isPreviousPageInBFCache after navigation and return PRECONDITION_FAILED if the scenario requires bfcacheing.

Each vendor has the option to convert PRECONDITION_FAILED to a failure in their own harness for any given test if they expect bfcache to kick in for that test.

This allows us to

I think there's no need for enableBFCacheForThisPage

@rakina @domenic

smaug---- commented 3 years ago

isPreviousPageInBFCache is possibly a bit confusing when the previous page may be evicted from bfcache after some timeout. Or would the value of it possibly change later?

fergald commented 3 years ago

In our testing, when bfcache is enabled, we disable the timeout ensure consistency. Even if you're running tests that are not specifically bfcache-related, I think it's desirable to disable the timeout.

jgraham commented 3 years ago

So I don't think it's possible for vendors to convert PRECONDITION_FAILED into failure for specific tests, or at least the infrastructure for saying "this precondition ought not to fail in my implementation" doesn't exist and seems hard to maintain.

AFAICT the situation with bfcache is:

So whilst I understand the desire to treat this as an optimisation and not over-specify behaviour, I'm not convinced that's in the best interests of the platform; it seems like there should just be an agreed set of rules that can be tested, even if we have to agree on some additional test-only rule about eviction policy (i.e. a way to enforce the invariant the eviction doesn't happen within a single test so any page that's in bfcache at any point during a test remains so throughout the test).

fergald commented 3 years ago

Chrome infrastructure allows us to expect a range of outcomes for a test, so that when others check in tests for features that are unsupported in chrome we can just set the expectation appropriately and make it pass in the future. What do others do?

More generally, if you cannot assert that that a tests passes for real and doesn't just give PRECONDITION_FAILED how can you know that a test is being run and not just being skipped? E.g. ​this test uses assert_implements_optional which results in PRECONDITION_FAILED. Are you saying I could break the code that calculates support scaling, causing it to result in PRECONDITION_FAILED all of the time and nobody would notice?

So whilst I understand the desire to treat this as an optimisation and not over-specify behaviour, I'm not convinced that's in the best interests of the platform; it seems like there should just be an agreed set of rules that can be tested, even if we have to agree on some additional test-only rule about eviction policy (i.e. a way to enforce the invariant the eviction doesn't happen within a single test so any page that's in bfcache at any point during a test remains so throughout the test).

I'm not sure how this would work. E.g. there are some sensors that should not collect data while a page is in BFCache. In Chrome's impl we do not cache if they are enabled (we will eventually make it so they stop collecting and things work but we have a long list of things to fix). So now imagine a test to ensure that no sensor data is collected while in BFCache. In Chrome, if we just force the page into the cache without fixing the sensor code, we will collect the data and the test will see that data was collected and fail but the test is a privacy test and Chrome's privacy is correct.

Do you have examples of a test you could write using your proposal?

jgraham commented 3 years ago

My point about PRECONDITION_FAILED is that although we can set a test to expect PRECONDITION_FAILED and so notice if the test later becomes some other status, including PASS, there isn't a good way to know if a precondition failed for expected reasons ("oh we aren't expecting to support this case") or for bad reasons ("we should support this case but don't because of a bug"), other than manually examining each instance like you would for any other failure. So it only really works to help developers understand that what happened is expected when that manual check is trivial ("oh this is a whole optional feature we don't support") rather than the case here where getting PRECONDITION_FAILED might indicate that you're not caching a thing you expect to cache or might indicate that you made intentionally different choices.

In Chrome's impl we do not cache if they are enabled (we will eventually make it so they stop collecting and things work but we have a long list of things to fix). So now imagine a test to ensure that no sensor data is collected while in BFCache.

It seems like there are two different things here. One is a test that no sensor data is collected once you navigate away from the page. That would be the privacy test and wouldn't depend on the bfcache implementation. The other is that enabling sensors doesn't exclude a page from being put in the bfcache. Chrome today would pass the former and fail the latter. That seems totally reasonable.

The situation where we basically say "implementations can refuse to bfcache a page for any reason during tests" seems like it's going to make all the "real" tests implementation-specific, even when there's a commonly agreed set of behaviour that everyone is aiming to support and that could reasonably be tested in a cross-browser fashion.

fergald commented 3 years ago

I don't understand your reply about PRECONDITION_FAILED. We create a bfcache-gps-sensor-test we expect it to be PRECONDITION_FAILED, we check in that expectation (with a comment as to why) and we're done, it continues to yield the same result until some day when you make GPS+BFCache work, then it starts passing and you delete the unusual expectation. There's no ongoing overhead where people have to puzzle over why why it's PRECONDITION_FAILED.

That is exactly how PRECONDITION_FAILED tests work right now until someone implements the feature. It seems entirely in line with the expected usage, when a feature (e.g. bfcache-with-gps enabled) is missing, don't run the test.

Am I missing something about your process? Is it not possible to check in an expectation so that nobody ever has to think about it again unless it changes?

It seems like there are two different things here. One is a test that no sensor data is collected once you navigate away from the page. That would be the privacy test and wouldn't depend on the bfcache implementation. The other is that enabling sensors doesn't exclude a page from being put in the bfcache. Chrome today would pass the former and fail the latter. That seems totally reasonable.

It doesn't seem reasonable to me to fail a WPT if we're spec compliant.

Here's what happens in some imaginary circumstances where we have a test for bfcache+gps under both schemes

With PASS/FAIL:

In both cases, privacy problems to go undetected.

With PRECONDITION_FAILED/PASS/FAIL:

API

Finally, on the idea of an API to force caching, that seems quite problematic, we have code that evicts pages from the cache and expects them to be evicted, dropping out of IPC handlers half-way, knowing that the page is doomed. Keeping that page around and pulling it back out of cache could lead to crashes.

As far as I can see, using PRECONDITION_FAILED covers everything and has no downsides as long as everyone has the ability to check in expectations.

zcorpan commented 3 years ago

Is isPreviousPageInBFCache sufficiently expressive? Which page is the "previous" page? Considering that tests may want to exercise edge cases that are currently not interoperable, e.g. https://github.com/whatwg/html/issues/6483

Would it make sense for the API for this be in terms of https://github.com/WICG/app-history ?

cc @jakearchibald @domenic

jgraham commented 3 years ago

I agree the force caching idea doesn't work. Consider it withdrawn. But nothing I've said recently depends on that.

Fundamentally I think the question here here is "should bfcache be specced in a way that ensures UAs behave consistently, subject to reasonable limits" (c.f. implementation defined limits in the HTML spec). Since the behaviour of the feature is observable from content, the obvious answer is "yes". But since it's regarded as an optimisation, the traditional answer is "no".

If we don't spec the feature in enough detail that we can agree what tests should do, it seems unlikely that we're going to end up with a useful set of shared tests. Implementors will end up writing product-specific tests that require whatever behaviour they happen to implement and largely ignore the "shared" tests which will most likely just reflect the decisions of whichever implementor happens to write them. Whether the status in the case of fundamental disagreement over the behaviour of the feature is PASS, FAIL or PRECONDITION_FAILED doesn't change much here.

fergald commented 3 years ago

I agree with the goal of sharing tests even on optional behaviour and PRECONDITION_FAILED is exactly how to do that.

1 navigate forward 2 assert_implements_optional(previousPageIsInBFCache(), "BFCache is not supported") 3 navigate back 4 assert(pageshow.persisted)

1 enable feature 2 navigate forward 3 assert_implements_optional(previousPageIsInBFCache(), "BFCache is not supported with $feature enabled") 4 navigate back 5 assert(pageshow.persisted) 6 assert that feature didn't do anything terrible while in BFCache (maybe there's nothing to test here for some features)

I don't see what goes wrong with the above and I don't see how it interferes with fully speccing BFCache or with having a battery of shared tests.

jgraham commented 3 years ago

I agree that if the constraints are a) it must never be required by the specification to put a page in the bfcache (even under test conditions) and b) we must never display FAIL for something that's technically allowed, your solution satisifes the constraints and will allow producing a shared testsuite. I just also believe that the social dynamics of accpeting those constraints will tend to cause people to rely more on browser-specific tests that encode the actual rules they implement in a clear PASS/FAIL format and pay less attention to the shared tests.

If we agreed that any bfcache implementation must follow explicit rules, with some additional constraints imposed on the test scenario specifically (e.g. rules around not having time-based eviction during a test), we'd end up with something more testable and more consistent between browsers; thus helping authors. We could make the tests clearly PASS/FAIL and a browser with no implementation at all would simply accept that they FAIL these tests. That's not so unusual; e.g. there are Blink-only features for which other browsers FAIL the tests not because the implementation is wrong, but because they object in principle to the existence of the feature.

It's possible that I'm wrong about this of course, but in general getting people to pay attention to test failures from tests they didn't write is even harder than getting them to write tests in the first place. It would be useful to get some input from @smaug---- and whoevr works on this on the WebKit side on what would be most likely to make the testsuite useful for their implementation work.

gsnedders commented 3 years ago

cc @cdumez

hiroshige-g commented 3 years ago

I created a PR https://github.com/web-platform-tests/wpt/pull/28950 that shows a basic framework for BFCache WPTs (especially for simple scenarios). What do you think? If there are no objections, I'd like to merge the PR and upload other WPT PRs for more individual BFCache-related issues based on the PR.

jgraham commented 3 years ago

I'm very happy to see progress here.

Since the PR adds APIs to testharness.js I think it needs to go through the RFC process to ensure that we get cross-vendor feedback on the API design.

domenic commented 3 years ago

I'm a bit disappointed by the tests since they don't guarantee that the page gets put into bfcache (e.g. using testdriver or something similar to force it). So I'm not sure how I would ever run the tests locally to debug bfcache; I expect I would get "precondition failed" most of the time. I would just have to hope that the browser decides to cache, I guess, and try to do all my debugging in those runs?

fergald commented 3 years ago

What debugging are you thinking about? Why do you expect to get precondition failed most of the time? We have web tests in chrome that expect bfcache to work every time and to my knowledge they are not flaky.

If we force caching when the browser has a reason to not cache then we're asking for browser crashes.

domenic commented 3 years ago

Well, for example, if I'm writing a test to see what happens while in bfcache (to test, e.g., a fully active condition), it's not very useful to me if sometimes I'm in bfcache and sometimes I'm not. That just makes the debugging experience very frustrating.

If bfcache works every time, then maybe the tests can assert that, instead of using the precondition failed system.

rakina commented 3 years ago

@domenic, are you concerned that browsers will most likely not cache the page when running web tests? I guess adding enableBFCacheForThisPage as a way to explicitly say to the browser that we'd prefer bfcache to be used wouldn't hurt. It's still possible for the browser to not cache and cause the test to do a PRECONDITION_FAILED, but hopefully less so. Do you think that will help?

@jgraham thank you! So I guess now @hiroshige-g just needs to make a RFC explaining create_remote_prefixed_local_storage and fetch_tests_from_prefixed_local_storage (and maybe the additions to PrefixedLocalStorage.js?)

domenic commented 3 years ago

are you concerned that browsers will most likely not cache the page when running web tests?

"Most likely" is overstating it. It's more that I have no confidence as to whether they will or not; it's not that I have confidence they usually won't. Maybe they cache when my system has plenty of extra memory? Maybe they cache based on previously observed traffic patterns? It'd be a nightmare to debug such situations...

Do you think that will help?

Yeah, I think so. It'd give me some reassurance that, if a browser has the capability to bfcache the page, then it's trying to do so, and I don't have to worry about some heuristic I might be tripping the wrong direction.

fergald commented 3 years ago

"Most likely" is overstating it. It's more that I have no confidence as to whether they will or not; it's not that I have confidence they usually won't. Maybe they cache when my system has plenty of extra memory? Maybe they cache based on previously observed traffic patterns? It'd be a nightmare to debug such situations...

These are all concerns but any browser that wants to make caching non-deterministic should disable that when running WPTs. In chrome the only thing we consistently tweak like that in tests is --enable-features=BackForwardCacheNoTimeEviction but if we were to add a lot of other heuristics for not-caching then we would also turn them off when running tests. If caching fails consistently or flakily due to some non-heuristic reason then the test or the browser has a bug.

I don't think we should introduce something to say "make bfcache deterministic" at the start of every bfcache test, that should just be the default for all WPT tests.

One problem is that there is no agreed cache-size for WPT tests. We should probably try to agree on something so that nobody writes tests that are doomed to hit PRECONDITION_FAILED for that reason.