Open yutakahirano opened 6 years ago
@annevk @yoavweiss @igrigorik @sleevi @davidben @wanderview @youennf @toddreifsteck
KeepaliveContext is just a concept, not an API, right?
KeepaliveContext is just a concept, not an API, right?
It's just a concept.
Does anyone have any feedback?
Does anyone have any feedback?
So, from Chrome's perspective, a KeepAliveContext would replace the per-renderer restrictions?
Does this mean doing away with the 30 second timeout in Chrome? Could we at least apply an equivalent timeout to a ServiceWorker, so a page can't create a keepalive request that lives longer than a ServiceWorker could perform the equivalent fetch?
I also like the "keepalive does nothing in ServiceWorkers" approach, although it might confuse developers. Presumably if a keepalive is forwarded through a ServiceWorker it gets added to the KeepAliveContext for the page it came from?
The number of request per process might be difficult to handle for developers and might never be interoperable anyway. Number of requests per browser might be interoperable but one can argue that the actual limitation might depend on the actual configuration of the device.
It makes sense to me to limit keep alive requests according their clients/workers origin and not in terms of context or process. Clients/workers can coordinate if they do not want to hit the limit. For WebKit, the context could be double-keyed for privacy reasons.
For keep alive requests, WebKit is defining a long timeout after which request is aborted. We could try to be consistent across browsers.
Or we could have no timeout at all for these requests as long as the context of the request exists. We would then use an abort timeout when the context of the keep alive request is being destroyed. That way, a user could expect that closing a tab will eventually kill the related keep alive requests after some grace period. Not sure the timeout actually needs to be specified.
FWIW, this timeout-on-destruction is close to how WebKit is handling service workers. When there is no more clients that a service worker can talk to, the service workers get killed after a grace period.
So, from Chrome's perspective, a KeepAliveContext would replace the per-renderer restrictions?
Yes.
Does this mean doing away with the 30 second timeout in Chrome? Could we at least apply an equivalent timeout to a ServiceWorker, so a page can't create a keepalive request that lives longer than a ServiceWorker could perform the equivalent fetch?
No, it should be covered by "Cancel a request with keepalive set when a certain time period passed from the fetch group termination."
I also like the "keepalive does nothing in ServiceWorkers" approach, although it might confuse developers. Presumably if a keepalive is forwarded through a ServiceWorker it gets added to the KeepAliveContext for the page it came from?
Could you tell me why it's confusing? Your proposal is very interesting. I'll add it to the options.
The number of request per process might be difficult to handle for developers and might never be interoperable anyway.
Agreed.
Number of requests per browser might be interoperable but one can argue that the actual limitation might depend on the actual configuration of the device.
I'd add one more reason against this option: giving an ability to exhaust entire limit to each evil developer is not desirable.
It makes sense to me to limit keep alive requests according their clients/workers origin and not in terms of context or process. Clients/workers can coordinate if they do not want to hit the limit. For WebKit, the context could be double-keyed for privacy reasons.
It's very easy to create a cross-origin iframe (e.g., www.example.com and www2.example.com) and that's why I'd like to use a top-level frame as a basic unit instead of origin.
For keep alive requests, WebKit is defining a long timeout after which request is aborted. We could try to be consistent across browsers.
Thank you, that's really encouraging.
Or we could have no timeout at all for these requests as long as the context of the request exists. We would then use an abort timeout when the context of the keep alive request is being destroyed. That way, a user could expect that closing a tab will eventually kill the related keep alive requests after some grace period. Not sure the timeout actually needs to be specified.
That's what Chrome is doing. I'm not sure if the specific timeout value should be specified either, but describing a mechanism will be good.
FWIW, this timeout-on-destruction is close to how WebKit is handling service workers. When there is no more clients that a service worker can talk to, the service workers get killed after a grace period.
I also like the "keepalive does nothing in ServiceWorkers" approach, although it might confuse developers.
Could you tell me why it's confusing?
I'm concerned that if it appears to work but sometimes doesn't, developers might be surprised.
However, I haven't come up with any case where this really matters. I don't think we need to worry about it.
I proposed a session for this topic at https://www.w3.org/wiki/TPAC/2018/SessionIdeas#Session:_restriction_for_fetch_keepalive. I'm looking forward to see you there.
(we talked about this offline but I wanted to have a record) It would not be good for keepalive to have no effect in service workers, especially for requests that are initiated by the worker. See https://github.com/w3c/ServiceWorker/issues/1336 for a use case.
In the notes from https://github.com/whatwg/fetch/issues/679#issuecomment-432704656 I believe the summary wrt to service workers is:
Based on what we discussed at TPAC, I update the proposal as following:
Makes sense?
I think this captures where we want to go.
We should be able to update the spec so that:
I am hesitant to describe more than that given that:
https://github.com/whatwg/fetch/issues/679#issuecomment-433827856 makes sense to me to implement. Like Youenn, I'm not quite convinced the specification needs this level of detail, but I'll leave it for you.
The updated proposal doesn't seem to include the 30 second timeout. Is it being removed?
By the way, don't these restrictions conflict with the current Beacon spec? Beacon is specified using keepalive and says:
[Other than payload size], the sendBeacon() API is not subject to any additional restrictions. The user agent ought not skip or throttle processing of sendBeacon() calls, as they can contain critical application state, events, and analytics data.
@youennf I think it's valuable to have a policy in the spec, as said in the original proposal.
I think having interoperability here is particularly good from two reasons.
One is to have a unified policy. Given the leaky nature of the flag, each implementer has to make a balanced decision between developers’ convenience and users’ expectation. Having diverse policies will confuse both developers and users.
The other is the difficulty to handle errors. keepalive flag is expected to be used when the page is about to unload. In such a circumstance developers are not likely to be able to detect and handle errors correctly. Having a rigid, interoperable restrictions will be developers’ benefit.
I think it's good to leave actual numbers hand-wavy. We can replace 9 in https://github.com/whatwg/fetch/issues/679#issuecomment-433827856 with something like:
When a fetch group is terminated, for each request r with keepalive set:
A side note: if we decide to remove the 64KB restriction for attached keep alive requests, we might still want to warn developers that long keep alive uploads have higher chance to fail. Hence advise to keep these requests as small as possible.
Hello @yutakahirano, @mfalken
We’re excited to share that Firefox is rolling out a new feature soon, and we’re aiming to include similar resource restrictions.
- If the renderer process is processing more than 9 requests with keepalive set, we reject a new request with keepalive set initiated by fetch().
- If the renderer process is processing more than 19 requests with keepalive set, we reject a new request with keepalive set.
Note: The difference between fetch() + keepalive vs. SendBeacon is from a historical reason and we are going to have a unified restriction.
- If Chrome is processing more than 255 requests with keepalive set, we reject a new request with keepalive set.
Could you let me know if these restrictions have changed? While we’re taking a slightly different route by limiting them based on origin and restricting requests per browser instance, we want to keep our limits comparable to Chrome’s.
By the way, what’s the plan for addressing the spec issue?
Thanks a million!
Neither @yutakahirano nor @mfalken work on Chrome any more. Pinging @mingyc who has been touching keepalive as part of them deferred fetch work.
The current keepaive-specific limits have been updated for a while:
2048
256
512 * 1024
.The Chromium source is here.
See also: #662
Problem
We, Chrome, shipped the keepalive flag in fetch API, with a non-interoperable restrictions:
These restrictions are very much visible to web developers (making 10 fetch requests is very easy), so we would like to replace them with an interoperable ones, if possible.
What’s special about keepalive?
Chrome is a multi-tab applications, and many resources (CPU, memory, network, …) are associated with tab. For example, it is easy to create a non-responding web application like this.
But a user can free up the CPU resource by simply closing the tab without impacting other web pages (or tabs) negatively. That is also true for network resources. Web pages can issue many network requests, but they will be cancelled when the page is unloaded or killed forcibly.
Requests with keepalive set is an exception. As stated in the fetch spec, requests with keepalive set survive page unload. This is a kind of resource leak from the above POV; the request is still ongoing while the tab is closed, and we don’t have a means to abort it except for shutting down the entire browser.
Mitigations
Chrome has some mitigations for the problem.
Upload payload size [specced]
As written in the spec, we only allow 64KiB payload per fetch group at a time. Unfortunately, this restriction is easy to escape.
Timeout [chrome only]
Requests with keepalive set will be cancelled when 30 seconds passed since the associated context (mostly frame) is destroyed.
Number of requests per process [chrome only]
The number of requests with keepalive set per render process is restricted in Chrome. This restriction is hard to escape but we cannot have is in the spec because “renderer process” is Chrome-only concept. We also have some uncertainty because sometimes a renderer process can be shared.
Number of requests per browser [chrome only]
The number of requests with keepalive set is restricted in Chrome. This restriction is impossible to escape as long as you are using the browser.
Abort request after the response is received [chrome only]
Chrome aborts a request with keepalive when both of the following hold: The associated context has already been destroyed. The response has already been received. This prevents mass download after the tab is closed.
Interoperability
I think having interoperability here is particularly good from two reasons.
One is to have a unified policy. Given the leaky nature of the flag, each implementer has to make a balanced decision between developers’ convenience and users’ expectation. Having diverse policies will confuse both developers and users.
The other is the difficulty to handle errors. keepalive flag is expected to be used when the page is about to unload. In such a circumstance developers are not likely to be able to detect and handle errors correctly. Having a rigid, interoperable restrictions will be developers’ benefit.
Proposal
The policy we would have should have the following properties:
Here I propose the following:
Service Worker
A web developer can create multiple service workers by using iframes, so assign one KeepaliveContext for each service worker is not an option. Here are some ideas:
Have a global KeepaliveContext used by all service workers
This should work, but it will be too restrictive.
Use the first service worker client’s KeepaliveContext
This can be leaky in some pathological cases. This will be a bit unintuitive.
A service worker intercepted request keeps KeepAliveContext
(proposed by @ricea )
A requested initiated by a service worker is not protected, but a request coming from a page is protected by the KeepAliveContext for the original request.
Do nothing; Let the service worker keep itself alive
The spec says:
When detecting a abnormal operation, it makes sense to cancel the request even with keepalive set. That means, If we are interested in requests with keepalive set from the page, they should be protected by the service worker. This option disables keepalive protection in service workers.
I (@yutakahirano) like the last option.