A way to immediately unregister a service worker

KenjiBaheux commented 9 years ago

Spin off from #468. Alex:

I a conversation I had with @metromoxie makes me think we should also add a special none or blank value to Service-Worker-Allowed to disable SWs entirely. Many users want a similar kill-switch.

I'd be happy with a different header to kill things, but something seems necessary.

Anne:

I had the impression unregister() was the way to remove a single service worker and there would be no protocol equivalent.

If we want a protocol solution to removing a service worker, using HTTP 410 seems much more applicable.

Using none as keyword to forbid service workers of any kind seems fine although it's a bit unclear to me how you disambiguate that from a relative URL. Would we require URLs in that header to start with a slash?

KenjiBaheux commented 9 years ago

cc/ @slightlyoff @annevk @jungkees

joelweinberger commented 9 years ago

@annevk I think unregister only allows the you to do so at your current scope, o it seems like a 'kill switch' in the header gives the added benefit of the server being able, for any URL, to specify "no service workers at all".

Requiring absolute paths in Service-Worker-Allowed seems reasonable to me since I believe in the spec 'scope' is already required to be absolute anyway (http://www.w3.org/TR/service-workers/#unregister-algorithm for example). On the other hand, @slightlyoff's suggestion of a blank value also seems reasonable to me.

annevk commented 9 years ago

What is quoted above is out of context as I was responding to a suggestion of using none to remove the current worker, which was a suggestion I disagreed with per that comment.

jakearchibald commented 9 years ago

I believe in the spec 'scope' is already required to be absolute anyway

Nah, they can be relative (to the page).

The kill switch idea came up before and https://github.com/slightlyoff/ServiceWorker/issues/236 was created as a result. Would that work here? You could serve a SW that contained only:

self.registration.unregister({immediate: true})

…which would unregister but also un-claim all pages immediately. Alternatively, clients.release() as an explicit opposite to 'claim'.

joelweinberger commented 9 years ago

I don't think that's sufficient. I'd want to be able to just serve headers that tell the client that Service Workers are not allowed so that any content with HTTP headers, even if it's not going to run JavaScript, can specify that the origin should not have ServiceWorkers.

I'd also love to have it be a preventative measure as well, and prevent registration of ServiceWorkers, but that may be more ambitious/complicated than I think. For example, if Dropbox knows that it should never have a ServiceWorker, they should be able to effectively "turn Service Workers off" for their origin by always specifying said header with all requests.

annevk commented 9 years ago

Disabling SWs altogether and preventing installation seems like a CSP directive.

Disabling just one SW through HTTP should use HTTP status codes on its script URL.

mfalken commented 9 years ago

If the motivation of the kill-switch is a sure-fire way to immediately shutdown SW in an emergency (say you roll out a respondWith('hello world') SW), these don't seem sufficient yet:

unregister({immediate: true}) won't be run (your page is just 'hello world')
HTTP status 410 on the script URL could take 24 hours before it's fixed

But maybe 24 hours is the best guarantee the SW spec should provide? Sites that are worried can set no-cache or max-age.

mfalken commented 9 years ago

Ah, I see the SW would run self.registration.unregister({immediate: true}), so that also takes 24 hours.

I implementing both unregister({immediate: true}) and HTTP status 410 would be OK. The protocol solution is a bigger hammer in case the browser's having trouble even spinning up workers or its registration job queue is wedged.

mfalken commented 9 years ago

Could it make sense for the Update algorithm to do a HEAD request and more aggressively than the 24 hour cache check? For example every 1 hour or even every time? This would be a powerful kill-switch, no waiting for 24 hours.

kinu commented 9 years ago

@mattto Let me make sure... when you say 'every 1 hour' you mean 'when navigation happens after 1 hour+ since the last update check runs', right? I'd be worried about the battery consumption on mobile devices.

mfalken commented 9 years ago

Sorry that wasn't clear. My idea was in the Update algorithm, do a HEAD request before fetching the script. The HEAD request could be done every time, or else limited by something like once every hour (similar to how fetching the script bypasses the cache only once every 24 hours). Update can be invoked via by SoftUpdate, which the UA can run at anytime (Chrome schedules one to happen soon after every navigation to a page in scope).

In Chrome's case, I think it shouldn't be too battery/network costly since you're already hitting network by navigating to the site.

mfalken commented 9 years ago

FYI I think the current APIs let you implement the equivalent of an unregister({immediate:true}) worker:

skipWaiting();
onactivate = function(e) {
  e.waitUntil(clients.claim()
    .then(function() { return self.registration.unregister(); }
    .then(function() { return Promise.reject(); })));
};

A more sugary syntax may still be good.

mfalken commented 9 years ago

Nevermind, as per issue #659 the spec will likely change so that waitUntil(reject) is equivalent to waitUntil(resolve) in onactivate. So I think current APIs don't yet support unregister({immediate:true}) or cllients.release().

jakearchibald commented 9 years ago

I'm a little confused by the 24hr thing. A SW is checked for updates on navigation, so:

User navigates to broken page
SW serves up "hello world"
SW checks for updates, finds update, skips waiting, and tells clients to reload (https://github.com/slightlyoff/ServiceWorker/issues/681#issuecomment-103512559)

Step 2 would only be skipped if the ServiceWorker was served with a max-age and was still considered fresh.

For this use-case, I don't see a benefit to unregistering rather than fixing the SW. If an evil user has managed to install a SW on another site, the other site probably wants to unregister it pretty fast, but that's still just:

User navigates to hacked page
SW serves up "l33t hacked"
SW checks for updates, finds update, skips waiting, unregisters, and tells clients to reload (https://github.com/slightlyoff/ServiceWorker/issues/681#issuecomment-103512559)

So the new SW would be:

self.oninstall = _ => self.skipWaiting();
self.onactivate = _ => {
  self.registration.unregister()
    .then(_ => client.getAll())
    .then(clients => clients.map(c => c.navigate()));
};

jakearchibald commented 9 years ago

Moving this to v2, as we have ways of doing this without a specific API.

In cases where the SW is totally broken, this could happen:

User navigates to page
SW is broken, so it serves a network error
Browser may make a skip-service-worker request, if that works they can load the page uncontrolled

Continual failures may even result in the UA unregistering the SW.

jakearchibald commented 9 years ago

At the f2f we also had discussion about how developers could detect that their responses weren't valid. We had the idea that fetchEvent.respondWith() could return a promise that rejects if the promise passed to respondWith…

resolves to a network error
rejects
is converted to a network error by fetch (eg due to opaqueness where opaque is inappropriate)

mfalken commented 9 years ago

Hi Jake can we track the respondWith() return promise as a different issue? The idea also came up in a codereview from @mikewest: https://codereview.chromium.org/1228233007/#msg9

gauntface commented 8 years ago

Just wanted to add an extra use case.

I've writing some unit tests for SW fetch events and hit a scenario where registering and unregistering SW's between tests and essentially there was no way to completely remove the currently controlling sw since the redundant SW is brought back to life if the same SW is used between tests.

jakearchibald commented 7 years ago

I think this makes WPTs awkward too.

wanderview commented 7 years ago

Would an API performing the Clear-Site-Data header operation serve this purpose?

jakearchibald commented 7 years ago

Yeah, that spec used to have an API defined, but it doesn't anymore. It'd also clear a whole lot more, but maybe that's ok.

mikewest commented 7 years ago

I killed the 'clear()' API because I didn't know of any customers. If y'all need it, then let's talk to @annevk about sticking it into Storage.

On Fri 31. Mar 2017 at 08:05, Jake Archibald notifications@github.com wrote:

Yeah, that spec used to have an API defined, but it doesn't anymore. It'd also clear a whole lot more, but maybe that's ok.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/w3c/ServiceWorker/issues/614#issuecomment-290625091, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAF2fvd_jdaV1lhOSMbRcrx6tXgptnVks5rrJeVgaJpZM4DafNr .

-- -mike

NekR commented 6 years ago

In cases where the SW is totally broken, this could happen:

If SW is broken and it results in network error, UA should bypass cache for SW update algorithm even if updateViaCache dictates to use cache. IMO.

asakusuma commented 6 years ago

I like the idea of having an escape hatch for a misbehaving service worker via a header like Clear-Site-Data. Not requiring any any client intervention makes it a lot more foolproof.

It'd also clear a whole lot more, but maybe that's ok.

I also think this is ok. I see this as a final safety measure, and in usage scenarios, it's probably best to clear any persistent data that might have been modified by the service worker.

asakusuma commented 5 years ago

I spoke with @slightlyoff offline, he suggested I take a closer look at Clear-Site-Data.

One of the explicit goals of Clear-Site-Data is to ensure:

Service Workers registered for an origin are terminated and deregistered.

However, from what I can tell, all the spec calls for, with respect to service workers, is calling unregister on the service worker, which I don't think will actually terminate the service worker. This unregistering is caused by the storage directive.

Clear-Site-Data also supports an executionContext directive, which will reload all contexts with a document. I don't think this helps us either, since reloading does not count as actually unloading the client.

So I'm still unclear on how Clear-Site-Data achieves the "terminate" part of the service worker goal.

slightlyoff commented 5 years ago

When you reload after the clearing steps have run, there isn't a registration active. That means that no Service Worker registration will match the destination URL. This is also implied by the "neuter" language in the "executionContexts" directive.

Note that the ordering of execution is very explicit about when these steps run relative to each other.

Clear-Site-Data is designed to do (and does) exactly what you're looking for.

asakusuma commented 5 years ago

Unfortunately I cannot seem to directly test out Clear-Site-Data because of the chrome bug, and the header doesn't seem to work at all in Firefox. I did however test by manually unregistering and refreshing, which works just as you say it does.

However, I'm still not able to connect the dots on how this works according to the spec.

When you reload after the clearing steps have run, there isn't a registration active

I must be misreading the spec, as my understanding of the spec is that unregister() doesn't actually remove the registration until all clients unload, and .reload() (which is what Clear-Site-Data calls under the hood) doesn't actually cause the client to unload.

wanderview commented 5 years ago

FWIW, Clear-Site-Data should be available in firefox 63 (just shipped as stable I believe):

https://bugzilla.mozilla.org/show_bug.cgi?id=1470111

asakusuma commented 5 years ago

@wanderview thanks, I tried again with a reduced test case and couldn't get it to work on Firefox, so I opened a ticket

asakusuma commented 5 years ago

Attempting to move the F2F unregister/terminate/unsafelyDispose discussion here, from: https://github.com/w3c/ServiceWorker/issues/1303#issuecomment-432967494 https://github.com/w3c/ServiceWorker/issues/1303#issuecomment-434513827

I'd like to suggest adding options to unregister() that allow immediately stopping any new events being sent to the service worker (even for already opened clients) and then terminating all associated workers (and their associated tasks) with some time guarantee. For instance:

// Immediately stop sending events to the service worker, all
// workers will be completely gone within 10 seconds.
navigator.serviceWorker.unregister({
  unclaim: true,
  terminateDeadline: 10000
});

Another option would be to create a new method unclaim(), which is like unregister, but immediate, also with a terminateDeadline option.

One potential edge case is, what happens if you call unregister() with no options, but then later want the immediately unclaim behavior?

asakusuma commented 5 years ago

@slightlyoff re: clear-site-data, I understand the theory behind how it's supposed to work, but I'm not convinced that using unregister() (which is what clear-site-data uses) can guarantee all service workers are removed, according to the spec. unregister() doesn't actually take effect until no client is using the registration. Even if reloading() did create a point in time in which the client was not using the registration, not sure that iterating over each client and calling reload() will guarantee that there was a point in time in which no client was using the registration.

My general 2 cents is that the link between Clear-Site-Data and "all service worker gone" seems too implicit for what is designed to be a last-line-of-defense safeguard.

annevk commented 5 years ago

@asakusuma given that there are likely multiple processes involved there's not going to be a single point in time anyway and no guarantees of that kind. Clear-Site-Data including browsing context reloads should guarantee though that when they are reloaded there are no more service workers. If that's not reflected properly (and that feature isn't particularly well grounded in terms of primitives), that'd just be a bug.

jungkees commented 5 years ago

I had a chat with @asakusuma and noticed that using unregister() for CSD does not satisfy @asakusuma's need. Since the registration and its service workers are retained although getting an uninstalling flag, refreshing a document that calls into register() makes the registration and its service workers resurrected. This means the script that the developer wanted to kill is going to be used again. For a kill switch, we probably should use Clear Registration instead of unregister(). Thoughts?

(With the clear data browser settings menu, Chrome and Edge Canary seem to use unregister(), and Firefox uses Clear Registration.)

mfalken commented 5 years ago

Agreed, we shouldn't use unregister() for a kill switch or CSD. I was planning to change Chrome's impl to use Clear, see https://crbug.com/964201.

Separately, at the last F2F we decided to remove resurrection anyway: https://github.com/w3c/ServiceWorker/issues/1204#issuecomment-432946273

However just removing resurrection is not good enough, we also need to tear down the controller for existing pages that are controlled.

asakusuma commented 5 years ago

Thanks @jungkees and @mattto. In terms of normative specification language, what exactly is “Clear” (the thing you want to use rather than unregister)?

FWIW, I opened a ticket about this issue on the CSD repo: https://github.com/w3c/webappsec-clear-site-data/issues/54

mfalken commented 5 years ago

@asakusuma It's https://w3c.github.io/ServiceWorker/#clear-registration-algorithm

asakusuma commented 5 years ago

@mattto Looking at your force delete change, I see the following comment:

The rest of this function is similar to Clear() but is slightly different because this emergency deletion isn't part of the spec and happens outside of the normal job coordinator.

Why are additional safeguards beyond Clear Registration required? Does the spec need an additional or expanded algorithm for being able to guarantee terminating a service worker?

Related, should Clear Registration call Remove scope to registration map as well? unregister() is not the only entry point into Clear Registration.

mfalken commented 5 years ago

Good question. In our implementation Clear() has some logic to try to ensure the timing of events match the spec (when you see statechange, you see registration.active is null) which I felt could be possibly error-prone in ForceDelete() which is an emergency "GET ME OUT OF HERE" operation. That's where I was coming from with that comment.

I do think it'd be worth adding this operation in the spec. The spec currently doesn't allow the .controller to go from populated to null, which we do here in ForceDelete() and I believe we should also do for Clear-Site-Data and browser UI like "Clear Browsing Data".

jungkees commented 5 years ago

The spec currently doesn't allow the .controller to go from populated to null, which we do here in ForceDelete() and I believe we should also do for Clear-Site-Data and browser UI like "Clear Browsing Data".

I'm not entirely sure we should unclaim clients from Clear-Site-Data: "storage" and "Clear Browsing Data" settings. Can't the clients in the "GET ME OUT OF HERE" situation be manually reloaded? If they can be reloaded, the situation seems quite isolated within the lifetime of the page.

mfalken commented 5 years ago

Yep they can be manually reloaded but that user experience seems not great.

jakearchibald commented 5 years ago

We've wanted something like unregister({ uncontrol: true }) for a while now, so making CSD behave like that seems ok.

Also, @mattto, didn't you say Chrome already 'unclaims' if things become corrupt? It seems reasonable to spec that somehow.

mfalken commented 5 years ago

Yep, Chrome already does unclaim in that situation.

wanderview commented 5 years ago

Does chrome fire a controllerchange event when it unclaims?

mfalken commented 5 years ago

Yes

wanderview commented 5 years ago

Interesting. I'm willing to bet a lot of controllerchange handlers assume the controller is non-null when the event fires.

To be clear, I think we should fire the event, but I also expect a lot of sites won't expect it. We should probably publicize that this is possible.

mfalken commented 5 years ago

Seems right to publicize it. Note that Chrome has been doing this for a looong time; it's not a new change. However it only happens in rare circumstances so far: when the SW script failed to be read from disk.

If expanded to CSD and Clear Browsing Data, it would happen more often.

jungkees commented 5 years ago

We've wanted something like unregister({ uncontrol: true }) for a while now, so making CSD behave like that seems ok.

Deleting registration record and unclaiming clients seem two distinct operations to me. We can add unregister({uncontrol: true}) as a feature later (after v1), but the default behavior of unregister() won't do unclaim. Shouldn't we have a separate CSD directive for unclaim? I thought CSD: "executionContext" would do for it, but it seems there are issues: https://github.com/w3c/webappsec-clear-site-data/issues/59. Will folks continue to work on "executionContext"?

asakusuma commented 5 years ago

Unclaiming just means that the service worker no longer handles events for that client, correct?

If that is true, then I'm not sure we need a non-unclaiming CSD implementation. I would expect that CSD would unclaim. If we were to have multiple CSD directives for service workers, I would want delineation between stopping new events being handled by the service worker, vs going a step further and killing any in-flight events being handled by the service worker. In general, I would prefer to have the most extreme measure be the default MVP for a CSD service worker directive.

Stepping back, I see CSD as a safety mechanism for a serious problem, where collateral damage to the UX is OK. In these situations, an abrupt loss of any functionality provided by the service worker is going to preferable to a bad service worker continuing to operate unabated. Developers can always implement their own unregister() based mechanisms for less serious situations.

Also, @mattto, didn't you say Chrome already 'unclaims' if things become corrupt? It seems reasonable to spec that somehow.

@jakearchibald How are we defining "corrupt"?

jakearchibald commented 5 years ago

Stepping back, I see CSD as a safety mechanism for a serious problem, where collateral damage to the UX is OK. In these situations, an abrupt loss of any functionality provided by the service worker is going to preferable to a bad service worker continuing to operate unabated.

Strong agree. It's a kill switch.

@jakearchibald How are we defining "corrupt"?

If Chrome fails to read the registration/serviceworker from disk.

asakusuma commented 5 years ago

@jungkees @jakearchibald it just occurred to me that the current approach for CSD will also remove push notification subscriptions. While this might make sense in some scenarios, it's way overkill in others, and presents an unnecessary UX problem in making everyone re-subscribe to notifications. If there's a critical caching bug, we want to be able to use CSD without having to ask everyone to re-subscribe again.

w3c / ServiceWorker

A way to immediately unregister a service worker #614