w3c / ServiceWorker

Service Workers
https://w3c.github.io/ServiceWorker/
Other
3.63k stars 315 forks source link

skipWaiting() promise should resolve after promotion to .active #1187

Open mfalken opened 7 years ago

mfalken commented 7 years ago

As spec'd the skipWaiting() promise doesn't seem really useful. I think it might be an oversight.

I'd expect the promise to resolve after the worker has been promoted to .active and has state 'activating'.

But currently the spec is:

  1. Invoke Try Activate with service worker's containing service worker registration.
  2. Resolve promise with undefined.

And Try Activate early returns in some cases:

So if Try Activate early returned, we resolve the promise before activating.

Also am I parsing this sentence correctly: "The result of running Service Worker Has No Pending Events with registration’s active worker is true, and no service worker client is using registration or registration’s waiting worker's skip waiting flag is set." means "A && (B || C)" where A = "no pending events", B = "no client" and C = "skip waiting flag".

jakearchibald commented 7 years ago

Yeah, I agree that skipWaiting() doesn't resolve in a particularly useful way. Unfortunately it seems to be common for developers to write:

addEventListener('install', () => event.waitUntil(skipWaiting()));

…which would create a deadlock with the changes you mention.

Related: https://github.com/w3c/ServiceWorker/issues/1016

mfalken commented 7 years ago

Ah I see, I also discovered #1015 too

I guess we'll implemented as spec'd but it's pretty weird. An implementation that always resolves the promise immediately wouldn't be super wrong, it can plausibly claim that the active version just happened to be handling an event at the exact moment you called skipWaiting().

If we were to design it over, would we give skipWaiting a promise? Is there any use case for using the promise? I'm wondering if developers are using the promise expecting it to mean something it doesn't.

mfalken commented 7 years ago

Heads-up that I think the current WPT test for this is wrong. Would be great if @wanderview @jakearchibald @jungkees can confirm my analysis.

The current test skip-waiting-installed.https.html[1] sets up a waiting worker and an active worker with a client. The worker calls skipWaiting() and then checks that the promise resolves before the 'activate' event handler runs.

But tracing through the spec, skipWaiting() enters "Try Activate". "Try Activate" invokes "Activate". "Activate" blocks until the final step: "13. Run the Update Worker State algorithm passing registration’s active worker and activated as the arguments."

"Update Worker State" queues a task to set ServiceWorker#state to 'activated'. That means skipWaiting's promise resolves before that task runs. But back in step 10, we have dispatched the 'activate' event. Therefore the order should be:

  1. 'activate' event handler runs
  2. skipWaiting() promise resolves
  3. ServiceWorker#state is set to 'activated'

We're planning to change the test to assert that order.

For more context, this came up in https://chromium-review.googlesource.com/c/chromium/src/+/599570, Xiaofeng Zhang believes the spec changed in https://github.com/w3c/ServiceWorker/pull/1065 so that the test is now wrong.

[1]

mfalken commented 7 years ago

FYI: we're revising the test at https://chromium-review.googlesource.com/c/chromium/src/+/646244

wanderview commented 7 years ago

I think a series of spec changes have changed skipWaiting() indirectly. I agree the current spec says that skipWaiting() should resolve before the worker state is updated, but is that the intent of this promise? Should we instead change the skipWaiting() spec to effectively listen for a statechange event and resolve then?

@jakearchibald @jungkees what do you think?

mfalken commented 7 years ago

In any case, we'll update the test to match the current spec, and whenever the spec changes we can update the test again.

jungkees commented 7 years ago

Should we instead change the skipWaiting() spec to effectively listen for a statechange event and resolve then?

As commented in https://github.com/w3c/ServiceWorker/issues/1015#issuecomment-263508737, that change may create a deadlock to install handlers that waitiUntil the given skipWaiting promise. It seems we should keep the current behavior to not break the existing code. It should have been better if it returned void.

mfalken commented 6 years ago

I think we can just add "Wait for all the tasks queued by Update Worker State invoked in this algorithm to have executed." to after Step 15 of Activate. There's similar verbiage elsewhere and I think it matches the intent of the algorithm as wanderview asks in https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-326582954. So this way, in the common case, the order will be:

  1. 'activate' event handler runs
  2. ServiceWorker#state is set to 'activated'
  3. skipWaiting() promise resolves

(so 2 and 3 are flipped from https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-326488845)

jakearchibald commented 6 years ago

@mattto I can't figure out how that fixes the deadlock in https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-323006020. Step 1 would be blocked by step 3.

jakearchibald commented 6 years ago

If we think https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-323006020 is no longer a problem (do we have data?), then it feels like skipWaiting() should resolve once the service worker moves from .waiting to .active. At this point it's in the "activating" state, I don't think it needs to wait for "activated".

jungkees commented 6 years ago

Resolving it once the service worker moves from .waiting to .active would be good for the skipWaiting()'s semantic if we could. But I still don't see any good way to avoid the deadlock in https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-323006020. The service worker's install handlers have to wait until the waitUntil(skipWaiting()) promises resolve to become a waiting worker.

mfalken commented 6 years ago

This change shouldn't regress the current spec: if it didn't deadlock before, it wouldn't deadlock now.

If one does waitUntil(skipWaiting()) in the install event handler, then skipWaiting() resolves immediately because Try Activate bails since waiting worker is null.

This change only affects the case where skipWaiting() actually triggers activation. To trigger this behavior an installed worker needs to call skipWaiting(), i.e., in an onmessage event.

I'm motivated to fix this because we would need to add some complexity to Chrome to guarantee the order per the current spec.

I suspect sites would still deadlock if the spec didn't guard against it, e.g., with the Try Activate escape hatch. Another alternative is to just have skipWaiting() always resolve immediately. I am actually feeling that's more consistent.

jungkees commented 6 years ago

@mattto, I left the above comment before I saw your last comment.

If one does waitUntil(skipWaiting()) in the install event handler, then skipWaiting() resolves immediately because Try Activate bails since waiting worker is null.

You're right. I missed this point indeed. I'll review your PR. @jakearchibald, @wanderview, given @mattto's analysis, calling skipWaiting() in the install handlers wouldn't be an issue, right?

jungkees commented 6 years ago

@mattto, I just looked into https://github.com/w3c/ServiceWorker/pull/1327. Wouldn't it be nice if we could do https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-398394573?

mfalken commented 6 years ago

I dunno, I'm wondering now why we don't just resolve immediately always. It would be simple and no deadlock.

jungkees commented 6 years ago

As we made it a promise (not returning void), just guaranteeing the state transition - from installed to activating seems like a better behavior if we could do (under the premise that it won't deadlock).

mfalken commented 6 years ago

As noted in the original post, we resolve immediately in other cases too. AFAICT there is a rather limited case where we try to resolve it to something meaningful. It only happens when the service worker is already installed AND the active worker has no events AND something asks the installed worker to call skipWaiting(). Changing the meaning of the promise that much seems to add complexity for little gain.

jungkees commented 6 years ago

Sorry about the confusion, but please ignore my comments from https://github.com/w3c/ServiceWorker/issues/1187#issuecomment-398569800.

If one does waitUntil(skipWaiting()) in the install event handler, then skipWaiting() resolves immediately because Try Activate bails since waiting worker is null.

There's a case when registrations can hold an waiting worker while installing a new one.

jungkees commented 6 years ago

We already have a deadlock scenario:

  1. A registration is created and has its active worker.
  2. Clients navigate to the scope and start being controlled.
  3. A new service worker version with no skipWaiting() call is installed (and not promoted due to the open controlled clients). (The user does not close the tabs for days and come back.)
  4. A new service worker version with waitUntil(skipWaiting()) call in the install handler is installing.

Should we make it "just resolve immediately always" as @mattto suggested?