consider allowing a non-scope identifier for registrations

wanderview commented 4 years ago

Currently registrations are identified and keyed based on their scope string. This works well for most cases, but can create difficulties for folks that want to modify the structure of their site.

For example, consider a site that has a product hosted at a location like:

foo.com/product/bar

But they want to re-organize their site to remove the "/product" sub-path since its needlessly verbose. They want to move to:

foo.com/bar

In this case they must unregister their old scope and register their new scope. Both the removal of the old registration and the installation of the new registration are async operations that can take time to complete. In addition, both of these service workers may wish to manage the same underlying resources. Trying to coordinate between the service workers to avoid accidentally breaking something can be difficult.

As an alternative, we could allow the service worker to specify a separate origin-unique identifier. For example:

await navigation.serviceWorker.register('sw.js', {
  scope: '/product/bar',
  id: 'product-bar',
});

Then when the site registers a service worker on a different scope with the same id, it would be treated as an update of the service worker that modifies the scope.

While there are uses for this today, this sort of thing may be a lot more useful in a world where the scopes pattern matching has been implemented. With the scopes pattern matching sites may want to mutate their scope pattern over time to cover more sites or to fix mistakes in the pattern, etc.

Note, there is also an effort to add a unique identifier to manifests for webapps in w3c/manifest#586. I asked if they wanted to try to unify with service workers, but the feedback was that they should be separate.

The manifest issue also suggests the identifier should be a URL instead of an opaque string. I don't have a strong feeling about, although I feel it might confuse people into thinking it somehow operates as a scope.

Edit:

Final proposal is summarized in https://github.com/w3c/ServiceWorker/pull/1526.

Security/Privacy Considerations:

This proposal does not change the security or privacy characteristics of service workers:

The id value does not expose any additional information about the device or user to the site.
The id value has no cross-origin communication or interaction at all.
All existing restrictions like https://w3c.github.io/ServiceWorker/#path-restriction are maintained.
Modifying a registration's scope will fail if it conflicts with a scope currently owned by another registration. This prevents accidental scope confusion. (But this is more of footgun issue and not a security issue.)

asutherland commented 4 years ago

This seems reasonable. Presumably this would be spec'ed so that this couldn't be used as an end-run around the https://w3c.github.io/ServiceWorker/#path-restriction mechanism by stealing a registration with actively controlled clients that could not otherwise be controlled by a given script?

wanderview commented 4 years ago

I'm not sure I understand, can you expand on the risk you are thinking of?

Perhaps the question is around what happens in the scenario:

Old version is active and controlling clients.
New version is installing and the new scope would not match the controlled clients from (1).
New version calls skipWaiting().

What happens? It seems skipWaiting() should either (a) cause the clients that no longer match the scope to stop being controlled or (b) fail to promote new version to active if there are non-matching controlled clients in play.

asutherland commented 4 years ago

Yes, that. Thanks!

jakearchibald commented 4 years ago

I guess it would reject if a service worker changed scope to a scope that already existed on the origin?

await navigation.serviceWorker.register('/sw.js', {
  scope: '/product/bar',
  id: 'product-bar',
});

await navigation.serviceWorker.register('/sw2.js', {
  scope: '/bar',
});

await navigation.serviceWorker.register('/sw.js', {
  scope: '/bar',
  id: 'product-bar',
});

jakearchibald commented 4 years ago

As a compatibility path, what if id defaults to the scope?

// Past me did this:
await navigation.serviceWorker.register('/sw.js', {
  scope: '/product/bar',
});

// But it's ok, I can move it:
await navigation.serviceWorker.register('/sw.js', {
  scope: '/bar',
  id: '/product/bar',
});

wanderview commented 4 years ago

I guess it would reject if a service worker changed scope to a scope that already existed on the origin?

It could, but I think it would be consistent with current behavior to unregister the sw2.js registration in your example.

Tangentially related to this is something I've been thinking about for the scopes pattern matching effort. If we add multiple scope strings for a registration (as recommended from the TPAC meeting), then the overlapping scope scenario here could have just a partial overlap. I think we would probably want to treat them the same as what we do here, though; error or replace.

As a compatibility path, what if id defaults to the scope?

This makes sense to me, but I haven't thought it fully through.

wanderview commented 4 years ago

I would like to move forward on this with the following:

id value permits any kind of string. It does not have to be a URL like scope.
id defaults to scope value.
Spec will be updated to operate on id value everywhere we currently use scope for the identifier.
Moving a registration to a scope that exactly matches another existing registration will replace that registration. From an API perspective this is similar to calling register() again on the same scope with a different script URL, etc. I think throwing would be unexpected.

@jakearchibald @youennf @asutherland @jungkees please let me know if you have any comments or concerns with this. Thanks!

asutherland commented 4 years ago

@wanderview Elaborating on my original concern in https://github.com/w3c/ServiceWorker/issues/1512#issuecomment-610684558 I'm not sure from your description in https://github.com/w3c/ServiceWorker/issues/1512#issuecomment-666454901 how this prevents an end-run around https://w3c.github.io/ServiceWorker/#path-restriction by stealing existing controlled clients. Can you expand on how your prior comments interact with your current proposal given the following scenario:

A SW was installed on a site with scope "/send-money/" and defaulted id is "/send-money/". It hosts a UI for sending money that's protected by some type of 2nd factor code.
A partial site compromise allows an attacker control over static content hosted at "/about/" including the ability to host a ServiceWorker script there, but no higher.
A user goes to /send-money/ to send money. They click on the "/about/" link to open in a new tab or a pop-up because they want to check something out or there's some ill conceived iframe stuff going on.
The "/about/" page does register('/about/evil-mitm-sw.js', {id: '/send-money/', scope: '/about/' }) and "evil-mitm-sw.js" does skipWaiting() and is thereby able to take control of the existing controlled clients and intercept their network requests. It re-writes the target bank account numbers in the intercepted fetch requests for the transfer or something, which is a plausible nefarious thing that CSP can't stop, why not.

Note that I'm not suggesting that https://w3c.github.io/ServiceWorker/#path-restriction is going to moot the massive badness of such a hypothetical compromise given that same-origin is where the actual security boundary is, but https://w3c.github.io/ServiceWorker/#path-restriction is an existing protection that is stricter than same-origin and we should be intentional about weakening or removing it.

wanderview commented 4 years ago

Sorry, I did not address it in my previous comment. I propose:

Changing the scope of a registration would still need to pass the path-restriction/ServiceWorkerAllowed check just like if registered from scratch. (I assume we do this when changing the script URL on a registration today?)
By default the new service worker will not activate if there are existing controlled clients.
skipWaiting() will cause the new service worker to activate and any existing controlled clients not covered by the new scope become uncontrolled via a controlchange event.
The registration.scope attribute reflects the scope associated with the oldest worker, so the scope attribute does not change until the new service worker activates.

The main ugliness here is that the scope is now associated with a version of the service worker and not the registration as a whole. Its a worse form of the updateViaCache change on the registration previously.

Should I expose a ServiceWorker.scope to reflect the scope of each version? I'm not sure it would be used in the algorithm, but it might help folks understand that it can change. Not sure on that one.

wanderview commented 4 years ago

Actually, I kind of like being explicit about the ServiceWorker having a scope and I think I would make some of the algorithms use that.

asutherland commented 4 years ago

That sounds reasonable to me, thank you. I recall the path-restriction check to be self-consistent and having the scope be per-ServiceWorker should make things work nicely. Pedantnit: I am assuming s/controlchange/controllerchange/ was intended for the event.

jakearchibald commented 4 years ago

Moving scope to a service worker level thing sounds fine, and I agree it should be exposed to script.

skipWaiting() will cause the new service worker to activate and any existing controlled clients not covered by the new scope become uncontrolled via a controlchange event.

I agree this as the only sensible thing to do. However, I don't think skipWaiting should cause clients to switch to another service worker registration, or switch from no registration to a registration. We have clients.claim for that.

Moving a registration to a scope that exactly matches another existing registration will replace that registration. From an API perspective this is similar to calling register() again on the same scope with a different script URL, etc. I think throwing would be unexpected.

I don't think this is the right model. Calling register with a different script URL is altering one registration, whereas the scenario above is moving one registration and unregistering another.

const reg1 = await navigator.serviceWorker.register('/sw1.js');
const reg2 = await navigator.serviceWorker.register('/sw2.js', { id: 'foo' });

I'm assuming that:

reg1 has an id of / and a scope of /
reg2 has an id of foo and a scope of /

Either:

The second register fails because there's already a registration at that scope.
The first registration implicitly unregisters.

In the second case, reg1 would continue to control clients until they go away, or reg2 claims those clients.

My only worry about this behaviour is it involves data loss. Things like push registrations, bg fetch, associated with reg1 would be removed. Could that be a gotcha?

wanderview commented 4 years ago

I don't think this is the right model. Calling register with a different script URL is altering one registration, whereas the scenario above is moving one registration and unregistering another.

I can make it reject for now and we can consider loosening it in the future.

I do worry a bit about some sharp edges for sites in the future with my pattern matching scopes work. We want to support registrations with multiple scopes. If a site is migrating from having two registrations to a single registration that covers both of the old scopes how do they reasonably do that migration? It may be hard to ensure the timing of the registration/move of the final registration vs the removal of the old registration. They certainly can't do it atomically.

Maybe we could consider adding a further unregisterConflicts:true option to register() that would opt-in to removing conflicting registrations automatically.

wanderview commented 4 years ago

Work-in-progress draft PR here: https://github.com/w3c/ServiceWorker/pull/1526

jakearchibald commented 4 years ago

Maybe we could consider adding a further unregisterConflicts:true option to register() that would opt-in to removing conflicting registrations automatically.

If you're going from two registrations to one, 'merge' might be a better option than 'overwrite'.

await navigator.serviceWorker.register('/sw2.js', {
  id: 'foo',
  conflictResolution: 'merge',
});

Where the other options are 'unregister' and the default, 'error'.

wanderview commented 4 years ago

Can you say more about how you would expect "merge" to work? Which service worker script is chosen? Similarly, when other attributes are different, which is selected?

jakearchibald commented 4 years ago

Hmm yeah it's not as easy as it seemed to me this morning. The idea would be to unregister the 'overwritten' registration, but move its inner registrations (like push, bg-fetch) to the new registration. But the timing of that is not at all clear.

wanderview commented 4 years ago

I also think we probably want to add a getRegistrationById() method.

wanderview commented 4 years ago

While thinking about this today I realized that I need to map the global "id to registration" map really have {origin,id} tuples as the key. Different origins can use the same id values and they should not conflict.

The alternative where we restrict id values to URLs like scope could instead just use the id as the key since the origin would be directly embedded in it. This would be a bit simpler to implement.

I think I still prefer the ergonomics of the non-URL id, though. I personally find it confusing when URLs are used as identifiers, but not actually used for anything functional like fetching resources, etc.

Just throwing this out there in case it changes anyone's mind.

jakearchibald commented 4 years ago

I agree it should be a non-URL id. Implementations can try and make these URLs internally if it's easier (and they can do so without creating security issues).

wanderview commented 4 years ago

And just to confirm, we are ok deviating from what manifest is doing? I believe they want to use a URL for the id in w3c/manifest#586.

I think deviating should be ok since the manifest id value will still be usable as a service worker id value. And there is nothing requiring them to be identical for a PWA beyond code hygiene as far as I know.

wanderview commented 4 years ago

I think I am going to try to spec an option to replace registrations with conflicting scopes. Thinking about my conversations with partners I think its very high probability to be used/needed.

I'm thinking an enum, though, instead of a boolean. So something like:

navigator.serviceWorker.register('sw.js', {
  scope: '/foo',
  scopeConflict: 'unregister',
});

This would auto-unregister any conflicting service workers before installing the new service worker. If the installation fails, then you would end up with neither the new service worker nor the old conflicting registration. Doing an atomic replace after install is much more complex and I'm not sure its worth it. I think "unregister" will communicate this behavior a bit better than "replace". If we find we need an atomic replace in the future we can add another enum value for it.

Thoughts?

wanderview commented 4 years ago

Hmm, but if its not an atomic replace, then the only value in doing it natively over in user js would be doing an immediate purge of the conflicting registration. But the purge PR in #1506 is not landed yet. And if it did land, we could just expose an unregister({ immediate: true }) which would let js implement their own logic.

So I guess ignore the last comment. I will not pursue the scopeConflict option for now.

wanderview commented 4 years ago

I believe the PR is now ready for review if anyone is up for it.

I'd like to do WPT tests as part of chromium implementation if that is ok.

jakearchibald commented 4 years ago

I'll aim to review this week

youennf commented 3 years ago

I think I like the idea of relaxing the link between scopes to service workers. I am less sure about exposing IDs to web pages though and I am not sure integrating it with register is the best approach. For instance, what if a web developer wants to replace two existing service workers by three new service workers.

AIUI, the issue is about the ability to dynamically change service worker scopes. Why not using a lock/unlock approach or exposing an API that would update the scope map atomically? Something like navigator.serviceWorker.updateScopes({sw1: myScope1, sw2: myScope2, ...})

Let's say we introduce the possibility for a service worker to have a 'null' scope (aka a service worker is registered but has no entry in the scope to registration map). Scripts could register several new service workers with 'null' scopes, update scopes and unregister old service workers with 'null' scope.

wanderview commented 3 years ago

For instance, what if a web developer wants to replace two existing service workers by three new service workers.

What is the use case for this? I am having a hard time coming up with one based on the partners I have talked to.

A more realistic scenario is collapsing multiple old registrations into a single list scope registration. But I think we could tackle that by allowing the new registration to simply overwrite old registrations with a "force:true" option or something. That would be much less complex than a general locking mechanism.

If we want to allow registrations to change scope, it seems necessary to me for the scope not to be used as the unique identifier for the registration. I don't see how trying to have mutable unique identifiers for registrations makes things any easier. If anything it seems to make things dramatically more complex.

Something like navigator.serviceWorker.updateScopes({sw1: myScope1, sw2: myScope2, ...})

Can you please elaborate on this? I see you've identified two scopes, but I don't understand what this is supposed to do to the service worker registrations.

Let's say we introduce the possibility for a service worker to have a 'null' scope

This sort of thing was proposed long ago by mozilla with new ServiceWorker(), etc, but I think the ship has long sailed on that approach. In any case, I don't see how it helps with the use cases here and it seems orthogonal to the issue at hand.

youennf commented 3 years ago

What is the use case for this?

I do not have a particular use case. It is just an extrapolation of the use cases you were mentioning. If we introduce scope lists, this could add further possibilities to migration scenarios.

But I think we could tackle that by allowing the new registration to simply overwrite old registrations with a "force:true" option or something. That would be much less complex than a general locking mechanism.

Agreed lock/unlock might be too complex. I fear force:true (or other booleans proposed in this issue) might be too restrictive. I would be happy to be convinced otherwise.

If we want to allow registrations to change scope, it seems necessary to me for the scope not to be used as the unique identifier for the registration.

As I said earlier, I agree with the idea to relax the relationship between scopes and registrations. A registration ID for internal management purposes might be fine but I am not sure why we should expose such IDs to web pages. We already have service worker and registration objects for instance.

Something like navigator.serviceWorker.updateScopes({sw1: myScope1, sw2: myScope2, ...})

Can you please elaborate on this? I see you've identified two scopes, but I don't understand what this is supposed to do to the service worker registrations.

Let's say we have two old service workers sw1 and sw2 and want to replace them with a brand new sw3. const sw1 = await navigator.serviceWorker.register(...); // wait for sw1 to be activated. // Atomically change scopes so that sw1 and sw2 will no longer match while sw3 will match both sw1_scope and sw2_scope. await navigator.serviceWorker.updateScopes({ sw1: null, sw2: null, sw3: ['sw1_scope', 's2_scope']})

A weakness of this proposal is the fact that sw3 would be registered to an initial scope, hence why I was mentioning the idea of an empty scope.

wanderview commented 3 years ago

A registration ID for internal management purposes might be fine but I am not sure why we should expose such IDs to web pages.

Without exposing the id there is no way for the page to reliably lookup and refer to registrations.

I really don't understand the concern with exposing the id either. What negative consequence are you concerned about?

youennf commented 3 years ago

Without exposing the id there is no way for the page to reliably lookup and refer to registrations.

To refer to registrations, we currently have ServiceWorkerRegistration. To lookup registrations, we have getRegistrations() and getRegistration(clientURL). Can you clarify how that is insufficient and/or what the added value of IDs is?

wanderview commented 3 years ago

If the scope is mutable, than getRegistration(clientURL) is non-deterministic as to what will be returned. Has the scope already changed? Similarly, getRegistrations() returns all registrations, but you now have to divine which one is the one you want when there is no fixed identifier you can rely on being unchanged.

Again I ask, what specific risk or harm are you concerned about with exposing registration.id?

Trying to think of objections I can come up with these assurances:

The id is specified by the web developer, so its not any kind of browser generated finger printing vector.
Exposing an id does not prevent adding a more complex API like your updateScopes(). It would just reference id values to identify the registrations.

As to whether register() is the right place to make a change I feel pretty strongly that it is. All other multiple properties directly on ServiceWorkerRegistration and ServiceWorker are modified through register() as the mutating function. It works for both new registrations and modify existing registrations. (The only exception to this are things like NavigationPreloadManager which are on separate manager objects, but scope is a property and not on a manager object.)

Finally, the partners I have talked to don't seem concerned about the multiple registration case. Perhaps we will get future feedback that its a use case that needs to be solved, but it can be done incrementally on top of this proposal. Nothing here precludes dealing with that use case later. We can add additional options to register() or a separate method as you propose.

Right now, though, I don't have any developers telling me that multiple registration scope mutations is an important problem. And it will be very complex and costly to make work atomically due to life cycle issues; i.e. you need to remove conflicting registrations before you know if the new installation works. I strongly feel we should not commit to this use case without developers providing data that its worth paying the cost.

@jakearchibald and @asutherland as other folks who have chimed here do you have any opinions?

youennf commented 3 years ago

The id is specified by the web developer

This is part of my concern or my misunderstanding of how this is expected to work. Why should developers set IDs in the first place? How should they choose these values? IDs seem mysterious compared to scope which has a very clear meaning.

If there is a one-to-one mapping between IDs and ServiceWorkerRegistration objects, why not passing ServiceWorkerRegistration directly to the API? Would there be a difference in behavior?

it can be done incrementally on top of this proposal.

Are we confident in that? How is IDs exposure expected to help there?

A service worker might support several scopes in the future, we might end up wanting to migrate some but maybe not all of the scopes to a different service worker. Is exposing ID helping this case?

asutherland commented 3 years ago

I think https://github.com/w3c/ServiceWorker/issues/1085 made the case for why id's are potentially quite useful. If you have a complex site using a complex ServiceWorker, having registrations named using a site-provided id lets one create "v1" and "v2" registrations and then identify what version is currently controlling a scope, determine whether it's installed, pivot it into place, etc. The current ServiceWorker model for registrations and updates doesn't correspond well with the reality of how software/websites are developed. This enhancement helps address that and having the id makes that much more straightforward.

From a testing perspective I think the id is also invaluable. This can be reverse engineered by ensuring you have a global that lives for the entire duration of a test that leverages the object identity of the ServiceWorkerRegistrations to assign persistent names to them, but that's a lot of work in a world where it's already difficult to write, let alone understand, ServiceWorker tests.

youennf commented 3 years ago

having registrations named using a site-provided id lets one create "v1" and "v2" registrations

A website can get the registration for a given scope, post message to the corresponding service worker which will post message back its version. For instance, the version value could be baked into JS script, which might be handy to handle DOM cache versioning. Isn't this good enough or are we trying to make it more convenient to web pages? How is it done with frameworks like workbox? I am sure there are other ways to attach arbitrary metadata to a service worker registration.

I am not sure how a web site can be sure that an ID it gave to register will exactly refer to the registration object the page is thinking of in multi-process architectures? Say we get registration.id and plan to use it, but take enough time that another page uses to unregister/register registrations with the same ID.

Do IDs need to be given by web pages? Can/should they be generated by the user agent?

asutherland commented 3 years ago

A website can get the registration for a given scope, post message to the corresponding service worker which will post message back its version. For instance, the version value could be baked into JS script, which might be handy to handle DOM cache versioning.

postMessage-ing a ServiceWorker which potentially could bring both processes and threads into existence and involve evaluating megabytes of JS to avoid exposing a string identifier seems like a bad battery trade-off? Especially if such an idiom might result in a page postMesssaging potentially N registrations for its whole origin?

I am not sure how a web site can be sure that an ID it gave to register will exactly refer to the registration object the page is thinking of in multi-process architectures? Say we get registration.id and plan to use it, but take enough time that another page uses to unregister/register registrations with the same ID.

My understanding of the proposal was that basically everything that is currently "scope to FOO map" becomes "id to FOO map". So the existing scope to job queue map and its ordering guarantees and coalescing and idempotency semantics will be maintained. When coordination that exceeds this weak ordering mechanism is required, there's https://wicg.github.io/web-locks/ (which Firefox plans to implement).

Do IDs need to be given by web pages? Can/should they be generated by the user agent?

I believe the prior suggestion was to default to using the scope as the id if not explicitly specified, which seems reasonable and equivalent to current behavior.

wanderview commented 3 years ago

Why should developers set IDs in the first place? How should they choose these values?

In addition to Andrew's comments I would also note that the web platform is rife with precedent for developer-set identifiers; e.g. Cache API cache names, IDB database names, <element id="foo">, css class names, etc. I think its fair to say developers value giving things meaningful names so they can reference them again later.

youennf commented 3 years ago

I would feel more comfortable if we could validate this will work out well with the more complex cases that will arise in the future (scope sets, partial migrations...). In particular, as I asked earlier, do we have a rough understanding on how

it can be done incrementally on top of this proposal. Or is it that web locks is supposed to handle these cases, outside of this proposal?

For instance, what is the implication of scope sets to that proposal? Would we make it mandatory to provide an id if passing a scope set to register? If so, it seems that we are trying to define a scope name, like we have cache names.

postMessage-ing a ServiceWorker which potentially could bring both processes and threads into existence and involve evaluating megabytes of JS to avoid exposing a string identifier seems like a bad battery trade-off?

This could be optimised, say service worker at install time adds its scope and version in some database. The point is whether service workers are doing this sort of management, and if so, how they are doing it, for what purposes... Then we can validate whether ID is the right tool.

Cache API cache names, IDB database names, <element id="foo">, css class names,

In terms of terminology, it seems that name is a tad better than ID here. There is a uniqueness to the term ID (like in ServiceWorkerClient.id) which does not really seem guaranteed here. It is also somehow more consistent with Caches and IDB.

wanderview commented 3 years ago

What is a "scope set"? Do you mean the list from the scope pattern effort? If so, my current plan is the id value would get the serialized string form of the scope pattern, whether its a list or not. Since that work is not finalized I don't have a finalized serialization. I don't see why that should stop the addition of an id, though.

We could also make an id mandatory for new scope pattern or pattern list uses. There is no backward compat issues to deal with there since its a new feature. I don't have a strong preference on that vs stringifying the scope pattern.

In terms of terminology, it seems that name is a tad better than ID here. There is a uniqueness to the term ID (like in ServiceWorkerClient.id) which does not really seem guaranteed here.

I don't care if we call this a name vs an id. I will note, though, it is required to be unique. You cannot have two registrations with the same "name" at the same time.

I would feel more comfortable if we could validate this will work out well with the more complex cases that will arise in the future (scope sets, partial migrations...).

I still don't fully understand what use cases you are exactly trying to cover. The more clearly you define those the better I can address them.

Speaking to the possible "many-to-one" registration convergence case we discussed above I've already briefly mentioned on incremental approach:

navigator.serviceWorker.register(script, {
  id: 'foo',
  scopePattern: patternList,
  replaceConflicts: true // replace any conflicting registrations
});

Or we could make it more explicit:

navigator.serviceWorker.register(script, {
  id: 'foo',
  scopePattern: patternList,
  unregister: { sw_id_1, sw_id_2, sw_id_3 }  // remove other registrations we are replacing
});

Or we could create a separate function like you proposed above, but it would take id's instead of scopes:

navigator.serviceWorker.updateScopes({sw1: sw_id_1, sw2: sw_id_2, ...})

Again, I don't think we have justification to pay the complexity cost for any of these solutions yet. However, I really don't see any limitations imposed by having a name/id to reference registrations.

asutherland commented 3 years ago

postMessage-ing a ServiceWorker which potentially could bring both processes and threads into existence and involve evaluating megabytes of JS to avoid exposing a string identifier seems like a bad battery trade-off?

This could be optimised, say service worker at install time adds its scope and version in some database.

xref https://github.com/w3c/ServiceWorker/issues/1331 and in particular Asa's comment https://github.com/w3c/ServiceWorker/issues/1331#issuecomment-403575004 which cited latencies of up to 1 second with IndexedDB. Also xref https://github.com/w3c/ServiceWorker/issues/1157

youennf commented 3 years ago

xref #1331 and in particular Asa's comment #1331 (comment) which cited latencies of up to 1 second with IndexedDB. Also xref #1157

This comment was done 2 years ago. Since then, I know some implementation efforts have been made to speed up IDB. Are we sure this still holds true? Is DOM Cache suffering from the same issue?

my current plan is the id value would get the serialized string form of the scope pattern, whether its a list or not

This seems like a good plan, good enough that web developers might not feel the need to set IDs very often. They might actually never want to do it in the context of the migration problem we want to solve.

For instance, if we take your example, we could pass scopes instead of IDs: navigator.serviceWorker.register(script, { scopesToRemove: ['foo/product',...], scope: 'foo', }); What is the advantage of passing an ID instead of a scope?

Another possibility is to pass registration objects directly: navigator.serviceWorker.register(script, { registrationsToRemove: [registration1, ...], scope: 'foo', }); The difference between passing strings versus registrations is that the user agent might be able to check that the registration is still in the map when manipulating it. This is not possible with a string. If the registration is not in the map, maybe it is safer to reject? I am not sure that this additional safety is worth the extra work to get the registration before removing it. But it seems worth deciding explicitly what we prefer.

One problem I have with the ID proposal is that ID is optional, because we already have scope and scope is mandatory. For other APIs I know of, either web developer needs to pass IDs or IDs are generated by user agents. Do we have an existing API that follows this mixed pattern?

wanderview commented 3 years ago

Do we have an existing API that follows this mixed pattern?

<element id="foo"> is an optional identifier.

This seems like a good plan, good enough that web developers might not feel the need to set IDs very often.

I would argue it would be a best practice to set an id if you are using a pattern or pattern list. This makes me more inclined to go with your previous suggestion of requiring the id in those cases.

What is the advantage of passing an ID instead of a scope?

I think the id is much more developer friendly. Typing a serialized scope list may not be very easy or readable. Also, an API which requires you to type the list as it was and the list as it will is going to be very verbose. Allowing the developer to provide a meaningful product name like "my-product-foo" is going to be a lot more ergonomic.

Also, the id makes it clear we are modifying a single registration and not creating a new registration to replace an old one. This is important when you consider other state that sits in managers (push, nav preload, etc) attached to the registration. We don't want to lose all that when the scope changes.

Finally, I think its just bad design to allow mutation of the scope on a registration and also try to keep using the scope as a key. How is the spec supposed to sanely deal with a registration where the active worker is scope A and the installing worker is for scope B? We need a stable, immutable key.

youennf commented 3 years ago

Do we have an existing API that follows this mixed pattern?

<element id="foo"> is an optional identifier.

If web app does not provide the identifier, the element is not in the map. The user agent will not generate an identifier to put the element in the map.

Allowing the developer to provide a meaningful product name like "my-product-foo" is going to be a lot more ergonomic.

Let's get back to your first example. A first service worker was registered with: sw1 = await navigation.serviceWorker.register('sw.js', { scope: '/product/bar' }); Then the web app wants to move the scope to 'product'. The idea would be to do the following: sw2 = await navigation.serviceWorker.register('sw.js', { scope: '/product', id: '/product/bar' }); In terms of ergonomic, this does not look great: sw2 ID would be '/product/bar' which is potentially no longer meaningful.

The same argument applies if we try to sneak version info in the IDs. If a web app uses 'product-v1' and wants to go to 'product-v2', how can the web app do without removing the 'product-v1' registration?

Another point related to the versioning use case. As part of the storage spec, there are ideas around grouping resources in a bag (IDB, cache, multiple service worker registrations potentially). It seems that web app provided names for those bags might be a better fit than immutable IDs tied to a registration.

Also, from a readability point of view, using 'id' in the register method does not make it clear at all that sw2 is 'replacing' sw1. A more explicit syntax might actually be more ergonomic.

Also, the id makes it clear we are modifying a single registration and not creating a new registration to replace an old one. This is important when you consider other state that sits in managers (push, nav preload, etc) attached to the registration. We don't want to lose all that when the scope changes.

Agreed we do not want automatic dropping of managers. It seems this can be resolved with or without IDs though. Or am I too optimistic? If registering sw2 does remove sw1, we could spec that all managers attached to sw1 are moved to sw2. This might require service worker registrations to have a 'manager list' slot.

Another aspect that does not look great with the ID proposal is that it makes it easy to 'replace' one existing registration but not two existing registrations. If we think a web site might want to replace '/product/bar' by 'product', it seems very natural that a website might want to replace two registrations ('product/bar' and 'product/foo') by a single 'product' registration.

If we introduce a way for one service worker registration to lock another job queue, this paves the way for multiple registration removal support, should we feel the need. This locking does not look like something impossible to specify and implement. I might be too optimistic again, let me know. For web developers, I do not see the complexity that this proposal will bring.

Finally, I think its just bad design to allow mutation of the scope on a registration and also try to keep using the scope as a key. How is the spec supposed to sanely deal with a registration where the active worker is scope A and the installing worker is for scope B? We need a stable, immutable key.

It is not yet clear to me whether we want to mutate scopes of a registration or whether we want to easily replace some registrations by a new registration. In terms of spec editing, is the following feasible?

Update the spec to move the job queue to the service worker registration
Allow the scope to registration map to have multiple keys pointing to the same registration

I am not against IDs per se. So far, I haven't seen a feature that can be implemented by introducing IDs but not without it. The main advantage might be web developers ergonomics but I would like to be sure that IDs will deliver that in practice.

wanderview commented 3 years ago

Let's get back to your first example. A first service worker was registered with: sw1 = await navigation.serviceWorker.register('sw.js', { scope: '/product/bar' }); Then the web app wants to move the scope to 'product'. The idea would be to do the following: sw2 = await navigation.serviceWorker.register('sw.js', { scope: '/product', id: '/product/bar' }); In terms of ergonomic, this does not look great: sw2 ID would be '/product/bar' which is potentially no longer meaningful.

Moving from a sub-product up to a top level product is not something I've heard any developers tell me they want to do. But even, so the fact that they could do this if the need arose is a feature. The fact that the name becomes sub-optimal should not prevent us from supporting the other more common use cases. In the more general case, I don't think the browser should say developers can't name things because they might choose bad names.

Update the spec to move the job queue to the service worker registration

Taking stateful things that used to have an immutable relation with a registration and now allowing them to move seems to add a lot of complexity. In particular, you would have to move the job queue while you are in the middle of a job running from that queue. This does not seem like a good design option to me.

Allow the scope to registration map to have multiple keys pointing to the same registration

I'm not sure how this helps to be honest. I guess you are saying allow "old scope" and "new scope" to point to the same registration for some time? I think the complexity comes from when "old scope" stops working and how to handle that transition without edge cases. That problem seems to persist regardless of how long you delay dropping the "old scope".

Maybe it would be possible to make these work with a lot of effort, but the cost benefit does not seem to make sense to me. Identifier/name values are such a common, simple thing in web APIs already and would be relatively straightforward to add here.

Anyway, if there is time I'd like to discuss this proposal at one of the TPAC meetings this being discussed in #1536.

youennf commented 3 years ago

Taking stateful things that used to have an immutable relation with a registration and now allowing them to move seems to add a lot of complexity. In particular, you would have to move the job queue while you are in the middle of a job running from that queue. This does not seem like a good design option to me.

Sorry, I was not clear. What I was trying to describe is whether we could remove entirely the scope to job queue map. Instead you would get the job queue by first getting the registration. The registration to job queue relationship would remain immutable.

Allow the scope to registration map to have multiple keys pointing to the same registration

I'm not sure how this helps to be honest. I guess you are saying allow "old scope" and "new scope"

It was not really old scope vs. new scope but more a way of implementing your idea of a registration with a list of scopes.

Anyway, if there is time I'd like to discuss this proposal at one of the TPAC meetings this being discussed in #1536.

Sounds good to me.

wanderview commented 3 years ago

It was not really old scope vs. new scope but more a way of implementing your idea of a registration with a list of scopes.

Yes, I agree implementations will need a lookup index from each individual scope to the registration to perform navigation url matching.

The spec, though, needs a unique, immutable key in various places to lookup registrations. For example, you can have multiple jobs in the queue with keys baked into the structures to lookup registrations. You can't just hold registrations instead because the registrations may be removed before the job runs. If you try to use the scope as a key the scope could change before the job executes. Therefore I don't think we can use scope here without a lot of added complexity to detect and handle these edge cases.

youennf commented 3 years ago

Therefore I don't think we can use scope here without a lot of added complexity to detect and handle these edge cases.

Interesting. Identifiying these edge cases seem useful. If the issue is about job->job queue, could we pass the job queue as a parameter to 'Create job' so that we set a [job queue] slot for each job?

wanderview commented 3 years ago

Identifiying these edge cases seem useful.

I'm not sure I agree. I've already done the work to write a PR that meets the use cases in a more straightforward manner without these edge cases. I'm not sure what benefit there is to justify the cost of working through every edge case of a more complex approach when we have a simple solution at hand already.

wanderview commented 3 years ago

This was discussed on the November 2020 virtual call. I believe there was positive support from Mozilla, Microsoft, and LinkedIn.

wanderview commented 2 years ago

FYI, I'm finally starting work on implementing this.

w3c / ServiceWorker

consider allowing a non-scope identifier for registrations #1512