jakearchibald opened 5 years ago
@jakearchibald
We could also add things like { url: { pathStartsWith: '/foo/' } } to match against particular components of the URL, but you'd be matching against all origins unless you specify otherwise.
I think that's fine because sometimes we care about checking the domain and we don't at other times. Here's some code that's more-or-less exactly what I'm doing on a real-world service worker.
self.addEventListener('fetch', function(event) {
  const request = event.request;
  event.respondWith(
    caches.match(request)
      .then(function(response) {
        if (response) {
          return response;
        }
        // new URL() needs the URL string, not the Request object itself
        const url = new URL(request.url);
        if (url.pathname === '/some_path') {
          const redirectUrl = new URL(url.href);
          redirectUrl.pathname = '/';
          return Response.redirect(redirectUrl.href, 302);
        }
        if (url.hostname === IMAGE_CDN_HOSTNAME) {
          return fetchFromImageCdn(request);
        }
        return fetch(request);
      })
  );
});
function fetchFromImageCdn(request) {
  // basically fetch once and cache forever
}
Technically, https://mywebsite.foo/some_path and https://myimagecdn.foo/some_path will both redirect to /, but I know that the page will never try to access https://myimagecdn.foo/some_path, so it's fine that both match.
I'd like to use declarative routing to do something like this:
// Things that try the cache, then go to the network
const cacheThenNetwork = [RouterSource.cache(), RouterSource.network()];
// These paths match all domains, but I'm only using one so it's fine
router.get(RouterCondition.url.pathname.is('/'), cacheThenNetwork);
router.get(RouterCondition.url.pathname.startsWith('/static/'), cacheThenNetwork);
// Any path on the image CDN
router.get(RouterCondition.url.hostname.is(MY_IMAGE_CDN), cacheThenNetwork);
// fetch event listener can now be simplified
self.addEventListener('fetch', function(event) {
  const request = event.request;
  const url = new URL(request.url);
  if (url.pathname === '/some_path') {
    const redirectUrl = new URL(url.href);
    redirectUrl.pathname = '/';
    event.respondWith(Response.redirect(redirectUrl.href, 302));
  }
});
but I know that the page will never try to access https://myimagecdn.foo/some_path so it's fine that both match.
That seems really fragile vs:
if (url.origin === location.origin && url.pathname == '/some_path') {
// …
}
I wouldn't want developers to be matching a path on all origins unless it's explicit.
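A minimal sketch of the idea that a path condition should only match other origins when that is explicit. The condition fields (pathname, pathStartsWith, origin) and the anyOrigin opt-in flag are hypothetical names chosen for illustration, not part of any proposal in this thread.

```javascript
// Hypothetical matcher: path conditions only apply to the service worker's
// own origin unless the condition names another origin or opts in with
// the made-up `anyOrigin` flag.
function matchesUrlCondition(requestUrl, condition, swOrigin) {
  const url = new URL(requestUrl);
  const requiredOrigin = condition.anyOrigin ? null : (condition.origin ?? swOrigin);
  if (requiredOrigin !== null && url.origin !== requiredOrigin) return false;
  if (condition.pathname !== undefined && url.pathname !== condition.pathname) return false;
  if (condition.pathStartsWith !== undefined && !url.pathname.startsWith(condition.pathStartsWith)) {
    return false;
  }
  return true;
}

const swOrigin = 'https://mywebsite.foo';
console.log(matchesUrlCondition('https://mywebsite.foo/some_path', { pathname: '/some_path' }, swOrigin)); // true
console.log(matchesUrlCondition('https://myimagecdn.foo/some_path', { pathname: '/some_path' }, swOrigin)); // false
console.log(matchesUrlCondition('https://myimagecdn.foo/some_path', { pathname: '/some_path', anyOrigin: true }, swOrigin)); // true
```

With this default, the /some_path redirect from the example above could not fire on the image CDN by accident; matching every origin becomes a visible, deliberate choice.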
You're right; that would be better. My point is not so much that it's a good thing to have a path matching multiple domains, but that having something like pathStartsWith isn't necessarily creating a new problem.
I know I could get my example to work using RouterIfURLStarts and RouterIfURL (or whatever equivalent API you decide to go with), so I think the conditions in your current draft are sufficient for a v1, but having conditions for each part of a request (href, protocol, hostname, pathname, search, method, and headers) would be nice to have.
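To make "conditions for each part of a request" concrete, here is a sketch of a matcher covering href, protocol, hostname, pathname, search, method, and headers. The request is a plain { url, method, headers } object standing in for a Fetch Request (headers keyed by lowercase name), and the condition shape is an assumption, not a proposed API.

```javascript
// Hypothetical per-component matcher. Every property present in
// `condition` must match; absent properties are ignored. URL components
// use the URL object's own formats, e.g. protocol is 'https:'.
const URL_PARTS = ['href', 'protocol', 'hostname', 'pathname', 'search'];

function matchesRequest(request, condition) {
  const url = new URL(request.url);
  for (const part of URL_PARTS) {
    if (part in condition && url[part] !== condition[part]) return false;
  }
  if ('method' in condition && request.method.toUpperCase() !== condition.method.toUpperCase()) {
    return false;
  }
  // Each listed header must be present with exactly this value.
  for (const [name, value] of Object.entries(condition.headers ?? {})) {
    if (request.headers[name.toLowerCase()] !== value) return false;
  }
  return true;
}

const request = {
  url: 'https://example.com/static/app.js?v=2',
  method: 'GET',
  headers: { accept: '*/*' },
};
console.log(matchesRequest(request, { hostname: 'example.com', method: 'get' })); // true
console.log(matchesRequest(request, { search: '' })); // false: search is '?v=2'
```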
To go even deeper, I'd love to be able to do something like:
router.add(
  new RouterNot(new RouterConditionHasCookie('I_AM_LOGGED_IN')),
  new RouterSourceCache({ request: '/login.html' })
);
After thinking about it, I think I'm going to go with the object-based approach & @domenic's suggestion:
// Without options:
router.add(
  conditions,
  'cache',
);
// With options:
router.add(
  conditions,
  { type: 'cache', request: '/shell.html' },
);
// Multiple sources:
router.add(
  conditions,
  [
    'cache',
    'network',
    { type: 'cache', request: '/shell.html' },
  ],
);
I'm still not keen on having objects with a required property that determines the type of the object, but it seems better than the alternatives.
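One way to read the object-based approach is that every accepted form gets normalized into a list of { type, ...options } objects before routing. A sketch, assuming the type names 'cache', 'network', and 'fetch-event' (the last is only a guess at a name for the fetch-event source); normalizeSources is hypothetical.

```javascript
// Hypothetical normalization of the sources argument of router.add():
// a string, a { type, ...options } object, or an array mixing both.
const KNOWN_SOURCE_TYPES = new Set(['cache', 'network', 'fetch-event']);

function normalizeSources(sources) {
  const list = Array.isArray(sources) ? sources : [sources];
  return list.map((source) => {
    const normalized = typeof source === 'string' ? { type: source } : { ...source };
    // The required `type` property decides what kind of source this is.
    if (!KNOWN_SOURCE_TYPES.has(normalized.type)) {
      throw new TypeError(`Unknown source type: ${normalized.type}`);
    }
    return normalized;
  });
}

normalizeSources('cache');
// returns [{ type: 'cache' }]
normalizeSources(['cache', 'network', { type: 'cache', request: '/shell.html' }]);
// returns [{ type: 'cache' }, { type: 'network' }, { type: 'cache', request: '/shell.html' }]
```

Validating the type up front means a typo such as 'cahce' fails when the route is added rather than silently never matching.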
With the object-based approach, how do you specify that you want to race 'cache' and 'network'?
RouterSourceFirst(...sources), RouterSourceRace(...sources) etc etc.
@WORMSS
router.add(
  conditions,
  { type: 'race', sources: [...sources] },
);
The pattern feels equally extensible.
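The difference between the default "first" behaviour (RouterSourceFirst) and { type: 'race' } (RouterSourceRace) can be sketched with promises. Here each source is modeled as a hypothetical async function that resolves to a response, or to undefined when it has nothing (such as a cache miss); strings stand in for responses.

```javascript
// Default behaviour: try sources one after another; the first hit wins.
async function first(sources) {
  for (const source of sources) {
    const response = await source();
    if (response !== undefined) return response;
  }
  return undefined;
}

// 'race': start every source at once; the first one to produce a response
// wins. Misses and errors are ignored unless every source fails.
async function race(sources) {
  const attempts = sources.map(async (source) => {
    const response = await source();
    if (response === undefined) throw new Error('no response');
    return response;
  });
  try {
    return await Promise.any(attempts);
  } catch {
    return undefined; // every source missed or failed
  }
}

const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));
const slowCache = () => delay(50, 'from cache');
const fastNetwork = () => delay(10, 'from network');

first([slowCache, fastNetwork]).then(console.log); // logs: from cache
race([slowCache, fastNetwork]).then(console.log); // logs: from network
```

Promise.any fits race semantics because a single miss or failure should not end the race while another source might still succeed.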
fwiw, @domenic and others expressed much better what I briefly tried to say yesterday in some other venue (twitter maybe?) - I like this recent turn though quite a bit.
The option explored in https://github.com/w3c/ServiceWorker/issues/1373#issuecomment-452247362 feels more straightforward and in this case better than using classes, would definitely prefer this version!
I'd like to suggest a somewhat different approach that I feel addresses the same issue in a more intuitive manner: provide a (virtual) file system so you can code against the assets as they will be on the client, in the shape/structure they will actually be in, i.e.:
@jeremy-coleman The web isn't compatible with a filesystem. It's compatible with a request/response store, which is what the cache API provides. However, this alone can't communicate when a request should go to the network. I guess I don't understand your proposal fully.
I guess from my perspective, the cached files are a file system. Instead of (req, res) => fetchHttpRoute => interceptEverything => someLogic => if (cache) else (useNet) => next, you would have:
(req, res) => fetchCacheRoute => if (!noCache) fetchHttp => next
If you explicitly write the API to query the cached files first, you can just drop the intercept altogether.
Similar to how you might conventionally write an image element as <img src="assets/icon.svg"> because you know the location post-build, the second-order version of that idea would be to write the src as src=clientCache.
@jeremy-coleman in your system, how do you express "For any URL path ending .jpg, try to fetch from the network, otherwise fall back to this generic image…"?
It'd be the same for both online and offline. Assuming all routes point to the cache first: if it's an online asset, replace it with the network version when available, and the same for offline assets. The only difference is the cache timeout: 0 ms (or fetch on some user event) for online data, compared to something like a 24-hour reset for offline assets. I think a harder thing to make understandable would be the difference between something like 'use last successful' and 'use constant fallback'. As for the discussion above on how to handle conditionals, a revocable Proxy (Proxy.revocable()) with your conditional checks on property-key access could probably handle everything needed.
For both online and offline, from the examples above, I think something like this:
router.get(
  new RouterIfURLSuffix('.jpg'), // find any URL ending in .jpg
  new RouterSourceCache(), // use the static asset
  // maybe:
  new RouterSourceNetwork(), // try to fetch the request from the network; this doesn't need to come first for online requests, just always use the current value in the cache first
  // maybe: update the static asset with the request as the new default fallback
);
Somewhat unrelated, but what really underpins my line of thinking is that I feel offline apps should basically be completely downloaded into some form of local storage on the first request, and SW routes should support coding against the offline assets more so than against online sources.
@jeremy-coleman based on your example above, I don't understand the difference between your proposal and mine.
A bit late to the party but...
+1 for the "side effect operation" issue raised by @jeffposnick and @nhoizey
In addition to refreshing the cache based on a network source, firing analytics beacons is another feature that would be heavily used by LinkedIn. We have a lot of instrumentation to measure whether the service worker is working. The highest-priority V2 feature for us would be support for a routingcomplete or similar event, to handle both of these cases.
@jakearchibald
I think I'd make this an option to RouterSourceNetwork, as it's the only one that would benefit from a timeout right now.
I think the timeout feature would be useful for RouterSourceFetch, as it might address https://github.com/w3c/ServiceWorker/issues/1292.
Timeout support would be really nice for creating a global fallback "catch all" handler.
router.get(
  new RouterIfURLStarts('/profile/*'),
  [new RouterSourceCache(), new RouterSourceNetwork({ timeout: 10000 })]
);
router.get('*', new RouterSourceCache('/oops.html'));
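A timeout on the network source could be layered on an ordinary promise race. This sketch uses a hypothetical withTimeout helper; in a service worker, promise would be fetch(request) and fallback might return caches.match('/oops.html'), as the catch-all route above suggests. The losing fetch is not cancelled here; an AbortController could do that.

```javascript
// Hypothetical helper: resolve with `promise`'s value if it fulfils within
// `ms` milliseconds; otherwise (timeout or rejection) resolve with
// whatever `fallback()` returns.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout])
    .finally(() => clearTimeout(timer))
    .catch(() => fallback());
}

// In a service worker this might be used roughly as:
//   event.respondWith(
//     withTimeout(fetch(event.request), 10000, () => caches.match('/oops.html'))
//   );
const slowNetwork = new Promise((resolve) => setTimeout(() => resolve('network response'), 100));
withTimeout(slowNetwork, 20, () => 'oops page').then(console.log); // logs: oops page
```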
re: glob matching, one use case that may not be covered without regex is the "match on any route, but no files" use case. For example, you may want to match /profile/123 or /profile/123/details, but not /profile/123/photo.jpg.
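For comparison, a regular expression can express "any route, but no files" by requiring that no path segment contain a dot. This is a sketch under that assumption (a dot anywhere in a segment is treated as "this is a file"); isProfilePage is a hypothetical name, and the example.com base URL only exists so relative paths parse.

```javascript
// Matches /profile/<segment> plus any deeper path, as long as no path
// segment contains a dot (treated here as "looks like a file").
const PROFILE_PAGES_NOT_FILES = /^\/profile(?:\/[^/.]+)+\/?$/;

function isProfilePage(href) {
  // Test only the pathname so query strings don't affect the result.
  const { pathname } = new URL(href, 'https://example.com');
  return PROFILE_PAGES_NOT_FILES.test(pathname);
}

console.log(isProfilePage('/profile/123'));           // true
console.log(isProfilePage('/profile/123/details'));   // true
console.log(isProfilePage('/profile/123/photo.jpg')); // false
```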
@n8schloss something I wanted to double check about this proposal: A service worker's routes live with the service worker. They can't be changed without shipping a new service worker. Does that work for you?
I ask because with navigation preload, you wanted to change things during the life of a service worker (the header value).
@asakusuma it depends on the exact scope you want to match... and exclude. To take one of the easiest syntaxes to express and combine facts:
{
  url: {
    startsWith: '/profile/123', ignoreSearch: true,
    and: {
      not: {
        url: { endsWith: '.jpg', ignoreSearch: true },
      },
    },
  },
}
Being able to exclude and use boolean combinators is essential here to really express use cases that are not the simplest ones. Also, syntax-wise, this looks better:
{
  and: {
    url: { startsWith: '/profile/123', ignoreSearch: true },
    not: { url: { endsWith: '.jpg', ignoreSearch: true } },
  },
}
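A sketch of how the second (top-level and) form could be evaluated. The semantics are assumptions: every key in a condition object must hold, and, for simplicity, url comparisons run against the path (plus the query unless ignoreSearch is set), never the origin. evaluate and matchesUrl are hypothetical names.

```javascript
// Hypothetical evaluator: every key in a condition object must hold.
//   url: { startsWith?, endsWith?, ignoreSearch? }  test the URL path
//   and: { ...condition }                           nested condition
//   not: { ...condition }                           inverted nested condition
function evaluate(condition, href) {
  return Object.entries(condition).every(([key, value]) => {
    switch (key) {
      case 'url': return matchesUrl(value, href);
      case 'and': return evaluate(value, href);
      case 'not': return !evaluate(value, href);
      default: throw new TypeError(`Unknown condition key: ${key}`);
    }
  });
}

function matchesUrl({ startsWith, endsWith, ignoreSearch = false }, href) {
  const url = new URL(href);
  const target = ignoreSearch ? url.pathname : url.pathname + url.search;
  if (startsWith !== undefined && !target.startsWith(startsWith)) return false;
  if (endsWith !== undefined && !target.endsWith(endsWith)) return false;
  return true;
}

const profilePagesButNotJpegs = {
  and: {
    url: { startsWith: '/profile/123', ignoreSearch: true },
    not: { url: { endsWith: '.jpg', ignoreSearch: true } },
  },
};
console.log(evaluate(profilePagesButNotJpegs, 'https://example.com/profile/123/details?tab=1')); // true
console.log(evaluate(profilePagesButNotJpegs, 'https://example.com/profile/123/photo.jpg?v=2')); // false
```

Note that under these rules an empty condition {} matches everything, which is why an empty not matches nothing.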
@n8schloss something I wanted to double check about this proposal: A service worker's routes live with the service worker. They can't be changed without shipping a new service worker. Does that work for you?
I ask because with navigation preload, you wanted to change things during the life of a service worker (the header value).
Yep! As long as the RouterIfDate options are there, our use case will be met :)
Late to the party but I'd like to give a bit of feedback.
I love the general concept. Seems like this will solve a lot of problems (speed issues with spinning up SWs, and the added "scariness" of what if the fetch handler has a bug that makes my site non-updateable.) This came up as a potential solution to w3c/manifest#774.
I have some superficial criticism (API surface details).
- I'd like to remove the startsWith and endsWith things in favour of just having a glob syntax. Fewer API calls, less verbose syntax, and more flexible.
- ignoreSearch is a confusing name (why not ignoreQuery)? Alternatively, just embrace the glob and allow ?* at the end of the glob to mean "ignore the query" (not as an especially special case, but rather as a general rule: treat a '?' at the end of a path with nothing after it as the same as no '?').
- The name router.get is confusing because a method called "get" implies it's going to return some information out of the router, not set up a new route. I'd rather just remove that method and make you write router.add({method: 'GET'}), which is nice and obvious about what it's going to do.
Ignore search is because it's what the JavaScript code calls it in window.location.
- I'd like to remove the startsWith and endsWith things in favour of just having a glob syntax. Fewer API calls, less verbose syntax, and more flexible.
This has come up a few times in this issue. I'm not against adding it at some point, but it feels like a big contentious thing to standardise for v1.
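To illustrate what a simple glob syntax could mean (this is not URLPattern), here is a sketch in which * matches any run of characters and a trailing ?* means "any query string, or none", as suggested above. globToRegExp and globMatches are hypothetical names.

```javascript
// Hypothetical glob matcher for URL path + query:
//   '*' matches any run of characters (including '/'),
//   a trailing '?*' means "any query string, or none".
function globToRegExp(glob) {
  let ignoreQuery = false;
  if (glob.endsWith('?*')) {
    ignoreQuery = true;
    glob = glob.slice(0, -2);
  }
  // Escape regex metacharacters in the literal parts, then join with '.*'.
  const body = glob
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${body}${ignoreQuery ? '(?:\\?.*)?' : ''}$`);
}

function globMatches(glob, href) {
  const url = new URL(href);
  return globToRegExp(glob).test(url.pathname + url.search);
}

console.log(globMatches('/static/*.js', 'https://example.com/static/app.js'));       // true
console.log(globMatches('/static/*.js', 'https://example.com/static/app.js?v=2'));   // false
console.log(globMatches('/static/*.js?*', 'https://example.com/static/app.js?v=2')); // true
```

Even this small sketch has to decide whether * crosses '/' and how queries are treated, which hints at why standardising a glob syntax could be contentious for v1.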
ignoreSearch is a confusing name (why not ignoreQuery)?
As @WORMSS says, it's for consistency with the rest of the platform: url.search, cache.match(url, { ignoreSearch: true }), etc.
- The name router.get is confusing because a method called "get" implies it's going to return some information out of the router,
Yeah, maybe. Although this is how most node routers seem to do it.
@WORMSS Yeah it's called that in the URL object too. It's called query inside the spec language but now that I re-read it, I realised that the word "query" never appears in the API interface itself. So "search" is fine.
Curious whether we could use @wanderview's URLPattern proposal instead of startsWith/endsWith.
FYI, we are about to implement a subset of the API. https://github.com/yoshisatoyanagisawa/service-worker-static-routing-api
Thanks for your work! FYI: I tweeted about it to get the developer community ready for experimenting :) https://twitter.com/webmaxru/status/1664214464999182338
As commented in https://github.com/w3c/ServiceWorker/issues/1373#issuecomment-1569530880, we are about to implement a subset, as ServiceWorker Static Routing API. https://github.com/yoshisatoyanagisawa/service-worker-static-routing-api
We (Google Chrome team) are ready for the Origin Trial; it's available from M116. In the meantime, we'd like to hear opinions on how InstallEvent.registerRouter() should work in this API.
Unlike add() or get() in Jake's original proposal, registerRouter() sets all the routes at once, and it can only be called once. This makes the latest routes easy to understand. However, we may rethink this limitation because we have seen some interest in using this API from 3rd parties. 3rd parties in this context means SW scripts that are served cross-origin and imported by the main SW script via importScripts() or ES modules.
Like Speed Kit, some companies provide ServiceWorker scripts as an SDK, and customer websites use their scripts via importScripts(). If both a 3rd party provider and a customer want to use registerRouter(), our current implementation only registers the routing info from the first call, and throws a TypeError for the second registerRouter() call.
I personally feel it makes sense that 3rd parties use registerRouter() to handle something, but do you think we should support multiple registerRouter() calls? If so, a naive approach is just adding a new routing rule into existing rules, but do we need additional mechanisms or API surfaces to manage registered routes more smartly?
cc: @ErikWitt
I think allowing multiple calls is fine. In that case I would rename the method from registerRouter() to registerRoutes().
The naive approach makes perfect sense to me. In other words, registerRoutes(a); registerRoutes(b); should be equivalent to registerRoutes([...a, ...b]). I think that is what everyone would expect.
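The naive approach can be sketched as a route store that only ever appends, so registration order is also evaluation order. RouteList is a hypothetical name; like the current API, it accepts either a single route or an array of routes.

```javascript
// Hypothetical route store: every call appends, so registration order
// is also evaluation order. Accepts a single route or an array.
class RouteList {
  #routes = [];

  registerRoutes(routeOrRoutes) {
    const routes = Array.isArray(routeOrRoutes) ? routeOrRoutes : [routeOrRoutes];
    this.#routes.push(...routes);
  }

  get routes() {
    return [...this.#routes];
  }
}

const a = [{ condition: { urlPattern: { pathname: '/a' } }, source: 'network' }];
const b = [{ condition: { urlPattern: { pathname: '/b' } }, source: 'cache' }];

const twoCalls = new RouteList();
twoCalls.registerRoutes(a);
twoCalls.registerRoutes(b);

const oneCall = new RouteList();
oneCall.registerRoutes([...a, ...b]);

console.log(JSON.stringify(twoCalls.routes) === JSON.stringify(oneCall.routes)); // true
```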
Given that, is there a benefit to registerRoutes vs calling registerRoute as many times as needed? One route at a time would make it easier to tell which route caused a throw.
Hey there, sorry for being so late to the discussion. My two cents:
Calling registerRoute from multiple handlers would be awesome to make combining multiple Service Workers easier. We have that case from time to time and it can be a complex task.
Calling registerRoutes outside the install event is off the table, right? It's not a deal breaker but would have been convenient. At the moment our Service Worker loads a configuration from IndexedDB (the service worker is generic; every customer has a unique config), so it would be convenient to configure the router outside the install event. That said, we are looking into inlining the config in our shipped service worker for every customer, which would resolve that issue for us.
Would it be possible to make the origin trial a 3rd party origin trial? I.e., could we send the origin trial token in the 3rd party script imported into the same-origin service worker? If it is not a 3rd party origin trial, our customers would need to implement the header themselves, which is a lengthy process for large e-commerce companies.
btw. I love the use of URLPattern in the api :) Looking forward to trying this in production and reporting back on the performance gains
Thank you all for the feedback!
@jakearchibald That's a fair point. WDYT @yoshisatoyanagisawa ?
@ErikWitt
Calling registerRoutes outside the install event is off the table right?
At the moment we want to keep it inside the install event. We don't want to make the API dynamically configurable. If you have a workaround please consider doing so.
Would it be possible to make the origin trial a 3rd party origin trial?
OriginTrial for ServiceWorker features runs on a slightly different mechanism, and unfortunately 3rd party origin trials are not supported yet. We understand the use case from the 3rd party context. Let us consider how to deal with it, but please expect it to take some time to solve.
I came up with this scenario; perhaps it could be a problem in some cases.
a.com/sw.js:
importScripts('https://b.com/imported-sw.js');
addEventListener('install', (event) => {
  event.registerRouter({
    condition: {
      // example.jpg is returned from the fetch handler.
      urlPattern: {pathname: "/example.jpg"}
    },
    source: "fetch-event"
  });
});
b.com/imported-sw.js:
addEventListener('install', (event) => {
  event.registerRouter({
    condition: {
      // All jpg resources are fetched from the network.
      urlPattern: {pathname: "*.jpg"}
    },
    source: "network"
  });
});
registerRouter() in imported-sw.js is executed first, and then the one in sw.js is executed. In the current algorithm, the API simply keeps the list of registered routing info and tries to evaluate it from the first item. So any request for a jpg resource, including example.jpg, is matched by {pathname: "*.jpg"}, which is registered in imported-sw.js, and the routing info registered by the main sw.js is never used.
Do you think the API should have a mechanism to address this case, which is introduced by allowing multiple registerRoutes or registerRoute calls? I think this is kind of a WAI behavior, but I'd love to hear anyone's thoughts on it.
I think this behavior is WAI. If you call importScripts() on a script before your own install handler, you are giving that script priority to install routes.
If you would like your own code to take priority, then you should rearrange sw.js like so:
addEventListener('install', (event) => {
  event.registerRouter(/* ... */);
});
importScripts('https://b.com/imported-sw.js');
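First-match-wins evaluation, and why reordering fixes the scenario above, can be sketched as follows. To stay self-contained it uses a stand-in for URLPattern that only understands exact pathnames and a leading '*' wildcard; pickSource is a hypothetical name, and null stands for "no route matched, so the fetch event handles the request".

```javascript
// Stand-in for URLPattern: supports exact pathnames and a leading '*'.
function pathnameMatches(pattern, pathname) {
  return pattern.startsWith('*') ? pathname.endsWith(pattern.slice(1)) : pathname === pattern;
}

// First matching route wins; later routes are never consulted.
function pickSource(routes, href) {
  const { pathname } = new URL(href);
  const route = routes.find((r) => pathnameMatches(r.condition.urlPattern.pathname, pathname));
  return route ? route.source : null;
}

const importedRoutes = [{ condition: { urlPattern: { pathname: '*.jpg' } }, source: 'network' }];
const mainRoutes = [{ condition: { urlPattern: { pathname: '/example.jpg' } }, source: 'fetch-event' }];

// importScripts() before the install listener: imported rules come first.
console.log(pickSource([...importedRoutes, ...mainRoutes], 'https://a.com/example.jpg')); // network
// Install listener before importScripts(): the main script's rule wins.
console.log(pickSource([...mainRoutes, ...importedRoutes], 'https://a.com/example.jpg')); // fetch-event
```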
It could be addRoutes(...routes).
I usually don't like that pattern, since it prevents adding new parameters in future, but it seems like any options would be per route.
add is shorter than register, and add makes it clearer that it's additive to the previous call. It might not make it clear that previous routes are cleared with the installation of a new service worker, but I'm not sure register makes that clear either.
Sorry for being late to the discussion. When I wrote the explainer, I did not consider 3rd party SW providers using the API. However, considering the use case, it makes more sense to allow the routes to be updated multiple times.
For the naming of the API, I remember that we called it register instead of add to clarify that it is a write-once operation. Note that Jake's original proposal called it add and allowed calling it multiple times. If we allow it to be called multiple times, I prefer to rename it add or append instead, which sounds more like the rules can grow.
By the way, I have concerns about calling the API multiple times:
1) Is the order of the API calls guaranteed? The current registerRouter is an asynchronous API. When multiple rules have been installed via the API, what should the order of the routes be? I am not so familiar with JavaScript, but if JavaScript may not wait for the previous API call, could the order of rules flip?
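Whether the order can flip depends on when an implementation records a rule. This sketch assumes a hypothetical registerRouter that records each rule synchronously at call time and only settles its returned promise later; under that assumption, call order fixes rule order even when nothing is awaited. If an implementation instead recorded rules only after some asynchronous work completed, the order could follow completion order, which is the risk raised above.

```javascript
// Hypothetical registerRouter(): the rule is recorded synchronously at
// call time; only the returned promise settles later (after a random delay).
const registered = [];

function registerRouter(rules) {
  registered.push(...(Array.isArray(rules) ? rules : [rules]));
  return new Promise((resolve) => setTimeout(resolve, Math.random() * 10));
}

// Neither call is awaited, yet the recorded order is still a, then b,
// regardless of which promise settles first.
const done = Promise.all([registerRouter({ id: 'a' }), registerRouter({ id: 'b' })]);
done.then(() => console.log(registered.map((r) => r.id).join(','))); // logs: a,b
```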
2) I also have the same concern as @sisidovski. If somebody sets a rule that intersects with other rules, it may interfere unexpectedly. Moreover, if somebody sets a rule like:
{
  condition: {}, // i.e. matches everything.
  source: "fetch-event"
}
all rules added after that one would simply be ignored. I guess it would be fine to treat this as WAI if the order of rules is guaranteed.
Calling registerRoutes outside the install event
It is intended to make the rule updated only inside the install event. We were concerned about the difficulty of understanding rule applications to inflight requests.
Would it be possible to make the origin trial a 3rd party origin trial?
FYI, https://bugs.chromium.org/p/chromium/issues/detail?id=1471005
is there a benefit to registerRoutes vs calling registerRoute as many times as needed?
The current API accepts both a sequence of routes and a single route. I suppose you are suggesting only accepting a single route. I feel it is fine to allow both ways of writing rules, i.e. a sequence of rules and a single rule. Those who want to pinpoint an error can register rules one by one; those who want to register everything at once can do that too.
Could you elaborate on why you want to prohibit accepting a sequence of rules?
Just FYI, the issues on the static routing API have been filed in https://github.com/WICG/service-worker-static-routing-api/issues. Please take a look. Also, I created an issue for "Should we allow registerRouter() to be called multiple times?" as https://github.com/WICG/service-worker-static-routing-api/issues/10 for ease of focusing on this discussion.
FYI, we are discussing how to handle an empty "not" condition in https://github.com/WICG/service-worker-static-routing-api/issues/22. We are leaning towards throwing an error. Please feel free to share your opinions.
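What "throwing" on an empty not could look like, as a sketch: an empty condition {} matches every request, so not: {} could never match anything and is most likely a mistake. The validateCondition name and the choice of TypeError are assumptions; only nested not operands are checked here.

```javascript
// Hypothetical validation: reject a `not` whose operand is empty (or not
// an object). An empty condition {} matches every request, so `not: {}`
// could never match anything.
function validateCondition(condition) {
  for (const [key, value] of Object.entries(condition)) {
    if (key !== 'not') continue;
    if (value === null || typeof value !== 'object' || Object.keys(value).length === 0) {
      throw new TypeError('"not" requires a non-empty condition');
    }
    validateCondition(value); // also check nested `not`s
  }
}

validateCondition({ not: { urlPattern: { pathname: '*.jpg' } } }); // passes
try {
  validateCondition({ not: {} });
} catch (e) {
  console.log(e.name); // TypeError
}
```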
FYI, we are currently actively discussing extending the resource timing API to support static routing API. https://github.com/w3c/resource-timing/issues/389
Here are the requirements I'm working towards:
I'm going to start with static routes, and provide additional ideas in follow-up posts.
The aim is to allow the developer to declaratively express a series of steps the browser should perform in attempt to get a response.
The rest of this post is superseded by the second draft
Creating a route
WebIDL
JavaScript
The browser will consider routes in the order declared, and will consider route items in the order they're given.
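The ordering rule can be sketched as a small evaluation loop. This is a model under assumptions, not the proposed processing model: a condition item has test(request), a source item has get(request) returning a promise of a response or undefined, a failing condition abandons the rest of its route, and a route whose sources all miss falls through to the next route. handleWithRoutes and the item shapes are hypothetical; strings stand in for responses.

```javascript
// Hypothetical model: a route is an array of items; conditions have
// test(request) => boolean, sources have get(request) => Promise of a
// response, or of undefined when they have nothing to offer.
async function handleWithRoutes(routes, request) {
  for (const items of routes) {
    for (const item of items) {
      if (typeof item.test === 'function') {
        // A failing condition abandons the rest of this route.
        if (!item.test(request)) break;
      } else {
        // The first source that produces a response ends processing.
        const response = await item.get(request);
        if (response !== undefined) return response;
      }
    }
  }
  // No route produced a response: fall back to the fetch event.
  return undefined;
}

const ifGet = { test: (req) => req.method === 'GET' };
const ifPathPrefix = (prefix) => ({ test: (req) => new URL(req.url).pathname.startsWith(prefix) });
const cached = new Map([['https://example.com/static/app.js', 'cached app.js']]);
const cacheSource = { get: async (req) => cached.get(req.url) };
const networkSource = { get: async (req) => `network: ${req.url}` };

// GET /static/*: try the cache, then the network.
const routes = [[ifGet, ifPathPrefix('/static/'), cacheSource, networkSource]];

handleWithRoutes(routes, { method: 'GET', url: 'https://example.com/static/app.js' })
  .then(console.log); // logs: cached app.js
```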
Route items
Route items fall into two categories:
Sources
WebIDL
These interfaces don't currently have attributes, but they could have attributes that reflect the options/defaults passed into the constructor.
Conditions
WebIDL
Again, these interfaces don't have attributes, but they could reflect the options/defaults passed into the constructor.
Shortcuts
GET requests are the most common type of request to provide specific routing for.
WebIDL
Where the JavaScript implementation is roughly:
We may also consider treating strings as URL matchers:
- router.add('/foo/') === router.add(new RouterIfURL('/foo/')).
- router.add('/foo/*') === router.add(new RouterIfURLPrefix('/foo/')).
- router.add('*.png') === router.add(new RouterIfURLSuffix('.png')).
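A sketch of how such string shorthands might be translated, following the three mappings listed above. Since the Router* classes are only proposed, the hypothetical conditionFromString returns plain descriptors naming the class instead of constructing it; a string with a leading '*' is treated as a suffix match even if it also ends in '*', which is an arbitrary choice.

```javascript
// Hypothetical translation of a string shorthand into a condition
// descriptor. The descriptors only name the proposed classes.
function conditionFromString(str) {
  if (str.startsWith('*')) {
    return { type: 'RouterIfURLSuffix', value: str.slice(1) };
  }
  if (str.endsWith('*')) {
    return { type: 'RouterIfURLPrefix', value: str.slice(0, -1) };
  }
  return { type: 'RouterIfURL', value: str };
}

conditionFromString('/foo/');  // returns { type: 'RouterIfURL', value: '/foo/' }
conditionFromString('/foo/*'); // returns { type: 'RouterIfURLPrefix', value: '/foo/' }
conditionFromString('*.png');  // returns { type: 'RouterIfURLSuffix', value: '.png' }
```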
Examples
Bypassing the service worker for particular resources
JavaScript
Offline-first
JavaScript
Online-first
JavaScript
Processing
This is very rough prose, but hopefully it explains the order of things.
A service worker has routes. The routes do not belong to the registration, so a new empty service worker will have no defined routes, even if the previous service worker defined many.
A route has items.
To create a new route containing items
Handling a fetch
These steps will come before handling navigation preload, meaning no preload will be made if a route handles the request.
request is the request being made.
RouterIfMethod, then:
RouterIfURL, then:
RouterSourceNetwork, then:
RouterSourceCache, then:
RouterSourceFetchEvent, then:
Extensibility
I can imagine things like:
- RouterOr(...conditionalItems) – True if any of the conditional items are true.
- RouterNot(condition) – Inverts a condition.
- RouterIfResponse(options) – Right now, a response is returned immediately once one is found. However, the route could continue, skipping sources but processing conditions. This condition could check the response and break the route if it doesn't match. Along with a way to discard any selected response, you could discard responses that didn't have an ok status.
- RouterCacheResponse(cacheName) – If a response has been found, add it to a cache.
- RouterCloneRequest() – It feels like RouterSourceNetwork would consume requests, so if you need to do additional processing, this could clone the request.
But these could arrive much later. Some of the things in the main proposal may also be considered "v2".