w3c / ServiceWorker

Service Workers
https://w3c.github.io/ServiceWorker/

Allow the NavigationController to manage resources on first load #73

Closed jansepar closed 10 years ago

jansepar commented 11 years ago

Copied from my post on the discussion on chromium: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/Du9lhfui1Mo

Just found this, and it seems extremely interesting and has lots of potential to be a very useful addition to browsers. I was disappointed to read this bit though:

"The first time http://videos.example.com/index.html is loaded, all the resources it requests will come from the network. That means that even if the browser runs the install snippet for ctrl.js, fetches it, and finishes installing it before it begins fetching logo.png, the new controller script won't be consulted about loading logo.png. This is down to the first rule of Navigation Controllers"

I think there is a lot of value that can come from giving developers the power to have full control over resource loading, even on the first load. For example, having the ability to swap image URLs before they are kicked off by the preloader would be a big win for responsive images. I am the author of the Capturing API (https://hacks.mozilla.org/2013/03/capturing-improving-performance-of-the-adaptive-web/) which provides this exact functionality in a non-optimal way. In order to control resource loading with Capturing, we must first buffer the entire document before being able to manipulate resources, which is a bummer, but its ability to control resources on the page is very, very useful. If the Navigation Controller worked on first page load, the need for Capturing would be eliminated.

It does not seem like total control of resource loading is the goal of the Navigation Controller, but the API is very close to being able to provide exactly that without much change at all. I would love to have a conversation about whether or not adding this functionality is feasible!

michael-nordman commented 11 years ago

Sounds like you're looking for a means to block page load until the 'controller' is up and running on first load.

Some of us had talked about an option like that in the registration process at some point. I think it was dropped mostly as a matter of reducing scope for the sake of clarity, more than a fundamental problem with it. At the time of those discussions we had envisioned a header-based registration mechanism such that the body of the initial page itself was re-requested through the controller once it was up and running.

alecf commented 11 years ago

One option is something like this, which is slightly underspecified right now:

navigator.registerController("/*", "controller.js")
    .then(function(controller) {
      if (...controller came online for the first time) {
          // maybe warn the user first
          document.reload()
      }
    });
jansepar commented 11 years ago

@alecf with that mechanism, wouldn't we risk flashing some of the content that gets loaded before the controller is done loading? The document reload could be avoided if the script were blocking, or if the browser were somehow aware that a controller was going to load and could block the load of the next resource until the controller is finished.

michael-nordman commented 11 years ago

The approach alec pointed out is a pretty close approximation, and it would result in the main page load also being routed through the controller on the reload.

Being browser developers, we're understandably reluctant to introduce things that involve "blocking page loads" :)

igrigorik commented 11 years ago

We're waging a war in the webperf community to get rid of "blocking resources" whenever and wherever possible... I would upgrade "reluctant to introduce blocking resources" to something much, much stronger. First, we're stuck on the controller download, then on parsing and eval, and then on potentially slow routing calls for each request -- ouch x3.

jansepar commented 11 years ago

Copied my comment from discussion on https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/Du9lhfui1Mo

So, while there are performance implications to giving developers full control of resource loading, I really think it's the best solution going forward. One thing we all have to stop and realize is that when developers want to control resources, they manage to do it - just in non-optimal ways that also have big performance implications. One solution developers have and use is proxying pages and modifying resources before they hit the client, which can have big security implications and does not have the advantage of making decisions based on device conditions. Another option developers have and use is writing alternates to src and href (such as data-src and data-href) and loading them after the DOM content is loaded, thus needing to wait for rendering to complete before loading resources. Another option is Mobify's Capturing API, which also blocks page rendering.
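For illustration, a rough sketch of the data-src workaround mentioned above (not Capturing itself; the data-src markup and the URL-choosing logic a real implementation would apply are hypothetical):

document.addEventListener("DOMContentLoaded", function () {
  // Find images whose real URL was stashed in data-src so the preloader
  // never saw a src to fetch.
  var images = document.querySelectorAll("img[data-src]");
  for (var i = 0; i < images.length; i++) {
    // Only now, after parsing and initial rendering, does the download start.
    images[i].src = images[i].getAttribute("data-src");
  }
});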

So when thinking about giving the Navigation Controller the power to control resources on first load, it's not a matter of blocking vs. no blocking, it's a matter of blocking vs. the three options previously listed.

noahadams commented 11 years ago

Hi guys, I'm a colleague of Shawn's at Mobify, and I thought I'd butt in here because this API is really intriguing to me:

Not providing an API that allows full control will just lead people to using the reload() workaround posted above, which is clearly at least as bad as blocking page rendering on the controller download, if not worse, because after reload a different set of resources could potentially be downloaded.

This API is already potentially quite "performance dangerous" (not to mention "basic functionality dangerous") in the sense of providing a very deep hook into resource scheduling, far in excess of what's previously been available, but the most likely application is in fact performance improvement, e.g. the Application specific caching rules as presented in the groups thread linked above, or choosing device appropriate resources for various devices, and making those decisions (and starting those downloads) as early as possible.

I haven't dug too deeply into the API itself yet, but would it be hypothetically possible to throw a "bootstrap" controller into a page inline to overcome the "additional blocking resource" objection Ilya brought up?

junosuarez commented 11 years ago

Would it be a terrible idea to have some sort of {prefetch: false} flag on a per-scope basis? This would allow prefetch to be the default (and more performant) action, but allow developers to override it in scenarios where more explicit control is necessary or desired.
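A hedged sketch of what that might look like, reusing the draft registerController() call from this thread ({prefetch: false} is just the suggestion above, not anything specified):

navigator.registerController("/*", "controller.js", { prefetch: false })
  .then(function (controller) {
    // With prefetch disabled for this scope, the controller would be consulted
    // before resource fetches even on first load, per the suggestion.
  });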

alecf commented 11 years ago

@noahadams - perhaps flip this around - I'm not sure reload is "at least as bad as blocking" - from my perspective, blocking is the worst possible option, because it effectively prevents ALL resource loads and introduces complicated behavior. Since you can virtually emulate the blocking behavior with interstitial content while the page loads, I can't see a good reason to introduce blocking.

From mobify's own blog about 'm.' sites and redirects:

With 3G latencies, you're looking at up to 2 seconds for the redirect to an m-dot to complete. The visitor to your mobile site will count this against you, which negatively affects their impression of your site and reduces the likelihood of them returning. With delays as small as 500 milliseconds resulting in a 1.9% conversion penalty, performance is money.

This situation is worse, because you actually have to start reading and parsing html and multiple resources before the pageload can continue.

A few examples...

What happens here:

<img src="foo.png">
<script> navigator.registerController("/*", "controller.js", {blocking: true})</script>
<img src="bar.png">

Do we block loading of "bar.png"? Is foo.png visible on the screen?

what about this:

<img src="foo.png">
<script> navigator.registerController("/*", "controller.js", {blocking: true})</script>
<script src="http://somecdn.com/jquery.min.js">

Is that script loaded before or after controller.js? When is it evaluated?

What if it takes 2 seconds to get controller.js?

To me these examples demonstrate that there is no way any web platform API will ever support a method that blocks the main thread, especially one dependent on a network request. document.write() was bad enough; this is far worse. Further, a properly designed website could immediately put up an interstitial message, "Loading resources.." or what have you, if your site truly is non-functional without the controller.

jansepar commented 11 years ago

@alecf the Navigation Controller doesn't have to block the rendering thread in all cases, it just has to block resources from loading. Say for example, you had a document like this:

<html>
<head>
<script> navigator.registerController("/*", "controller.js", {blockResourcesLoading: true})</script>
</head>
<body>
<img src="a.png">
<h1>Foo</h1>
<img src="b.png">
<h1>Bar</h1>
</body>
</html>

I would imagine in this case, Foo and Bar would render regardless of whether or not the controller was finished, and only the images would be delayed from loading until the controller was finished downloading. When the controller is finished loading and we have the instructions, the images could then start downloading.

Now, if you had an external script tag in the head placed after the controller, like this...

<html>
<head>
<script> navigator.registerController("/*", "controller.js", {blockResourcesLoading: true})</script>
<script src="jquery.js"></script>
</head>
<body>
<img src="foo.png">
<h1>Foo</h1>
<img src="bar.png">
<h1>Bar</h1>
</body>
</html>

...then yes, I would envision that the main rendering thread would be blocked, because loading jquery would be delayed waiting for the controller to finish loading, and, well, external scripts block rendering. But scripts in the head block rendering anyway - and we all know the best practice is to put scripts at the end of the body. Therefore, if developers follow that best practice, there would be no blocking of the main rendering thread even if the Navigation Controller behaved as I'm suggesting. The real performance loss here is that the preparser/preloader would be delayed until the controller is finished loading.

As for "what if it takes 2 seconds to download controller.js" - based on the spec, it seems as though the controller wouldn't get large enough to take that long to download... Of course, it's possible.

Once again, I just want to emphasize that in order to solve the responsive image problem, people are already blocking resources - just in different ways. Some are using proxies to rewrite resources, some are changing src attributes and loading images at the end of rendering - neither of these is optimal. Aside from giving users full control over resource loading, I can't think of a better alternative to solve the responsive image problem.

igrigorik commented 11 years ago

Once you have a hammer, everything looks like a nail.

We don't need Navigation Controller to solve the responsive images problem. The responsive images problem needs to be solved via appropriate API's - srcset, picture, client-hints, etc. The argument that NC "doesn't have to" block rendering is not practical: most every page out there (unfortunately) has blocking CSS and JavaScript, so the end-result will be that we block ourselves not only from rendering, but also from fetching those resources ahead of time. In Chrome, just the lookahead parser gives us ~20% improvement [1]. Further optimizations with aggressive pre-connects, pre-fetch, etc, will help us hide more of the mobile latency.

[1] https://plus.google.com/+IlyaGrigorik/posts/8AwRUE7wqAE

Also, since we've already had a lengthy discussion on this before. As a reference: https://plus.google.com/117340630485537362092/posts/iv3iPnK3M1b https://plus.google.com/+IlyaGrigorik/posts/S6j45VxNESB (more discussions here)

I completely understand your (Mobify) motivation to have the NC override all browser behavior -- you've built a business rewriting poorly implemented sites into something more mobile-friendly. But let's face it, the actual answer is: you shouldn't need a proxy layer here; the site should be rebuilt to begin with. Adding more proxies can make this type of work easier, but it won't give us the performance we want (yes, it means the laggards have to manually update their sites).

tl;dr: let's keep responsive images out of this discussion.

jansepar commented 11 years ago

First, I just want to say that I hope it's clear that I really appreciate the fact that we are all trying to come up with great ideas to benefit the web, and I think it's pretty awesome that we can do it in a collaborative and open forum like this :)

If the overall goal is to do everything we can to improve the performance of the web, then I don't think we should be limited to hoping that laggards will manually update their sites. Automated tools are a very scalable way of achieving the goal of making the web faster. Google's own PageSpeed Service is a great example of this - it's not an optimal solution since pages must be routed and proxied through Google's servers, but it can definitely significantly improve the performance of most websites. I liked something you said in one of our earlier discussions on G+:

"My only knit-pick is the "we will all benefit from another demonstrably effective technique to consider". If we qualify that with a bunch of caveats, like "on some sites and in some cases, your mileage will vary, and still slower than what the platform could deliver, assuming it implemented the right features".. then we're good! That's all. :-)"

Even if we couldn't figure out a way to give developers the ability to have full control of resource loading without incurring a penalty, I still think it's a worthwhile feature that could be very useful to create automated tools to help speedup the web without needing to educate every single developer on the rules of performance. Then like you said, as long as we indicate that "your mileage will vary, and (your site is) still slower than what the platform could deliver", and as long as we can slowly educate them afterwards on how to take advantage of the platform, then we are good :)

And one note for responsive images: I have a few gripes about picture and srcset, but I won't list them here. client-hints seem very promising, but I have some issues with it that are more philosophical.

noahadams commented 11 years ago

@alecf I suppose you're right about the reload() workaround approach to blocking behaviour being at least less complicated (though I have my own UX gripes about loading interstitials in pages and in browsers, I won't raise them here).

My one concern about using it would be the case of a stateful transition between origins (that is to say, a cross-domain POST), though I'll admit that that's an uncommon edge case.

I think there's an argument to be made that a blocking version of this would have blocking semantics similar to a blocking <script src="..."> tag, that is to say you expect parsing to stop until it has finished being evaluated and to see its side effects after it has loaded. A sane interaction with the pre-load scanner is another issue.

What about the potential for bootstrapping a controller inline with enough logic to "correctly" load the current page and using the "upgrade dance" later to install something more full featured?

alecf commented 11 years ago

I think there's certainly something interesting in the notion of inline controllers for bootstrap... it sounds like we should file a new issue for that suggestion. I'd be interested in hearing this thought out in particular (file a new issue, discuss these things there...)

1) What if two new tabs both open around the same time - and both have inline controllers. Can one page affect another?
2) What if you have an inline controller, but also have a running controller - who wins?
3) Is the inline controller persistent in any way? If I load another page that doesn't refer to a controller, is it affected by the inline controller?

igrigorik commented 11 years ago

@jansepar we may be getting off topic here, but we shouldn't be putting in large new features which will guarantee significantly degraded performance -- blocking all resource downloads is a big red flag, and then there is still the question of overhead per lookup. Besides, you basically do this already with your implementation, so it's not clear what you would win here.

jansepar commented 11 years ago

@igrigorik I think the potential for an inline controller could be a good compromise that gives resource control on first load without degrading performance! Looking forward to seeing what comes out of that discussion which should be opening in a separate ticket. In regards to my implementation vs doing it with NC - there would definitely be some big performance wins of controlling resources via an API (NC), rather than capturing the entire document.

igrigorik commented 11 years ago

@alecf @jansepar is there such an open issue I can track? Can't seem to find anything...

Also, #54 looks to be related.

jansepar commented 11 years ago

@igrigorik @noahadams is planning on creating an issue for being able to bootstrap the controller inline.

FremyCompany commented 10 years ago

I think some applications may want to have some kind of always-on service worker. I see a way to allow that without hurting page performance too much: via an HTTP header.

Service-Worker: /service-worker.js

This enables no-performance-loss installation.
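For concreteness, a minimal Node.js sketch of what a server sending the proposed header might look like (the "Service-Worker" header name and its semantics are the proposal under discussion, not a shipped feature):

var http = require("http");

http.createServer(function (req, res) {
  // Advertise the worker script alongside the page itself.
  res.setHeader("Service-Worker", "/service-worker.js");
  res.setHeader("Content-Type", "text/html");
  res.end("<!doctype html><title>Demo</title>");
}).listen(8080);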

alecf commented 10 years ago

The header is an interesting idea, but it's always on once you register it.. in fact since there is no pattern here, I don't see a way to register it in the general case.. you'd at least need to change the header to

Service-Worker: /service-worker.js, /*

But I'm still not convinced that this helps enough to justify a whole new header.

Just to be clear: the use case that is covered by this is one where the user loads a page, and then goes "offline" before that page is reloaded.. and you don't want to then reload offline, right? All the other ways of using this (responsive images on first load, etc) are beyond the scope of Service Worker design, even if people try to use Service Workers to solve them.

igrigorik commented 10 years ago

@FremyCompany "this enables no-performance-loss installation" is not true. Stuffing the controller reference into an HTTP header may speed things up just a tad - by definition, headers will arrive first, allowing the browser to start fetching the controller - but it still does not address the problem of having to block dispatch of all other resources until the controller is loaded.

@alecf agreed, don't think the header adds much here.

FremyCompany commented 10 years ago

@igrigorik The advantage of headers is that you don't have to wait to parse the page, and also that the header is only sent once over HTTP 2.0 because of header compression. You don't pay the cost of inlining multiple times.

Regarding the blocking resource issue, this is a developer issue. If developers need something, they will achieve it anyway; for example by putting all the HTML inside an HTML comment and waiting for the ServiceWorker to be loaded before reloading the page, then extracting the HTML from the comment on DOMContentLoaded. That will do the same thing, only slower.

Also, do not forget that we are not forced to apply the service worker to all URLs; we can restrict it to some elements only, which may still leave the page usable in the meantime.

igrigorik commented 10 years ago

The advantage of headers is that you don't have to wait to parse the page, and also that the header is only sent once over HTTP 2.0 because of header compression. You don't pay the cost of inlining multiple times.

Yep, that is true.

Regarding the blocking resource issue, this is a developer issue. If developers need something, they will achieve it anyway; for example by putting all the HTML inside an HTML comment and waiting for the ServiceWorker to be loaded before reloading the page, then extracting the HTML from the comment on DOMContentLoaded. That will do the same thing, only slower.

Everything is a developer issue if you get the API wrong. Perhaps the header is a reasonable solution, but this point alone is not sufficient as an argument for it.

Also, do not forget that we are not forced to apply the service worker to all URLs; we can restrict it to some elements only, which may still leave the page usable in the meantime.

That's true, but practically speaking, if you actually want to take your app offline, that's not the case, is it? As opposed to just using NavController to intercept a few different requests and rewrite them.. As such, I would fully expect most people to just claim "/*".

FremyCompany commented 10 years ago

I think you are right about /* but to be honest I'm still hoping that some "critical" resources can be put into an improved appcache instead, allowing those resources to be kept offline longer and bypass the service worker.

The number of such resources, being limited and rarely changing, should be low enough to be managed by hand.

That's the hope at least...

piranna commented 10 years ago

I think you are right about /* but to be honest I'm still hoping that some "critical" resources can be put into an improved appcache instead, allowing those resources to be kept offline longer and bypass the service worker.

Maybe AppCache and ServiceWorker cache could be combined? It's clear that regarding resource fetching and caching there's some overlapping...

alecf commented 10 years ago

There is absolutely no way we're combining AppCache and ServiceWorker - if anything I expect that using them together will result in several developers feeling so bad about themselves for trying that they give up on web development entirely, and write native apps as a penance for their sins.

I think we need to get back to the issue at hand which is the attempt to "go offline" during the initial load of the document, the first time it's ever seen by the browser. This is only the very first time - registration is persistent across future pageloads and even browser restarts!

We're jumping through hoops to avoid this:

navigator.registerServiceWorker("/*", "service-worker.js").then(function() { window.reload(); })

or alternatively

if (!navigator.serviceWorker) // no service worker registered yet
    window.location = "/installer";

Or something similar. I just can't see introducing anything that would block all resources from loading the first time a user visits a page. The browser will just sit there with a blank white page spinning/progressing until the service worker is downloaded and started. A developer who did that would essentially be saying "I want my web page to suck for 10-30 seconds for all first time visitors" - if your site is really that heavily dependent on the service worker, you WANT some kind of "installer" or progress meter to give feedback to your users, so they don't just hit the "back" button and never visit your site again. (like the god-awful but unfortunately necessary progress bar that gmail has)
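A rough sketch of that "installer" pattern, using the draft registerServiceWorker() call from this thread (the progress element and helpers are hypothetical stand-ins for whatever UI the site shows while installing):

function showInstallProgress() { document.getElementById("installer").hidden = false; }
function hideInstallProgress() { document.getElementById("installer").hidden = true; }

showInstallProgress();
navigator.registerServiceWorker("/*", "service-worker.js").then(
  function () {
    hideInstallProgress();
    window.location.reload(); // the reloaded page is routed through the worker
  },
  function (err) {
    hideInstallProgress();    // fall back to the uncontrolled page
    console.error("ServiceWorker install failed:", err);
  }
);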

piranna commented 10 years ago

There is absolutely no way we're combining AppCache and ServiceWorker - if anything I expect that using them together will result in several developers feeling so bad about themselves for trying that they give up on web development entirely, and write native apps as a penance for their sins.

Well, maybe it's because I'm mainly a systems programmer (I used to program in OpenGL and also wrote my own kernel...) and now I'm a JavaScript and networking programmer just by serendipity :-)

I think we need to get back to the issue at hand which is the attempt to "go offline" during the initial load of the document, the first time it's ever seen by the browser. This is only the very first time - registration is persistent across future pageloads and even browser restarts!

We're jumping through hoops to avoid this:

navigator.registerServiceWorker("/*", "service-worker.js").then(function() { window.reload(); })

In a way, this is what I was asking about before when I brought up combining ServiceWorker and AppCache, and in a previous message the browser system cache, because of exactly this issue. If we have already downloaded some resources and they are available in the AppCache or the browser system cache, why does the page need to be reloaded for the ServiceWorker to become aware of them? Maybe it would be enough internally to just set a "ServiceWorker flag", or better, to hard-link the resources from the AppCache or browser system cache into the ServiceWorker cache and manage them from there. That would fix the problem of needing to do the reload. Or, if you don't like that: since a web page knows which resources it has downloaded (you just need to go to the Chrome Inspector > Network tab to see them), why not just re-do the fetches of the already-downloaded files? That would also avoid a full page reload, and combined with the previous suggestion (linking the already-downloaded files into the ServiceWorker cache), if the service worker doesn't need to do anything with those files or fake them, they wouldn't need to be downloaded at all.

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux

alecf commented 10 years ago

wait, think about what you're really asking though:

If we have already downloaded some resources and they are available in the AppCache or the browser system cache, why does the page need to be reloaded for the ServiceWorker to become aware of them?

Putting aside appcache for a moment: if they are available in the system cache and are fresh, then you don't need a service worker present to be aware of them. If the service worker is registered and requests its cache be populated, then that mechanism is really a function of the browser implementation of the SW cache - if the implementation is written such that it can just refer to the existing, fresh data in the system cache from the SW cache implementation, then it won't have to re-download those resources when the SW is instantiated.

I don't really see what this has to do with having SW loaded in the first invocation of the page - it sounds like you're more concerned about the transition from a non-SW-controlled page to a SW-controlled page, but trying to solve it by avoiding non-SW pages altogether.

piranna commented 10 years ago

wait, think about what you're really asking though:

If we have already downloaded some resources and they are available in the AppCache or the browser system cache, why does the page need to be reloaded for the ServiceWorker to become aware of them?

Putting aside appcache for a moment: if they are available in the system cache and are fresh, then you don't need a service worker present to be aware of them. If the service worker is registered and requests its cache be populated, then that mechanism is really a function of the browser implementation of the SW cache - if the implementation is written such that it can just refer to the existing, fresh data in the system cache from the SW cache implementation, then it won't have to re-download those resources when the SW is instantiated.

Ok, just what I was asking for :-)

I don't really see what this has to do with having SW loaded in the first invocation of the page - it sounds like you're more concerned about the transition from a non-SW-controlled page to a SW-controlled page, but trying to solve it by avoiding non-SW pages altogether.

No, I'm concerned about the fact that the page has to be reloaded for the SW to become aware of all of the page's content, including what has already been downloaded. Since the first time I read about SW (just a week or two ago) I assumed the SW would be available as soon as it was installed, maybe leading to half-state pages, but I thought being aware of that would be a good idea - for example, loading an AppCache that installs the SW, so that all content is processed by the SW from then on, with no "please reload your application" and no flickering. Later it was suggested that it might be interesting for the SW to manage all of the page's content. OK, you can register it in an inline script tag at the top of the page, but the problem is that the page itself wouldn't be managed until a reload. So this is my point: since the UA is aware of all the content downloaded by the page, why not tell the SW about it, maybe by re-doing the requests for the already-downloaded content in the background so the SW can become aware of them?

Hmm, now that I think about it, maybe this content re-request could also be done by the page itself (no browser support needed) using XHR calls, if it knows which content has already been downloaded (in the top-of-page-script-tag example that would only be the HTML page itself, whose URL can be taken from window.location), but for the inline content it would need to allow the half-state page... :-/
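A rough sketch of that page-level XHR idea, assuming the page enumerates its own already-requested resources from the DOM (whether these requests would actually route through the freshly registered worker is exactly the open question in this thread):

navigator.registerServiceWorker("/*", "service-worker.js").then(function () {
  // The page itself, plus anything it has already asked the network for.
  var urls = [window.location.href];
  var nodes = document.querySelectorAll("img[src], script[src], link[href]");
  for (var i = 0; i < nodes.length; i++) {
    urls.push(nodes[i].src || nodes[i].href);
  }
  // Re-request each resource so the newly registered worker "sees" it.
  urls.forEach(function (url) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.send();
  });
});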

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux

FremyCompany commented 10 years ago

@alecf Why would it be hard to have both a SW and an appcache? I seriously don't get it... I could totally use an appcache for my "/style", "/resources" and "/scripts" folders while requiring a ServiceWorker for the "/data" or "/api" folders. Then the website can load perfectly fine without the SW if needed because the essential content will rely on the appcache, while still providing case-by-case caching functionality for more variable and user-dependent content, and it is not a critical issue if that happens after a very small latency because the core content can be ready independently.

By the way, it is totally false that the page will display blank while the SW is loaded, because a developer with a minimum of logic will make sure not to have blocking content until there's enough going on to display a progress bar, or will make sure the SW is fast, or more likely both. This 10-30s analogy is hyperbolic and totally misses the point. The SW may allow huge wins on the wire by allowing finer content negotiation, diff files and request prioritization that in the end may make the page appear to load faster even on first run.

igrigorik commented 10 years ago

@alecf https://github.com/slightlyoff/ServiceWorker/issues/73#issuecomment-25642879 is a good summary. One thing I would clarify: it's not just about "going offline". I have a feeling that many apps will use SW without ever leveraging the offline portion of it. SW provides a "scriptable proxy / router", and many sites will lean on it as such. What we want to mitigate, as you already pointed out, is the window.reload() case + boot screen.

To enumerate the options we've discussed so far:

  1. Do nothing. We'll end up with Gmail-like boot screens -- not a great outcome.
  2. Allow SW to be inlined into the document -- this seems reasonable, but requires API changes. This would mitigate the blocking concern since the bootstrap script is part of the doc itself. Yes, it means it would be duplicated across docs.. that's a known tradeoff: keep it small, etc.
  3. If you're running over SPDY/HTTP 2.0, you could use server push and existing syntax to mitigate some of the inlining costs.

(3) is feasible today without any modifications to the spec, but it does seem like we should provide (2) -- it feels odd to force an asset to be an external resource only. Something like:

`navigator.registerServiceWorker("/*", function() { ... }).then(...)`

igrigorik commented 10 years ago

Expanding on previous question about inline workers.. What stops us from simply allowing something like:

<script id="serviceworker" type="javascript/worker">
    // inline worker, as defined in: https://github.com/slightlyoff/ServiceWorker/blob/master/explainer.md#a-quick-game-of-onfetch
    this.version = 1;
    var base = "http://videos.example.com";
    var inventory = new URL("/services/inventory/data.json", base);  
    // ... snip ...
</script>

<script>
  var blob = new Blob([document.querySelector('#serviceworker').textContent]);
  var worker = new Worker(window.URL.createObjectURL(blob));

  navigator.registerServiceWorker("/*", worker).then(
    function(serviceWorker) {
      console.log("success!");
      serviceWorker.postMessage("Howdy from your installing page.");
    },
    function(why) {
      console.error("Installing the worker failed!:", why);
    }
  );
</script>

Or, something close to that...?

FremyCompany commented 10 years ago

I'm still not convinced by inlining in HTML. This makes resource downloading conditional on detecting a piece of JavaScript code execution. What do you do if the author put a <script> pointing to an external URI between the "javascript/worker" script and its implementation? It is also a burden to make sure all pages include this inlined script.

As a consequence, I'm still totally convinced that an HTTP header is the best solution. HTTP2 servers can reply with the URL of the controller code + push it straight away if thought necessary, and HTTP1 servers that want to perform inlining can return a data URI.

I'm a huge fan of pushing a domain controller at the HTTP level because, at the end of the day, a domain controller interacts with the server; this is not so different from a "we support TLS" header: it's a request from the server to switch protocol, and add a layer in the communication channel.

I could totally see how some website provider would like to use a custom domain controller for all its websites (because they want to use some special protocol that supports more efficient compression, or whatever) without modifying the code of the sites hosted on the server.

jansepar commented 10 years ago

"What do you do if the author did put a < script > to an external URI between the "javascript/worker" and its implementation?"

I don't think it's a problem to simply specify that all requests that you want to control via the ServiceWorker must be placed below the API call to the service worker. Removing a feature that could be very valuable simply because someone might use it incorrectly doesn't seem like it makes sense.

While I think using HTTP push works well, having a JS equivalent is still very valuable. With an inline script variant there is really little to no downside in regards to performance, and it would allow people to build tools that do not require configuration of any backends. Asking users to insert a snippet of JS is simple - asking them to configure their nginx/apache/iis/django/rails/spring setup is not :)

igrigorik commented 10 years ago

@FremyCompany as you point out yourself, an HTTP 1.x server will have to inline the controller for best performance -- by definition, we need a way to do that (and no such mechanism currently exists). Your point about also allowing the controller to be delivered via server-push is a valid one. Ideally, both approaches should work.

Re, switching protocols: this is completely orthogonal. If you want to negotiate custom application protocols, take a look at ALPN extension to TLS.. but then you'd have to teach the browser how to speak your new protocol -- good luck. :)

piranna commented 10 years ago

Re, switching protocols: this is completely orthogonal. If you want to negotiate custom application protocols, take a look at ALPN extension to TLS.. but then you'd have to teach the browser how to speak your new protocol -- good luck. :)

This is just my own personal use case for why I'm interested in ServiceWorkers, and I think that with Protocol Handlers (https://developer.mozilla.org/es/docs/Web-based_protocol_handlers) this can be achieved easily...

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux

igrigorik commented 10 years ago

@piranna that has nothing to do with on-the-wire transport protocols or ServiceWorker... It allows you to register a "handler" which can be a different web service, but that will still run over regular HTTP.

piranna commented 10 years ago

Yes, but unless I've misunderstood how ServiceWorker will work according to the manifest published on GitHub, nothing prevents me from defining a protocol handler that points to a URL managed by a ServiceWorker, even if it's defined on a different domain.


FremyCompany commented 10 years ago

@jansepar On the contrary, this is an issue. That means that every time the browser encounters a <script>, it cannot issue requests from the look-ahead parser. We clearly do not want that to happen.

@igrigorik I propose a method enabling HTTP1 servers to inline the handler: they can use a "data:" URI in the header instead of a relative URL to the server. This works totally fine and only needs to be done once (because the service worker could signal that it's already installed by adding a specific header with its version to further requests). All this can be done at the server level without touching the code of the page itself.
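A hypothetical sketch of that data: URI trick (again, the "Service-Worker" header and its data: URI form are the proposal being discussed, not a shipped feature):

var workerSource = "this.onfetch = function (e) { /* respond from cache, etc. */ };";
var headerValue = "data:application/javascript," + encodeURIComponent(workerSource);

// An HTTP/1.x server would then send the worker inline with the response as:
//   Service-Worker: <headerValue>
console.log("Service-Worker: " + headerValue);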

Regarding the protocol switch, I think it's not a real issue to override or embrace the HTTP semantics because we can open a WebSocket to the main server and then use it as a bidirectional channel. There's also the possibility to add features on top of HTTP, like accepting a DIFF file and using the previous version in cache plus the DIFF to generate the new version in cache. This does not require redefining the HTTP semantics; it just requires an intermediate layer between the resource loader and the network layer.

FremyCompany commented 10 years ago

What I meant with my DIFF example is that we can add features on top of HTTP, like differential updates of files, while keeping the underlying HTTP semantics intact.

This is why I think always-on ServiceWorkers should be exposed at the HTTP header level, and not inside the web page itself.

FremyCompany commented 10 years ago

Also, HTTP headers are far less likely to be corrupted during an XSS attack.

While the attacked website can still reinstall a proper worker once the issue is discovered - at least if the user refreshes the app and does not always resume it from a tombstoned state - between the moment the attack started and the moment it is blocked, a lot of critical user information may have leaked to the attacker. ServiceWorkers are very powerful, so allowing XSS code to define them is probably dangerous.

Using a header also makes it possible for the browser to do a HEAD request from time to time on the page currently being loaded or resumed from tombstoning, to make sure the ServiceWorker wasn't updated or corrupted (and ask the user to refresh the page automatically if it was). A website could then issue a "Service-Worker: None" header to force all clients to terminate the service worker currently in use, for example because it has been determined that it had been corrupted by an attacker.

jansepar commented 10 years ago

@FremyCompany I'm not sure I understand how this conflicts with the look-ahead parser. I think the ServiceWorker would have to work hand-in-hand with the look-ahead parser, regardless of whether or not we allow inline ServiceWorkers, because any script kicked off through the look-ahead parser would need to route through the ServiceWorker anyways. Say for example, the ServiceWorker is asynchronously installed (forgetting this ticket for a second) - subsequent page loads will still use the look-ahead parser to kick off all requests for resources in the page, and loading of those resources would have to be aware of the ServiceWorker in order to properly route the requests.

The example that @igrigorik wrote up seems like it would work in such a way that allows for installation of the ServiceWorker before the lookahead parser will have a chance to kick off any requests for resources. This assumes though that the scripts for installing the ServiceWorker are at the top of the document, declared before any other scripts, stylesheets, etc.

As for security concerns, what you're proposing suggests that the only way of setting up the ServiceWorker would have to be through HTTP Push or a data URI set in HTTP1.1 headers, correct? I personally think that this would be a large barrier to entry for adoption of this feature - as mentioned before, inserting a snippet of JS into a site is easy, asking people to configure their backends is not. Requiring scripts to be pushed from the origin using HTTP Push is inherently safer not just for ServiceWorker but for many other scripts as well; however, requiring certain JS features to be usable only if they are pushed from the server using HTTP Push (or data URIs in headers) does not seem like a good way forward for the web. I think there are other limitations we could place on ServiceWorker to protect against XSS attacks.

FremyCompany commented 10 years ago

Preventing the look-ahead parser from issuing any request while waiting for a script to execute, because it could potentially install an "Always-On" ServiceWorker, seems like a hard case to sell to me. We need a deterministic way to decide whether or not requests should go through a worker.

Even in the case we settle for an HTTP Header, we can let websites (that do not prevent it via CSP and do not already define the header via HTTP) use a META[HTTP-EQUIV] tag before any <script>, <link> or <style> to do exactly the same.

yoavweiss commented 10 years ago

Preventing the look-ahead parser from issuing any request while waiting for a script to execute, because it could potentially install an "Always-On" ServiceWorker, seems like a hard case to sell to me. We need a deterministic way to decide whether or not requests should go through a worker.

The inline scripts in Ilya's example can also include some attribute that indicates to the parser "do not kick off the preloadScanner for this inline script". That means that parsing would stop for the duration of these scripts' execution, but since that should be relatively fast, the damage is significantly less than waiting for a new blocking resource to download.

Regarding the data URI in HTTP header idea, it's interesting, but I won't be surprised if some compatibility issues arise with intermediate proxies if this is deployed over HTTP rather than HTTPS. Intermediate proxies may have some expectations on the maximum length of an HTTP header value, which may break if the script is large enough. A meta tag could work better, but an inline script seems cleaner to me.

FremyCompany commented 10 years ago

@yoavweiss I have a problem with an arbitrary script with just a flag. Because it can do arbitrarily complex operations, it can, for example, trigger a synchronous XMLHttpRequest, which means the browser will have to execute that request without using a worker, and then use the worker for the following requests, which the draft explicitly says we do not want (i.e. swapping workers during the page load).

The other option is to make sure any such request will fail, and we will have to specify the interaction of such a worker script with other network layers. This could be very tricky and a lot of work. A purely descriptive approach avoids these pitfalls.

If we go the purely declarative way, we could enable either a <meta http-equiv> or a special kind of <script> with a specific "type" attribute that would only contain the worker's code.

But even this can be an issue, because this tag would have to be placed after some other meta tags (because a meta tag can cause the document's encoding to change, and therefore the page would have to be reparsed, which cannot happen once a script has been sent to the worker for execution) but before any <link>, because links would trigger downloads (and the look-ahead parser cannot possibly know a <script worker> is due in the next lines; once any download has been done, the worker cannot be created anymore for the current page because that would cause the worker switch we wanted to avoid).

What I don't like about the <script> solution is that it would be the first time the placement of a <script> has an impact on whether it is executed or not. Meta tags already have the notion of order dependence.

However, I'm still not convinced the inlining use case is legit. If the website owner really cares about performance, he will simply switch to an HTTP2 server. Before we have two interoperable implementations of ServiceWorker, HTTP2 is very likely to be implemented in every browser already, and major websites (which are the likely candidates for service workers) will have made the switch, as some already have with SPDY.

In addition, inlining a script at the top of the page as required for this will definitely push other very important declarations like scripts and stylesheets further down, in addition to being impractical if your website has multiple static pages (because you would need to modify the script in every one of them when you make a change).

At some point, we should design with best practices in mind, and the best practice here is to use a header, which can be compressed using HTTP2, and rely on Server Push to "inline" the script the first time if needed.

igrigorik commented 10 years ago

IMHO, data URIs are not the way to go here. I don't want to see large obfuscated blobs in my document, or in my headers (besides, this is a non-starter due to size constraints and overhead), for a regular (text) script. Yes, it probably works today, but it's a hack; we should be able to do better. Also, I would like to see some notion of a "one-time" SW -- i.e. an inlined worker that is only applied in the context of the current page.

@FremyCompany maybe I'm missing something, but I really don't see what the distinction is between a SW instantiated from inline script and from external file -- it's the same sandbox and same API's once the worker is running. Also, as @jansepar already pointed out: if you have an XSS hole on your site, you can already define your own "SW"... just override the XHR prototype and go to town.
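A tiny sketch of that last point: script injected via XSS can already interpose on the page's network traffic without any ServiceWorker at all.

// Monkey-patch XHR so every request the page makes from now on can be
// observed or rewritten by the injected code.
var originalOpen = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function (method, url) {
  console.log("intercepted", method, url);
  return originalOpen.apply(this, arguments);
};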

Finally, I'm as big of an HTTP 2.0 fan as there ever was.. But we shouldn't make 2.0 a requirement for SW.

FremyCompany commented 10 years ago

@igrigorik If your model is that the inlined script only runs the worker for the current page, that seems reasonable to me. The issue I see, in the case where you define a SW that applies to multiple pages, is that if you can do that via XSS then you can also reach other pages loaded afterwards that may otherwise not have been compromised. If the goal is to limit the scope of the worker to the current page, I do think an inline script may do the trick, with the before-mentioned restrictions (before any <link> but after http-equiv/charset <meta>s).

FremyCompany commented 10 years ago

@igrigorik

The difference I see is between in-header and in-page declarations, because the page can be subject to XSS but also to corruption of the source code (like on GitHub...), while the server headers are generally more robust, because they are lower-level and configured at the machine level. Good websites do not have their configuration files in their GitHub repository, for instance.

That's why we have things like CSP (https://developer.mozilla.org/en-US/docs/Security/CSP/Using_Content_Security_Policy) using headers.

igrigorik commented 10 years ago

@FremyCompany MITM can also inject headers... If you have an XSS hole, then what would stop you from injecting a snippet with SW declaration pointing to an external file? Am I missing something obvious?

FremyCompany commented 10 years ago

I think the issue is I wasn't able to make you understand what I meant. I'm not advocating for a difference between inline and external file; I'm advocating for a difference between inline and in headers. Man-in-the-middle is an issue you have to deal with at the transport layer (via TLS), but XSS is an issue you're concerned about at the application level.

Though technically an XSS attack allows you to inject HTML, you cannot possibly host a file on the server (so you would have no way to set a SW via an XSS attack if you're forced to reference a file on the current domain).

I believe it's okay to allow an inline worker to affect the current page, because it doesn't increase the attack surface of such an XSS attack by much, but if we allow this worker to be reused across pages we potentially increase the attack surface by a lot.

The fact the file is hosted on the server gives the server a second chance to catch the problem and react to it. Also, as a rule of thumb, headers are more secure than the raw content.