Fix pageload performance via preloading browser extension

paulirish commented 8 years ago

Been thinking a lot about pageload performance recently. The big issues are:

Dependent network requests (scripts triggering scripts, css triggering webfonts)
When all these requests start, and initiating them earlier

As an example, here are the render-blocking requests sitting before theverge.com's first meaningful paint (http://i.imgur.com/EvnYsIR.png). About 50% of the total time is attributable to requests whose initiator is not the HTML.

Ideally the browser initiates all the network requests it needs very early, instead of waiting for each request to complete, parse, and run before then identifying new requests to start.

Based on my research, "left-aligning" all requests would bring significant performance advantages. For example, when the request for index.html goes out, we'd get to start requesting not only use.typekit.com/kit/zo3ms59.js immediately but also the use.typekit.com/kit/zo3ms59.css that would otherwise be requested only after executing that JS. This would make the time to firstMeaningfulPaint ~200% faster.
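
To make the "left-aligning" arithmetic concrete, here is a minimal TypeScript sketch. It compares a chained waterfall, where each request starts only when its initiator finishes, against one where every request starts alongside the HTML request. The `Req` shape, the function names, and the durations are all made up for illustration. It also ignores parse/execute time and bandwidth contention, so treat it as a toy model rather than a measurement.

```typescript
// Hypothetical request shape: not taken from any extension API.
interface Req {
  url: string;
  durationMs: number;
  initiator?: string; // url of the request that triggered this one
}

// Finish time when each request waits for its initiator to complete.
// Assumes the initiator graph has no cycles.
function chainedFinishMs(reqs: Req[]): number {
  const byUrl: { [url: string]: Req } = {};
  for (const r of reqs) byUrl[r.url] = r;

  const finish = (r: Req): number => {
    const parent = r.initiator !== undefined ? byUrl[r.initiator] : undefined;
    return (parent ? finish(parent) : 0) + r.durationMs;
  };

  let latest = 0;
  for (const r of reqs) latest = Math.max(latest, finish(r));
  return latest;
}

// Finish time when every request starts at time zero ("left-aligned").
function leftAlignedFinishMs(reqs: Req[]): number {
  let latest = 0;
  for (const r of reqs) latest = Math.max(latest, r.durationMs);
  return latest;
}

// Illustrative durations only.
const example: Req[] = [
  { url: "index.html", durationMs: 200 },
  { url: "use.typekit.com/kit/zo3ms59.js", durationMs: 150, initiator: "index.html" },
  { url: "use.typekit.com/kit/zo3ms59.css", durationMs: 100, initiator: "use.typekit.com/kit/zo3ms59.js" },
];

const chained = chainedFinishMs(example);         // 200 + 150 + 100 = 450
const leftAligned = leftAlignedFinishMs(example); // max(200, 150, 100) = 200
```

With these invented numbers the chain costs the sum of its links, while the left-aligned version costs only the slowest single request.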

I realized we can actually implement this behavior in a browser extension.

Here's what I've been thinking…

Collect request detail

For each document pageload, listen to the network requests made and capture their URL and order. (Likely with onCompleted from the webRequest API.)
Store that data as a resource preload manifest for that URL.

Initiate preloading requests

On request of a main document (at the onSendHeaders stage), look in storage for matching preload manifests.
Add <link rel=preload> elements to trigger early preloading of each request. Specify the [as] attribute to inform resource prioritization.
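
A rough sketch of the two steps above, written as pure TypeScript functions so it can be read without the extension plumbing. `RequestRecord`, `buildManifest` and `preloadTags` are hypothetical names. The `as` values used here ("script", "style", "image", "font") follow the Preload spec as it stands today, which may differ from what an experimental build accepts. In a real extension the records would presumably come from a webRequest listener and the tags would be injected into the page; neither part is shown.

```typescript
// Hypothetical shapes for illustration only.
type ResourceType = "script" | "stylesheet" | "image" | "font" | "other";

interface RequestRecord {
  url: string;
  type: ResourceType;
}

// Step 1: turn the observed requests into a manifest that keeps the
// original order and drops repeated URLs.
function buildManifest(observed: RequestRecord[]): RequestRecord[] {
  const manifest: RequestRecord[] = [];
  for (const req of observed) {
    if (!manifest.some((entry) => entry.url === req.url)) {
      manifest.push(req);
    }
  }
  return manifest;
}

// Assumed mapping from resource type to an `as` attribute value.
const AS_VALUES: Record<ResourceType, string | null> = {
  script: "script",
  stylesheet: "style",
  image: "image",
  font: "font",
  other: null,
};

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Step 2: render one <link rel="preload"> per manifest entry,
// skipping types with no known `as` value.
function preloadTags(manifest: RequestRecord[]): string[] {
  const tags: string[] = [];
  for (const entry of manifest) {
    const asValue = AS_VALUES[entry.type];
    if (asValue === null) continue;
    tags.push(`<link rel="preload" href="${escapeAttr(entry.url)}" as="${asValue}">`);
  }
  return tags;
}
```

Skipping unknown types rather than emitting a preload without `as` is a choice made for this sketch, not something settled in the proposal.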

Lastly, I'm hoping this effort can help inform a native browser implementation. There are a lot of unknowns about this approach, but having some real-world data would illuminate the sorts of wins and challenges available within.

Anyone interested?

cc @yoavweiss @igrigorik to correct me if i'm crazy about abusing preload for this.

patrickkettner commented 8 years ago

I'm in

alexpaluzzi commented 8 years ago

This sounds amazing. I'd love to help.

So to clarify, you're thinking Chrome Extension for this (not App), correct?

paulirish commented 8 years ago

@alexpaluzzi extension seems right. UI would maybe include settings and stats. But we will mostly rely on extension APIs.

patrickkettner commented 8 years ago

@paulirish what settings do you imagine having?

alexpaluzzi commented 8 years ago

@paulirish I agree. Let's do it. I'm all for experimenting.

ahmadnassri commented 8 years ago

fantastic, would certainly be interested in seeing the outcome of the experiment, wish I had more time to spare to help out!

jeremenichelli commented 8 years ago

I would definitely love this. Is there a chance to also create a JS lib based on this? I can contribute if that's the case.

igrigorik commented 8 years ago

> cc @yoavweiss @igrigorik to correct me if i'm crazy about abusing preload for this.

Not crazy and I'd love to see the results of such an experiment.

However, there is one small gotcha: <link rel=preload> is not yet enabled in Chrome. The implementation is underway but we're blocked on the as attribute, which in turn is blocked on some Fetch+Preload spec plumbing. Resolving that is P0 on my list.

In the meantime, I think you can still get a lot of the mileage by scoping down your prefetches to JS+CSS and injecting the corresponding <script> and <link rel=stylesheet> elements.

yoavweiss commented 8 years ago

If we're just talking about an experiment here, you could enable the "experimental Web features" flag and try it out with <link rel=preload>. The as attribute is implemented as well, with the possible values of "image", "script" and "stylesheet" (I could add more values to help out such an experiment).

What I'd be worried about in that scenario is that Web pages tend to change over time (or all the time, depending). For dynamic pages that actually change, you may get low hit rates, and using preload (a MUST fetch) for speculative fetches would mean you may have significant perf degradation in these cases.
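
One way to put a number on that worry is to compare what was preloaded against what the page actually went on to request. The sketch below is a hypothetical helper (the `HitRate` shape and `preloadHitRate` name are mine), and it assumes both lists are plain, de-duplicated URL strings.

```typescript
interface HitRate {
  hits: number;   // preloaded URLs the page really requested
  misses: number; // preloaded URLs that were never used (wasted fetches)
  rate: number;   // hits / total preloaded, in the range 0..1
}

function preloadHitRate(preloaded: string[], requested: string[]): HitRate {
  let hits = 0;
  for (const url of preloaded) {
    if (requested.indexOf(url) !== -1) hits++;
  }
  const misses = preloaded.length - hits;
  // With nothing preloaded, nothing was wasted, so this sketch reports 1.
  const rate = preloaded.length === 0 ? 1 : hits / preloaded.length;
  return { hits, misses, rate };
}
```

A low rate on a given page would suggest its manifest is stale and the preloads are mostly costing bandwidth.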

javiertejero commented 8 years ago

+1

kdzwinel commented 8 years ago

@igrigorik we can't really use <script> because it will execute the code. This will cause all "prefetched" scripts to be called twice and, potentially, in the wrong order. We can use <object> though; it will preload the script without running it (more here).

Maybe I'm missing something, but I'm rather sceptical about this idea. What we are trying to do is outsmart the browser cache and IMO this won't work.

Let's take this minimal scenario: we have a page with a script that loads a stylesheet. Our extension analyses that page on the first load and learns about both resources (so does the browser cache). On the second load, the extension injects <link rel=preload>, for both the script and the stylesheet, into the <head> of the said website. The browser sees these <link rel=preload>s and says "hey, I have these cached!", so no action is taken. Then it sees the original <script>, loads it from cache, executes it, loads the stylesheet from cache and renders the page.

So, in the case where all resources are already in the browser cache, our extension will not improve performance. It will potentially improve performance only if these scripts, stylesheets and images are not cached by the browser. However, this will only happen if these resources:

were not cached in the first place (e.g. due to no-cache headers),
expired (again - headers),
browser cache was cleared by the user or
by the browser itself (e.g. low disk space).

Our extension will just ignore these cases and try to prefetch these resources anyway. For the first two cases (no-cache, expire) the extension can actually harm performance by loading irrelevant resources (as @yoavweiss already mentioned). For the third one (user clears the cache) we are storing things that the user wanted to get rid of, which is a potential privacy issue.

patrickkettner commented 8 years ago

I created a super bare bones version of this

I hit a few snags along the way

onSendHeaders fires on every request, not just the initial load. As a result, I load the <link>s in a content script at document_start.
There is no way of knowing (as far as I can tell) where a resource is being requested from. As a result, you can't tell if an img is content on the page or being requested via the CSS. This means if you configure this plugin to track images, then on a site like imgur or facebook you'll preload a shitton of photos anytime the page loads. Perhaps the chrome extension APIs could expose the requester the way the devtools do.
Webfonts are not currently supported as a valid as type.
It's not clear how one should determine what assets go with what site. github.io pages all use different subdomains; sites like jsbin, codepen, etc. all use different paths. Caching based on the pathname would defeat the purpose for most websites, so I am currently going off of the hostname.
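
To illustrate the keying trade-off in the last point, here is a small sketch of a manifest key function with two strategies. `manifestKey` and `KeyStrategy` are hypothetical names, not code from the bare bones version, and it relies only on the standard `URL` parser.

```typescript
// Two possible ways to decide which manifest a page belongs to.
type KeyStrategy = "hostname" | "hostname+path";

function manifestKey(pageUrl: string, strategy: KeyStrategy): string {
  const parsed = new URL(pageUrl);
  return strategy === "hostname"
    ? parsed.hostname
    : parsed.hostname + parsed.pathname;
}

// Keying by hostname: every jsbin page shares one manifest.
const binA = manifestKey("https://jsbin.com/abc/edit", "hostname"); // "jsbin.com"
const binB = manifestKey("https://jsbin.com/xyz/edit", "hostname"); // "jsbin.com"

// Keying by hostname + path: each page gets its own manifest.
const pathA = manifestKey("https://jsbin.com/abc/edit", "hostname+path"); // "jsbin.com/abc/edit"
```

Hostname keys share learning across a site but can mix unrelated pages together, while path keys stay precise but only help on a repeat visit to the exact same page.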

patrickkettner commented 8 years ago

@kdzwinel

> ... extension can actually harm the performance by loading irrelevant resources...

isn't that why we are experimenting?

> For the third one (user clears the cache) we are storing things that user wanted to get rid of - potential privacy issue

Unless someone was trying to be malicious with the extension, you could watch for the cache being dumped (not aware of an event, so you would probably need to poll against the browserData until an event is added) and clear the storage data. Or, all of the data could be stored in user storage so it is removed when the browser clears their cache.

kdzwinel commented 8 years ago

Sure, let's experiment!

I created a simple page that, at the very end of the <body>, loads a script which loads a stylesheet, which loads an image. Then, I made a copy of that page and added <link rel=preload> for each resource in the <head> (simulating the extension). After that, in the latest canary (46.0.2488.0), I enabled "experimental Web Platform features" (chrome://flags/#enable-experimental-web-platform-features), turned on network throttling ("Good 2G") and tested:

NO PRELOAD, EMPTY CACHE

NO PRELOAD, RESOURCES CACHED

PRELOAD, EMPTY CACHE

PRELOAD, RESOURCES CACHED

This is a rather confusing result: it looks like preloading harms performance (by loading things twice), but maybe my setup is invalid?

roblarsen commented 8 years ago

You did the test one run each? You've got to run tests like this a lot to shake out the weirdness of individual sessions.

Is there a tool like Hammerhead for Chrome? (Actually, is there a tool like Hammerhead for Firefox, since Hammerhead itself no longer works?) There are other ways to automate, but when I did a lot of this stuff I liked being able to just set the test to run 100x and walk away, all right in the browser.

kdzwinel commented 8 years ago

In this case it's the waterfall and page size that are interesting, not the load times (that, as you mentioned, can fluctuate). I think that you can use phantomas to make multiple tests and figure out median results, not sure if there is a canary based version though.
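
For reference, reducing many runs to a median is only a few lines of TypeScript. This is a generic sketch, not how phantomas does it, and the load times in the comments are invented to show why a single outlier run barely moves the result.

```typescript
// Median of a non-empty list of samples (e.g. load times in ms).
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("median of an empty sample set is undefined");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Invented numbers: one slow outlier among five runs.
const medianLoadMs = median([4200, 13000, 4100, 4300, 4000]); // 4200
```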

roblarsen commented 8 years ago

I guess I'm confused at the purpose of the whole exercise then. If we're not looking at load times (which is the only thing a user cares about) then why bother with any of this? In this case the waterfall and page size are an interesting diagnostic for a strange result (13 second load time!)

yoavweiss commented 8 years ago

@kdzwinel - That's bad and needs fixing :( Can you file a crbug.com and send it my way?

kdzwinel commented 8 years ago

The "preloaded" version loads resources twice for some reason, so it'll never beat a setup where resources are loaded only once. So, until we sort it out, load time is not really important.

roblarsen commented 8 years ago

Sure, and you've identified a bug. But saying "load time is not really important" while staring at waterfalls is like being a doctor and saying "whether the patient recovers or not is not really important" because there's an interesting MRI to look at.

Garbee commented 8 years ago

I don't think the experiment was about the load time at all, but purely about the duplicate inclusion of resources that was being caused. Or am I missing something in the setup of the test?

kdzwinel commented 8 years ago

Yeah, sorry if I caused confusion. Load time is what will really matter in the end. I wanted to get the setup right before getting to the actual testing. The duplicated calls puzzled me, but since it's a Chrome bug this discussion is now off-topic.

@yoavweiss I'll submit a bug report in ~2h.

[EDIT] bug report

patrickkettner commented 8 years ago

fix for this just landed in chromium, I will retry the extension tomorrow once it is in canary

paulirish commented 8 years ago

> fix for this just landed in chromium, I will retry the extension tomorrow once it is in canary

FWIW: https://download-chromium.appspot.com/ has latest builds, usually an hour or two after a commit lands. In case you're impatient. ;)

paulirish commented 8 years ago

> Maybe I'm missing something, but I'm rather skeptical about this idea. What we are trying to do is outsmart the browser cache and IMO this won't work.

Allow me to attempt to defend the idea, @kdzwinel. :)

The browser cache, IMO, really doesn't add a lot to critical performance issues. You don't get a second chance to make a first impression. And the browser cache, by definition, doesn't contribute to a first impression.

What this is really about is initiating requests for render-blocking resources much earlier. If 100% of render-blocking resources were discovered by the preload scanner, we totally wouldn't need this. But they aren't. Render-blocking resources (stylesheets, fonts (kinda), and scripts) are initiated by other scripts and stylesheets allllll the time. And we're waiting on network latency to discover these very important items.

Right now, the design of the idea collects request data on pages you visit. But that's incomplete and less effective. The larger vision is to preload assets for pages you've never been to. This requires a more centralized endpoint to supply the request manifests for any given URL/domain. Even then, it won't be perfect and will require some work. But I am confident that the benefits of speculative preloading will be worth it.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it has not had recent activity. Thank you for your contributions.

h5bp / lazyweb-requests