readium / architecture

📚 Documents the architecture of the Readium projects
https://readium.org/architecture/
BSD 3-Clause "New" or "Revised" License

[Server, Streamer] HTTP header rel=prefetch Links, prioritized media-types? #97

Open danielweck opened 5 years ago

danielweck commented 5 years ago

Given the necessity to introduce limits on the number of HTTP header rel=prefetch Links (due to server/client limitations on the total number of bytes supported in HTTP headers), should CSS be prioritized over JS over fonts? (which order?)

Right now, the r2-streamer-js implementation generates prefetch Link for these HTTP Content-Types / media types:

["text/css",
"text/javascript", "application/javascript",
"application/vnd.ms-opentype", "font/otf", "application/font-sfnt",
"font/ttf", "application/font-sfnt",
"font/woff", "application/font-woff", "font/woff2"]

...however, there is no prioritization heuristic (JSON document order is used to walk the resources array property of the ReadiumWebPubManifest). Such a prioritization algorithm is trivial to implement, so this is not a technical problem, just an important design consideration now that there is an artificial limit in place.
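To make the design question concrete, here is a minimal sketch of such a heuristic. It is not the r2-streamer-js code: the `ResourceLink` shape, the function names, and the CSS > JS > fonts order are assumptions chosen for illustration (the order is exactly what this issue asks about). Resources in the same group keep their JSON document order.

```typescript
// Hypothetical, simplified shape of an entry in the manifest "resources" array.
interface ResourceLink {
  href: string;
  type: string; // media type, e.g. "text/css"
}

// ASSUMPTION: CSS first, then JS, then fonts. The order is the open question of this issue.
const PRIORITY_GROUPS: string[][] = [
  ["text/css"],
  ["text/javascript", "application/javascript"],
  [
    "application/vnd.ms-opentype", "font/otf", "application/font-sfnt",
    "font/ttf", "font/woff", "application/font-woff", "font/woff2",
  ],
];

// Returns the index of the priority group, or -1 if the media type is not prefetchable.
function priorityOf(mediaType: string): number {
  for (let i = 0; i < PRIORITY_GROUPS.length; i++) {
    if (PRIORITY_GROUPS[i].indexOf(mediaType) !== -1) {
      return i;
    }
  }
  return -1;
}

// Drops non-prefetchable resources, then sorts by group; ties keep document order.
function prioritizePrefetchable(resources: ResourceLink[]): ResourceLink[] {
  return resources
    .filter((resource) => priorityOf(resource.type) >= 0)
    .map((resource, index) => ({ resource, index }))
    .sort((a, b) =>
      priorityOf(a.resource.type) - priorityOf(b.resource.type) || a.index - b.index)
    .map((entry) => entry.resource);
}
```

Changing the priority then only requires reordering the groups in `PRIORITY_GROUPS`.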

Relevant code diffs: https://github.com/readium/r2-streamer-js/compare/v1.0.10...v1.0.11 https://github.com/readium/r2-streamer-js/compare/v1.0.11...v1.0.12

Related issue: https://github.com/readium/architecture/issues/96

danielweck commented 5 years ago

Example implementation: https://github.com/readium/r2-streamer-js/pull/45/files

Priority list:

As soon as the number of HTTP prefetch links reaches the ceiling (default is 10), the remainder of the prioritized list of prefetch-able resources is ignored. For example, if there are many CSS and JS files, fonts may not be prefetched at all.
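The cut-off behaviour can be sketched as follows. This is an illustration rather than the code from the pull request: the function and constant names are hypothetical, and the input is assumed to be a list of hrefs that is already sorted by priority and already consists of valid URI references.

```typescript
// Default ceiling mentioned in this thread; the name is hypothetical.
const DEFAULT_MAX_PREFETCH_LINKS = 10;

// Builds the value of a single "Link" response header from prioritized hrefs.
// Everything beyond the ceiling is ignored.
function buildPrefetchLinkHeader(
  prioritizedHrefs: string[],
  maxLinks: number = DEFAULT_MAX_PREFETCH_LINKS,
): string {
  return prioritizedHrefs
    .slice(0, Math.max(0, maxLinks))
    .map((href) => `<${href}>; rel=prefetch`)
    .join(", ");
}
```

With eight CSS files, three JS files and one font (in that order), the resulting header lists the eight stylesheets and the first two scripts; the third script and the font are dropped.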

JayPanoz commented 5 years ago

A few quick notes off the top of my head, as I’d be expecting this to primarily impact fixed-layout EPUB in the near future.

  1. First, you have vendors recommending one stylesheet per page, so if, say, the publication is 200 pages long, you’ll get 200+ stylesheets (because they also recommend a reset, so theoretically at least one additional file);
  2. then you have files with lots of fonts (the typical use case seems to be PDF conversion, as PDF allows the subsetting of fonts for each text page);
  3. the worst-case scenario would be 1 + 2.

HadrienGardeur commented 5 years ago

In an ideal world, we would know for each HTML resource which CSS, JS and fonts are used. This would enable us to trigger the prefetch strictly when each HTML resource is requested.

While we could eventually achieve that with some heavy processing, we need something "less than ideal" in the meantime.

Here's my take on this:

Prefetching through HTML links won't be affected by the same limitations as the Link header and shouldn't block the browser from doing its normal job.
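As a rough illustration of the HTML alternative, the sketch below produces `<link rel="prefetch">` elements that could be injected into the `<head>` of a served document. The function names are hypothetical and nothing here reflects how any Readium streamer actually injects markup; the self-closing form is used on the assumption that the target is XHTML.

```typescript
// Escapes characters that are unsafe inside a double-quoted (X)HTML attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Returns one <link rel="prefetch"> element per href, one per line.
function buildPrefetchLinkElements(hrefs: string[]): string {
  return hrefs
    .map((href) => `<link rel="prefetch" href="${escapeAttribute(href)}" />`)
    .join("\n");
}
```

Because the markup travels in the response body, its size is not constrained by limits on HTTP header size.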

JayPanoz commented 5 years ago

Well, I guess CSS should be highest, given it’s render/layout-blocking, independently of the rendition (e.g. reflow/pre-paginated).

Fonts are critical for fixed-layout EPUB. Otherwise, browsers decided a long time ago that they are not (cf. the font-display CSS property). That said, you can’t necessarily swap them or make them optional in EPUB reflow, for example, because of fragmentation/pagination.

Scripts, I don’t have enough insights/data/anecdotes. Personally, I’ve always put them at the end of the <body> tag as a best practice but with all the authoring tools out there, my gut feeling is that in EPUB they might well be parser-blocking (<head>, and not async/defer) by “default.” Maybe that’s something Rookland (?) could talk about?

However, I guess adjusting the prioritization in this doc could be a good start: https://docs.google.com/document/d/1bCDuq9H1ih9iNjgzyAL0gpwNFiEP4TZS-YLRp_RuMlc/edit#

Especially as it can also serve as a ref longer term, for heuristics.

It seems CSS always wins over fonts and JS, cf. the default priorities section there: https://developers.google.com/web/fundamentals/performance/resource-prioritization

Note however there’s now priority hints because “so this script is async but it’s also important” so it kinda is a can of worms.

JayPanoz commented 5 years ago

Also

Prefetching through HTML links won't be affected by the same limitations as the Link header

I guess it means the server should send an "Accept-Ranges: bytes" response header in case the user clicks on a link while something is being prefetched?

stadskle commented 4 years ago

We found a quirk with the r2 streamer + AWS load balancer today that is relevant to this.

We are running the streamer behind an AWS ALB to create manifest.json on ingestion. That works fine, but a tiny fraction of ebooks fail with the not-so-informative HTTP 502 code. What we discovered is that some books with many CSS files caused the prefetch headers to exceed the AWS ALB hard limit on header size (https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html). All the ALB then does is return a 502 with no real message about why this happened.

For anyone needing this to run safely on AWS, I guess some kind of size limit configuration would be useful.
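One possible shape for such a configuration is a byte budget rather than a link count. The sketch below is only an illustration: the function name and the `maxBytes` parameter are hypothetical, the hrefs are assumed to be percent-encoded ASCII (so string length equals byte length), and the budget covers only the value of this one header, not the total size of all response headers.

```typescript
// Adds prefetch links in priority order until the next one would exceed the byte budget.
// ASSUMPTION: hrefs are ASCII-only, so candidate.length is the size in bytes.
function buildLinkHeaderWithinBudget(prioritizedHrefs: string[], maxBytes: number): string {
  let header = "";
  for (const href of prioritizedHrefs) {
    const entry = `<${href}>; rel=prefetch`;
    const candidate = header === "" ? entry : `${header}, ${entry}`;
    if (candidate.length > maxBytes) {
      break; // stop at the first link that does not fit
    }
    header = candidate;
  }
  return header;
}
```

An appropriate value for `maxBytes` depends on the proxy or load balancer in front of the streamer and should be taken from its documentation.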

(In our use case pre-fetching has no value, so we will just disable it. But if anyone else is using the streamer on AWS, it is something to be aware of.)

danielweck commented 4 years ago

Thank you @stadskle very useful feedback :)

danielweck commented 4 years ago

Note that since version 1.0.12, r2-streamer-js supports a configurable number of prefetch HTTP header links, with a default of 10. https://github.com/readium/r2-streamer-js/blob/develop/CHANGELOG.md