w3c / dpub-pwp-ucr

Use Cases and Requirements for (Packaged) Web Publications
https://w3c.github.io/dpub-pwp-ucr/

3.3 Constituent Resources #110

Closed marcoscaceres closed 7 years ago

marcoscaceres commented 8 years ago

Req. 16: The information regarding the constituent resources of a PWP must be easily discovered

The case for this requirement is extremely weak.

In particular, no clear justification is given for:

It is therefore necessary for the reading system to have an easy access to the list of constituent resources, and some of their characteristics like their media types or sizes.

It just gives some hand-wavy non-justification in the sentence before, about things being "prohibitive".

It almost feels like this requirement has been snuck in to justify the inclusion of some manifest file format for the packaging format (which, it seems pretty clear, is trying to underpin (or undermine?) the requirements of this document).

iherman commented 7 years ago

@HadrienGardeur, that is great.

Of course, you are right, the bulk of the JS application can be anywhere (there may be CORS issues, but that can be handled), it is only the service worker setup part that is relevant. I should have thought of that. And yes, it is also true that JS can be shared among WP-s on the same domain (eg, by a publisher), the examples I referred to make that also clear. Have you checked what data the event in the install handler receives? It may be that one of the entries there is the URL of a specific book.

However, the question then remains: is it acceptable for publishers, authors, etc, to be required to deliver a (probably standard) small JS script to register and initialize the service worker? I cannot really judge, I am not a publisher, but many of you guys on the list may have an answer to that...
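
For concreteness, a minimal sketch of what such a (near-boilerplate) registration script could look like; the "/sw.js" path and the scope are placeholders:

if ("serviceWorker" in navigator) {
  // Register the worker that will cache the publication's constituent resources.
  navigator.serviceWorker.register("/sw.js", { scope: "/" })
    .then((reg) => console.log("Service worker registered with scope:", reg.scope))
    .catch((err) => console.error("Service worker registration failed:", err));
}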

BillKasdorf commented 7 years ago

Re: is it acceptable for publishers, authors, etc, to be required to deliver a (probably standard) small JS script to register and initialize the service worker?

That seems reasonable to me. The vast majority of professional publishers' files are actually produced by vendors, virtually all of whom should have the ability to do this. Of course we cannot make it impossible for small or self-publishers to do this; but that is an implementation, "techniques", or "best practices" issue: showing them what they need to do. For the vast majority of professional publications, this does not strike me as an obstacle at all.


iherman commented 7 years ago

@BillKasdorf great. It may be some sort of a standard, small piece of JS that we define and distribute...

But I still hope we will not need it...

lrosenthol commented 7 years ago

On Wed, Oct 5, 2016 at 9:34 AM, Ivan Herman notifications@github.com wrote:

I have tried to go a little bit deeper into how Service Workers work and how they could be used/adapted to publications needs, too. I have therefore looked at the latest spec https://w3c.github.io/ServiceWorker/ and two "tutorial"-like pages on MDN https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers and on Google https://developers.google.com/web/fundamentals/getting-started/primers/service-workers.

Please remove this.

You can't do anything with SW and PWP, until we better understand the requirements for PWP. This is the heated debate that I've been having with you, and others, about SW for months now. Unlike you and your "guesses" - I have a team that has actually been doing very detailed implementations around SW for months now. We have been running into blockers - at least based on our goals for a PWP. I suspect that these are not unique.

I think that the requirements of Dave https://github.com/w3c/dpub-pwp-ucr/issues/110#issuecomment-249888579, and the more explicit descriptions of Hadrien https://github.com/w3c/dpub-pwp-ucr/issues/110#issuecomment-250757809, lead to a model whereby:

Both of which have NEVER been discussed amongst the group to decide if we (a) agree on all of them and (b) have other things to contribute.

@iherman - I appreciate your desire to start jumping into the deep end of this work, but we are still far away from that. And you are only causing more confusion due to your lack of depth on the topics.

The question is whether that shared JS file must be physically part of the Web Publication (ie, essentially, replicated for each and every Web Publication ever produced), or whether the JS file is not necessarily on the Web site of the Web Publication itself (ie, it could be shared among WP-s if it is some sort of a standard).

You need to support both models, as we have use cases for both.

Also, because we wish to support both, this has implications on a variety of implementation decisions (including SW stuff!). Thus proving my earlier point.

lrosenthol commented 7 years ago

On Wed, Oct 5, 2016 at 10:53 AM, Ivan Herman notifications@github.com wrote:

Of course, you are right, the bulk of the JS application can be anywhere (there may be CORS issues, but that can be handled),

Actually, the CORS issues are one of the bigger problems that we face here... It's part of our issues around the security model.

And yes, it is also true that JS can be shared among WP-s on the same domain (eg, by a publisher), the examples I referred to make that also clear. However, the question then remains: is it acceptable for publishers, authors, etc, to be required to deliver a (probably standard) small JS script to register and initialize the service worker? I cannot really judge, I am not a publisher, but many of you guys on the list may have an answer to that...

Stop thinking about professional publishers and think about anyone wanting to publish content. Think documents like memos and reports. Think about content-sharing platforms such as GDocs, DropBox, ScribD, etc. How does this work relate to all of them? You need a solution that could potentially address them all.

TzviyaSiegman commented 7 years ago

Gentle public reminder that this is a professional work environment. Please comment on the issues, not on anyone's scope of knowledge. We even have consensus on this https://www.w3.org/Consortium/cepc/!

marcoscaceres commented 7 years ago

On 6 Oct. 2016, at 1:24 am, Mike - 5DOC notifications@github.com wrote:

The cache lives for 10 minutes and then puff? There's no UI to know if a set of related documents has actually been cached? What if I wanted to take @iherman's last presentation, including video, on the plane for a flight from Europe to SF? How would I know the 10 MB presentation was downloaded and if the cache settings were for a long enough duration?

The cache API is persistent.

What if I wanted to annotate the presentation on the plane? If the cache expires before I get to my hotel do I lose the presentation and the annotations?

They get automatically pushed to the server via background sync.
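
A sketch of the Background Sync pattern being referred to (Chrome-only at the time; the tag name and the pushQueuedAnnotations helper are hypothetical):

// In the page: queue the annotation locally, then request a one-off sync.
navigator.serviceWorker.ready.then((reg) => reg.sync.register("push-annotations"));

// In the service worker: replay queued annotations once connectivity returns.
self.addEventListener("sync", (event) => {
  if (event.tag === "push-annotations") {
    event.waitUntil(pushQueuedAnnotations()); // hypothetical helper
  }
});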

Or should I print each page to PDF and annotate it in Preview?

If you prefer - but no need :)

Given that the cache duration and therefore the actual offline persistence is defined by the publisher, I don't understand the point of an ephemeral offline standard except as quasi-DRM.

We can't prevent people misusing it, but sure: a publisher could time bomb the cache (via the API). But maybe you borrowed the publication for 24 hours? The prerogative is with the publisher, in negotiation with the user.
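
As a sketch of how such a time bomb could be built on top of the Cache API (all names are hypothetical; the Cache API itself never expires entries, so the publisher's code has to enforce the deadline):

const EXPIRES_KEY = "/__expires__"; // hypothetical synthetic cache entry

// Cache the publication's resources together with an expiry timestamp.
async function cacheWithExpiry(name, urls, ttlMs) {
  const cache = await caches.open(name);
  await cache.addAll(urls);
  await cache.put(EXPIRES_KEY, new Response(String(Date.now() + ttlMs)));
}

// Drop the whole cache once the deadline has passed.
async function purgeIfExpired(name) {
  const cache = await caches.open(name);
  const resp = await cache.match(EXPIRES_KEY);
  if (resp && Number(await resp.text()) < Date.now()) {
    await caches.delete(name);
  }
}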

sideshowbarker commented 7 years ago

@lrosenthol wrote:

I have a team that has actually been doing very detailed implementations around SW for months now. We have been running into blockers

Exactly what kinds of “blockers”? Please describe the nature of the problems in more detail here or somewhere else and then post a link to the details here.

lrosenthol commented 7 years ago

@sideshowbarker Mike, I mentioned some of these in other threads, but happy to repeat them here if it's helpful.

  1. The requirement that everything be served via https is too restrictive for use in web publications (WPs). Service workers can only be instantiated in a "secure context", and since that's transitive it means: the top-level page and the whole hierarchy of iframes, up to the one holding the content and instantiating the worker, must be served over https. We strongly believe this is an unreasonable constraint to impose on websites hosting arbitrary publications.
  2. Some browsers do not support SWs under certain conditions. For example, FF does not support SW in private browsing mode. It's not clear to us if this is a design/spec issue, as the spec isn't clear on the topic, or just a current product limitation. It would be good to get @marcoscaceres, @jakearchibald and others to weigh in.
  3. SWs do not work inside sandboxed iframes. This too is not spelled out in the SW spec, though it's referenced here by Jake. The ability to use sandboxed iframes is, IMO, important to provide a strong security model for WPs.

sideshowbarker commented 7 years ago

@lrosenthol wrote:

On Wed, Oct 5, 2016 at 9:34 AM, Ivan Herman notifications@github.com wrote:

I have tried to go a little bit deeper into how Service Workers work and how they could be used/adapted to publications needs, too. I have therefore looked at the latest spec https://w3c.github.io/ServiceWorker/ and two "tutorial"-like pages on MDN https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers and on Google https://developers.google.com/web/fundamentals/getting-started/primers/service-workers.

Please remove this.

What exactly are you suggesting be removed and from where?

I think that the requirements of Dave https://github.com/w3c/dpub-pwp-ucr/issues/110#issuecomment-249888579, and the more explicit descriptions of Hadrien https://github.com/w3c/dpub-pwp-ucr/issues/110#issuecomment-250757809, lead to a model whereby:

Both of which have NEVER been discussed amongst the group to decide if we (a) agree on all of them and (b) have other things to contribute.

Isn’t that discussion what’s happening right here now? Are you suggesting it should stop? Or what?

@iherman - I appreciate your desire to start jumping into the deep end of this work, but we are still far away from that. And you are only causing more confusion due to your lack of depth on the topics.

I disagree with both of those statements. What happened here is that somebody took time to study and get a better understanding of an important standard feature of the platform that clearly has relevance to the discussion, and then came back here to share their understanding, and then others here responded in substance. I think most people would judge that all to be genuinely productive.

marcoscaceres commented 7 years ago

@lrosenthol:

You can't do anything with SW and PWP, until we better understand the requirements for PWP. This is the heated debate that I've been having with you, and others, about SW for months now. Unlike you and your "guesses" - I have a team that has actually been doing very detailed implementations around SW for months now. We have been running into blockers - at least based on our goals for a PWP. I suspect that these are not unique.

If there are issues, please raise them properly so we can evaluate them. You can't come in and just say "we had issues" and imply the argument is invalid. Otherwise, I can easily question the competence of your team - and I don't want to do that.

marcoscaceres commented 7 years ago

  1. The requirement that everything be served via https is too restrictive for use in web publications (WPs). Service workers can only be instantiated in a "secure context", and since that's transitive it means: the top-level page and the whole hierarchy of iframes, up to the one holding the content and instantiating the worker, must be served over https. We strongly believe this is an unreasonable constraint to impose on websites hosting arbitrary publications.

Snowden proved that this is a must. Not doing features over TLS is a non-starter. Browser vendors are working hard to deprecate HTTP.

  2. Some browsers do not support SWs under certain conditions. For example, FF does not support SW in private browsing mode. It's not clear to us if this is a design/spec issue, as the spec isn't clear on the topic, or just a current product limitation. It would be good to get @marcoscaceres, @jakearchibald and others to weigh in.

It doesn't make sense to keep around an SW in private browsing mode. The whole point of private browsing mode is that all information gets destroyed when you are done "private browsing".

  3. SWs do not work inside sandboxed iframes. This too is not spelled out in the SW spec, though it's referenced here by Jake. The ability to use sandboxed iframes is, IMO, important to provide a strong security model for WPs.

Depends on the use case. What are you trying to do here?

marcoscaceres commented 7 years ago

@HadrienGardeur

Based on my experience, the Web App Manifest should only appear on the "homepage" of the publication, otherwise you end up with the install banner popping up everywhere, including in the middle of the publication after an install.

You can suppress the install prompt on all pages other than the one you want via "beforeinstallprompt":

window.addEventListener("beforeinstallprompt", (ev) => {
  // Suppress the install banner everywhere except the publication's start page.
  if (window.location.pathname !== "/index.html") {
    ev.preventDefault();
  }
});

Note: "beforeinstallprompt" is currently non standard - but we are working to standardizing something similar. https://github.com/w3c/manifest/issues/417

marcoscaceres commented 7 years ago

Argh... making typos all over the place today... need coffee.

lrosenthol commented 7 years ago

On Wed, Oct 5, 2016 at 8:22 PM, Marcos Cáceres notifications@github.com wrote:

  1. The requirement that everything be served via https is too restrictive for use in web publications (WPs).

Snowden proved that this is a must. Not doing features over TLS is a non-starter. Browser vendors are working hard to deprecate HTTP.

I understand that, however there are times/situations where one cannot do TLS - yet we would like publications to continue to work. One of the key cases (though not the only one) is for packaged publications, as there is (as we have discussed) no way to serve them in a TLS-compatible manner. (NOTE: it is possible to serve them in a secure context, which would suffice for SWs, but the specs and implementations don't match that)

It doesn't make sense to keep around an SW in private browsing mode. The whole point of private browsing mode is that all information gets destroyed when you are done "private browsing".

True. But while I have the private browsing mode active, I should be able to take content offline via SWs. So that the private browsing session that I started at the airport is still usable while on the plane.

  3. SWs do not work inside sandboxed iframes. This too is not spelled out in the SW spec, though it's referenced here by Jake. The ability to use sandboxed iframes is, IMO, important to provide a strong security model for WPs.

Depends on the use case. What are you trying to do here?

What depends on the use case? Sandboxed iframes allow, as you know, a safe way to host content that may contain scripts (not written by the author or from the host domain) inside of another site/domain. When doing this, there is (as with the previous example) no reason why my online content can't be taken offline.

marcoscaceres commented 7 years ago

I understand that, however there are times/situations where one cannot do TLS - yet we would like publications to continue to work. One of the key cases (though not the only one) is for packaged publications, as there is (as we have discussed) no way to serve them in a TLS-compatible manner. (NOTE: it is possible to serve them in a secure context, which would suffice for SWs, but the specs and implementations don't match that)

It might be worth investigating if aspects of web packaging could address the security requirements: https://github.com/w3ctag/packaging-on-the-web

True. But while I have the private browsing mode active, I should be able to take content offline via SWs. So that the private browsing session that I started at the airport is still usable while on the plane.

Yes, this seems like a fair request - though a bit of an edge case.

What depends on the use case? Sandboxed iframes allow, as you know, a safe way to host content that may contain scripts (not written by the author or from the host domain) inside of another site/domain. When doing this, there is (as with the previous example) no reason why my online content can't be taken offline.

Using "foreign fetch" should address this use case: https://w3c.github.io/ServiceWorker/#on-foreign-fetch-request-algorithm

It would allow you to request the iframe, secure it, and have the other origin serve the content even while offline.
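
For reference, the shape of the then-experimental foreign fetch handlers, per the spec section and articles linked here (the scope, origin list, and exposed headers are illustrative):

// During install, the cross-origin SW opts in to serving other origins.
self.addEventListener("install", (event) => {
  event.registerForeignFetch({
    scopes: [self.registration.scope], // which URLs to intercept
    origins: ["*"]                     // which embedding origins may use this
  });
});

// Answer cross-origin requests, from cache where possible, even while offline.
self.addEventListener("foreignfetch", (event) => {
  event.respondWith(
    caches.match(event.request)
      .then((resp) => resp || fetch(event.request))
      .then((response) => ({
        response,
        origin: event.origin,      // expose the response to the requesting origin
        headers: ["Content-Type"]  // response headers made visible cross-origin
      }))
  );
});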

sideshowbarker commented 7 years ago

  1. The requirement that everything be served via https is too restrictive for use in web publications (WPs).

Snowden proved that this is a must. Not doing features over TLS is a non-starter. Browser vendors are working hard to deprecate HTTP.

I understand that, however there are times/situations where one cannot do TLS - yet we would like publications to continue to work. One of the key cases (though not the only one) is for packaged publications, as there is (as we have discussed) no way to serve them in a TLS-compatible manner.

In a case where you can’t serve something over TLS then you’re not only not going to be able to use SW specifically, you’re also not going to be able to use anything that provides online/offline persistence in any way similar to what SW does. Even the (deprecated) HTML5 Application Cache feature now requires documents and their assets to be served over TLS.

Any mechanism that does what’s needed here but that does not use TLS is escalating the risk and damage of an XSS attack, and browser-engine projects are not going to implement it.

One of the key cases (though not the only one) is for packaged publications, as there is (as we have discussed) no way to serve them in a TLS-compatible manner. (NOTE: it is possible to serve them in a secure context, which would suffice for SWs, but the specs and implementations don't match that)

I’m not clear on what you mean. When I first read the above I thought it sounded like you are saying there is a way to serve documents in a secure context without using TLS. But that’s not what you mean is it?

Can you explain what you mean in terms of any specific references from the Secure Contexts spec?

Where you say, “the specs and implementations don't match that”, what specifically is “that” and in what way do specs and implementations not match it?

marcoscaceres commented 7 years ago

(there are some easier to digest details about foreign fetch here: https://developers.google.com/web/updates/2016/09/foreign-fetch?hl=en)

marcoscaceres commented 7 years ago

@sideshowbarker @lrosenthol, the main requirement around packaging should really be:

  1. can the user agent cryptographically verify that the package has not been tampered with in flight?
  2. can the user agent cryptographically verify that the package came from origin x?

sideshowbarker commented 7 years ago

(there are some easier to digest details about foreign fetch here: https://developers.google.com/web/updates/2016/09/foreign-fetch?hl=en)

Also worth mentioning here is https://github.com/w3c/ServiceWorker/blob/master/foreign_fetch_explainer.md which is even easier to digest.

sideshowbarker commented 7 years ago

the main requirement around packaging should really be:

  1. can the user agent cryptographically verify that the package has not been tampered with in flight?
  2. can the user agent cryptographically verify that the package came from origin x?

Agreed (and that’s essentially what “secure context” is defined to mean, but I don’t see how in practice without TLS you could achieve those two things in the context of the Web runtime).

marcoscaceres commented 7 years ago

@sideshowbarker thanks for the additional link! That's great because it covers the CORS case nicely. Also, agree about needing clarification re: "secure context".

iherman commented 7 years ago

@sideshowbarker @lrosenthol, the main requirement around packaging should really be:

can the user agent cryptographically verify that the package has not been tampered with in flight?
can the user agent cryptographically verify that the package came from origin x?

My understanding of @lrosenthol's use case is that if I have already verified that a package has not been tampered with while being genuinely online, I would like to continue using it. In other words, I do not think there is a requirement whereby I have to check the package while I am offline, for a document that I have fetched while online.

(I am not a real expert in crypto, but I would expect that hashing each constituent in the package, and then hashing the collection of all those hashes, may lead to a hash value that I could get signed by the publisher, or store the hash with the identifier of the publication in a secure place like a blockchain, and this would make it possible to check the first item while online…)
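
As a sketch of that idea using Web Crypto (packageDigest and its input format are made up for illustration; resources maps each constituent URL to its bytes):

// Hash each constituent, then hash the sorted list of those hashes, yielding
// a single digest the publisher could sign or register in a secure place.
async function packageDigest(resources) { // Map<string, ArrayBuffer>
  const lines = [];
  for (const [url, bytes] of resources) {
    lines.push(url + " " + toHex(await crypto.subtle.digest("SHA-256", bytes)));
  }
  lines.sort(); // make the result independent of iteration order
  const combined = new TextEncoder().encode(lines.join("\n"));
  return toHex(await crypto.subtle.digest("SHA-256", combined));
}

function toHex(buf) {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}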

mac2net commented 7 years ago

To summarise:

  1. SW is a cache and notification technology that at this point doesn't have cross-browser support. Mobile?
  2. The publisher controls the availability of page(s) via a setting in the manifest for cache duration.
  3. There is no UI right now to tell the user the duration of the cache and what, if anything, is available offline. The Moby Dick example takes up over 1 MB of space, most of this apparently for fonts. And while access to the SW may expire, it appears the actual files stick around indefinitely.

Here's my question: why can't an updated file URL work the same way? That is, when opened via this new scheme, place all the components into a cache and serve content from there. But instead of a confusing UI showing the web source on the internet in the URL, the URL would just show the title of the document. And instead of the document expiring, the user will have it stored on a device as a plain old file that everyone knows and loves...

Also, please let's see a real world example of this. How about the Ben Franklin autobio on Gutenberg (I already made it into a 5DOC)? It's 2.5 MB with images & CSS.

Let's be honest: publishers like EPUB because it can have DRM or be served online via Readium, and therefore they don't lose control of the content - hence their embrace of PWP->WP with service workers and a publisher-configurable cache.

But given that EPUB has been nowhere near as successful as PDF for consuming, and a complete failure compared to PDF with regard to ease of producing, it would seem that a vendor-driven WP is an overly narrow solution that only satisfies the wants of a very select community.

lrosenthol commented 7 years ago

@sideshowbarker & @marcoscaceres - thanks for the pointers on Foreign Fetch - will check them out!

On Wed, Oct 5, 2016 at 9:10 PM, Michael[tm] Smith notifications@github.com wrote:

In a case where you can’t serve something over TLS then you’re not only not going to be able to use SW specifically, you’re also not going to be able to use anything that provides online/offline persistence in any way similar to what SW does. Even the (deprecated) HTML5 Application Cache feature now requires documents and their assets to be served over TLS.

Under the current specs - I agree. And that introduces a problem for Packaged WP (as discussed before) since we can't use TLS to serve them locally (either via a real server or a simulated one).

It's also a potential problem for regular WPs while there remain non-TLS-based sites in the world, as there may be a need to host a secure WP on a non-secure site (or vice versa), and you can't mix and match.

Any mechanism that does what's needed here but that does not use TLS is escalating the risk and damage of an XSS attack, and browser-engine projects are not going to implement it.

Then we may be at an impasse, as I think it is at least worthy of discussion with those browser engine projects. We may well lose the argument - but we should at least have the discussion.

One of the key cases (though not the only one) is for packaged publications, as there is (as we have discussed) no way to serve them in a TLS-compatible manner. (NOTE: it is possible to serve them in a secure context, which would suffice for SWs, but the specs and implementations don't match that)

I’m not clear on what you mean. When I first read the above I thought it sounded like you are saying there is a way to serve documents in a secure context without using TLS. But that’s not what you mean is it?

Can you explain what you mean in terms of any specific references from the Secure Contexts spec https://w3c.github.io/webappsec-secure-contexts/?

When a browser engine is used by a non-browser (eg. a book reading system) UA, that UA can instruct the browser engine to load content into either a secure or non-secure context - "overriding" the normal context choice that the browser engine would normally make based on origin. This way, content that might not normally be considered to be running in a secure context can be made to do so.

Where you say, "the specs and implementations don't match that", what specifically is "that" and in what way do specs and implementations not match it?

However, that secure context (at least in the case of some of the browser engines) isn't a "full" secure context - as it would be when the browser engine is loading content from a TLS-secured site. I am not sure if this is a lack of specification or simply inconsistent implementation. But, again, it's a topic that I'd like to raise with the specification and implementation folks.

lrosenthol commented 7 years ago

On Thu, Oct 6, 2016 at 12:49 AM, Ivan Herman notifications@github.com wrote:

@sideshowbarker @lrosenthol, the main requirement around packaging should really be:

can the user agent cryptographically verify that the package has not been tampered with in flight?
can the user agent cryptographically verify that the package came from origin x?

Those are certainly good requirements though I don't agree they are the only ones or that I would phrase them in that way. But when we get to that discussion, it's an excellent starting point.

My understanding of @lrosenthol's use case is that if I have already verified that a package has not been tampered with while being genuinely online, I would like to continue using it. In other words, I do not think there is a requirement whereby I have to check the package while I am offline, for a document that I have fetched while online.

Agreed. Let's assume, for the sake of this discussion, that the package was verified based on the requirements above while still online, and now all I want to do is use it offline. (So not really any different from the SW case, except for how the stuff is stored and possibly served by the UA.)

iherman commented 7 years ago

@marcoscaceres, you said:

It might be worth investigating if aspects of web packaging could address the security requirements: https://github.com/w3ctag/packaging-on-the-web

Is that project still alive? The impression we got is that the Web packaging approach is essentially dead (this IG did look at the document early on).

marcoscaceres commented 7 years ago

Is that project still alive? The impression we got is that the Web packaging approach is essentially dead (this IG did look at the document early on).

Last I spoke to the TAG there was still interest... if it's not alive, it's yet another dead "packaging" effort.

dauwhe commented 7 years ago

I heard there was some interest from Google in packaging web sites for various reasons (sending to CDNs, setting up local copies in parts of the world with limited infrastructure/bandwidth)... but I don't know where this is being worked on, or by whom...

danielweck commented 7 years ago

@iherman said: 'I know that Readium has experimented with Service Workers on top of Chrome, albeit in a separate Chrome "application". @danielweck, could you share your experience on this?'

To avoid confusion, let me clarify something: there's a Readium Chrome "extension" (actually a "packaged app"), which of course is Chrome-specific. Then there's the Readium "cloud reader" which is a pure client-side web app that runs in any modern browser.

The Chrome app and the web app share a very large amount of code, but the Service Worker experiment was only implemented in the context of the web app. The development goal was not to enable a seamless online / offline reading experience, but to explore an alternative to how Readium currently fetches and renders individual ebook resources (HTML, CSS, images, etc.) directly from a zipped EPUB archive stored on a content server.

Our experimental use of Service Worker allows a single HTTP URL (pointing to the packed EPUB) to be "exploded" into multiple URLs that resolve to distinct file streams from within the EPUB container. Instead of having to convert HTTP partial byte-range payloads into binary URI Blobs (which are statically pre-fetched to populate the entire DOM, CSS stylesheets, etc.), we instantiate a Service Worker that dynamically intercepts "virtual" single resource requests, then translates them into corresponding "physical" HTTP byte-range requests (i.e. file entries inside the zip archive), and finally creates HTTP responses that the web browser can consume in a totally transparent manner (i.e. the webview has no "awareness" that the requested HTTP URLs actually have no physical mapping to the content server).
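
Reduced to a sketch, that interception looks roughly like the following (this is not the Readium code, which is linked further down in the thread; lookupZipEntry and inflateEntry stand in for the real zip-directory and decompression logic):

self.addEventListener("fetch", (event) => {
  // Treat "/epub/<book>/<path>" as a virtual entry inside a zipped EPUB.
  const match = new URL(event.request.url).pathname.match(/^\/epub\/([^/]+)\/(.+)$/);
  if (!match) return; // not a virtual resource; let the network handle it
  event.respondWith(serveFromZip(match[1], match[2]));
});

async function serveFromZip(book, path) {
  // Find the entry's offset/length in the zip central directory, then fetch
  // only those bytes from the server as an HTTP byte-range request.
  const entry = await lookupZipEntry(book, path); // hypothetical helper
  const res = await fetch("/packages/" + book + ".epub", {
    headers: { Range: "bytes=" + entry.offset + "-" + (entry.offset + entry.length - 1) }
  });
  const body = await inflateEntry(entry, await res.arrayBuffer()); // hypothetical helper
  return new Response(body, { headers: { "Content-Type": entry.mediaType } });
}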

This experiment could certainly be extended to support caching of individual EPUB resources already fetched via HTTP "partial byte range" requests. This would definitely improve performance, and would also be a step towards supporting the online / offline reading experience.

In closing, I just want to say that the functional equivalent of Service Workers has been available for a long time in native / hybrid apps (i.e. native UI + embedded webview control). So, although SW offers unprecedented implementation opportunities in web browser apps, I hope that the next generation of digital publications for the web (e.g. PWP / EPUB4 format) does not conceptually depend on Service Worker technologies. Sometimes during our discussions there is a blurry line between the notion of a declarative publication format (including all the meta-structures connecting the various bits together, like the concept of "manifest", high-level navigation, metadata, etc.), and the imperative / programmatic scripts that implement reading system behaviour (polyfill), including online/offline, caching, etc.

iherman commented 7 years ago

Thanks a lot @danielweck. If my understanding is correct, that experiment is, therefore, slightly different from what we were discussing, insofar as our discussions concentrated on the case where the WP is on the Web, and not yet on the case where it is in a package.

Your point on the dependency issue is well taken. There is a fine line, which we indeed cross from both sides, but I think it is all right to do that at this point to clarify the issues and the possible problems. This will help us, I believe, to draw the right line when it comes to genuine specification work.

lrosenthol commented 7 years ago

On Fri, Oct 7, 2016 at 7:05 AM, Daniel Weck notifications@github.com wrote:

So, although SW offers unprecedented implementation opportunities in web browser apps, I hope that the next generation of digital publications for the web (e.g. PWP / EPUB4 format) does not conceptually depend on Service Worker technologies.

Agreed completely, @danielweck https://github.com/danielweck. SWs are a very good technology and can serve some common use cases - but they aren't the solution to all things, and we need to remember that.

HadrienGardeur commented 7 years ago

@danielweck that's interesting, it sounds almost like a middleware between a packaged EPUB and what we would need for an exploded publication with a manifest.

Could you point us to the code for it? Thanks!

danielweck commented 7 years ago

@HadrienGardeur https://github.com/readium/readium-js/blob/feature/ServiceWorker/js/EpubServiceWorker.js

The Fetch-based XMLHttpRequest polyfill (at the top of the JavaScript file) is only needed because Readium depends on an external library that makes use of XHR. Otherwise, the relevant code starts at:

self.addEventListener('fetch', function(event) { /* ... */ });

Video/audio streaming is not implemented. This is just a quick-and-dirty proof of concept.

lrosenthol commented 7 years ago

Back on the original topic - I did a bunch of work on this section in my branch.

HadrienGardeur commented 7 years ago

Over the last week I've built a few prototypes based on Service Worker too.

There are two GitHub projects for these proofs of concept:

Both projects are based on the following draft: https://github.com/HadrienGardeur/webpub-manifest

Live demos are also available for both:

In addition, someone from my team built a Go app that can stream resources from an EPUB and serve them over HTTP (it also generates a Web Publication Manifest): https://github.com/banux/webpub-streamer

Building these proofs of concept has proved to be very helpful, and I've noticed a few limitations with SW too:

I'll continue to explore, update these projects accordingly and report issues that I might encounter along the way.

marcoscaceres commented 7 years ago

@HadrienGardeur

no easy way to match "/" routes to "/index.html" automatically, which seems to be fairly easy to do with AppCache Manifest

That should be super trivial:

addEventListener("fetch", (ev)=>{
 ev.respondWith(()=>{
       if(new URL(ev.request.url).pathname === "/"){
          return caches.match("/index.html");
       }
       return caches.match(ev.request);
   })
});
HadrienGardeur commented 7 years ago

@marcoscaceres sure, what I meant by "automatically" was without having a specific handler in the SW itself (either with a specific route or a regular expression).

It's not too bad since you need to deploy a SW per domain anyway, but it's still something that people will need to configure on their own (different frameworks could behave differently).

mac2net commented 7 years ago

I played with SW on a local self-signed MAMP site with WordPress, using some plugins written by @marcoscaceres' friends and colleagues at Mozilla.

Actually there is not much difference between 5DOC and SW in terms of building a system for downloading some CSS and JS, as Blondie would say, "one way or another", especially when using WordPress.

We should make this a full-blown project.

HadrienGardeur commented 7 years ago

After some more exploration, I've found another limitation with Service Workers:

Any idea about that, @marcoscaceres? I'm more of a back-end developer, and I'm used to working with Redis or memcached, where the cache storage provides quite a lot more methods.

marcoscaceres commented 7 years ago

@HadrienGardeur, we should probably move this discussion to your repo - as it's getting into the technical details. Getting the name of the cache, in Chrome... for example:

async function whoHasRequest(request) {
  // Quick existence check across all caches; bail out early on a miss.
  const resp = await caches.match(request);
  if (!resp) return null;
  // Walk the named caches to find which one actually holds the request.
  for (const key of await caches.keys()) {
    const cache = await caches.open(key);
    if (await cache.match(request)) {
      return key;
    }
  }
  return null;
}

You could also easily map the URL path to a cache: if it starts with "/books/book_name/whatevs", you could take the second path segment and use that cache: caches.open("book_name").then(doWhatevs)
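
For example, a sketch along those lines (the "/books/" prefix is hypothetical):

addEventListener("fetch", (ev) => {
  const segments = new URL(ev.request.url).pathname.split("/");
  // "/books/moby-dick/chapter1.html" -> look in the "moby-dick" cache.
  if (segments[1] === "books" && segments[2]) {
    ev.respondWith(
      caches.open(segments[2])
        .then((cache) => cache.match(ev.request))
        .then((resp) => resp || fetch(ev.request)) // fall back to the network
    );
  }
});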

marcoscaceres commented 7 years ago

Added: if (!resp) return null; above.

HadrienGardeur commented 7 years ago

Thanks @marcoscaceres, I'll create an issue in a separate repo.

It's good to know that using keys we can iterate over the different caches available, but it still feels like this sort of method should be built into the APIs somehow (returning the cacheName for each match would be useful with both match and matchAll for quite a few use cases).

TzviyaSiegman commented 7 years ago

Many of the discussions in this thread go beyond the scope of this document and should be discussed elsewhere. Discussion of access to "constituent resources" is considerably revised in the current draft. See especially http://w3c.github.io/dpub-pwp-ucr/index.html#identify_const_resources, http://w3c.github.io/dpub-pwp-ucr/index.html#technical-metadata, and http://w3c.github.io/dpub-pwp-ucr/index.html#remap_links.