Closed: marcoscaceres closed this issue 8 years ago.
This is very true. This area is more complicated than the Terminology section of the Use Cases document makes it out to be (that's where the above quoted passage is from).
Already you can have:

- Insecure remote origin (http:)
- Secure remote origin (https:)
- Web server running on localhost
- HTML loaded from the filesystem (file:)

Each one of these types of origin has differing rights and privileges, can access differing APIs, etc.
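For example, whether a page runs in a secure context determines at runtime which of these capabilities it gets; a minimal sketch (exact gating varies by browser):

```typescript
// Sketch: what an origin is allowed to do can be probed at runtime.
// window.isSecureContext is true for https: and localhost in most
// browsers, and it gates "powerful" APIs like service workers.
function describeCapabilities(): string {
  if (!window.isSecureContext) {
    return "insecure context: no service workers, no crypto.subtle";
  }
  return "serviceWorker" in navigator
    ? "secure context with service worker support"
    : "secure context, but this UA does not expose service workers";
}

console.log(location.protocol, describeCapabilities());
```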
And it sounds like the PWP format is going to involve inventing an additional origin for packaged publications similar to jar: which is inevitably going to have another different set of rights and privileges from the other types of origin.
And it sounds like the PWP format is going to involve inventing an additional origin for packaged publications similar to jar: which is inevitably going to have another different set of rights and privileges from the other types of origin.
I am not sure that is correct. Regardless of the fact whether we would have a package or not in the sense of a ZIP file or anything like that, if that is defined, its access can have the same access (http, https, etc) that you refer to...
All that being said: is there (should there be) a difference in behaviour for the end user? Wouldn't one expect a UA to hide these differences if otherwise the access rights are there? I believe that is all what that intro part tries to say...
On 14 Sep. 2016, at 9:46 pm, baldurbjarnason notifications@github.com wrote:
This is very true. This area is more complicated than the Terminology section of the Use Cases document makes it out to be (that's where the above quoted passage is from).
Already you can have:
- Insecure remote origin (http:)
- Secure remote origin (https:)
- Web server running on localhost
- HTML loaded from the filesystem (file:)

Each one of these types of origin has differing rights and privileges, can access differing APIs, etc.
And it sounds like the PWP format is going to involve inventing an additional origin for packaged publications similar to jar: which is inevitably going to have another different set of rights and privileges from the other types of origin.
Given our flirtation with alternative protocols in the past (and the fact that the web platform has not been shown deficient), that seems extremely unlikely.
Let's not jump the shark.
@marcoscaceres I did not mention the possibility of trying to spec a new package origin because I think it's a good idea or likely to work. In fact, I think it's a horrible idea. I mentioned it because people on the DPUB list IIRC have expressed a desire to do so. It seems likely to me that they'll try to make it a part of the PWP spec.
@iherman The problem here is that access and UA behaviour already vary dramatically depending on origin, in each case with explicit effects on the UX in general. The browser trend recently is to highlight the differences because they've found vagueness about origin to be a frequent security issue (e.g., HTTP versus HTTPS connections).
BTW, the file: URL spec is being actively developed: https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-12

Also for consideration is Apple's technique for iBooks, which copies the file to a place inside the user's library folder. My thought as a model is that a PWP could be copied, decompressed, and cached to a browser's folder inside the library, and the browser would serve the content with a new address scheme/MIME type (e.g., doc://my.pwp). This would be better than the arbitrary number ID I saw in a previous scheme proposal.

I am not convinced about service workers, and I think the timeline is way out for implementation in a way that's beyond a cache. The more speculative the spec, the higher the risk of failure.
@baldurbjarnason @marcoscaceres While I completely agree the goal is to be fully compatible with the web security model, it is quite likely that extensions to that model need to be considered as we move towards implementation. But any extensions should indeed be done as part of a conversation with the web security community and not in situ.
@baldurbjarnason @marcoscaceres While I completely agree the goal is to be fully compatible with the web security model, it is quite likely that extensions to that model need to be considered as we move towards implementation.
Wait, we are here talking about requirements - no one said anything about implementation. Also, we've not hit any requirement that even hints at the web NOT being able to address the requirements.
But any extensions should indeed be done as part of a conversation with the web security community and not in situ.
Absolutely. I'm still betting we won't need to talk to them at all.
Wait, we are here talking about requirements - no one said anything about implementation. Also, we've not hit any requirement that even hints at the web NOT being able to address the requirements.
Agreed, though there are various "prototypes" on which we can base some ideas such as EPUB and 5DOC. The packaging aspect is what has the most significant impact on the security model as the current browser model (as mentioned earlier by @baldurbjarnason) isn't well suited to it.
Agreed, though there are various "prototypes" on which we can base some ideas such as EPUB and 5DOC.
...And good old service workers + web manifest + other web goodies... So, agree. As the dust settles and we get a nice and short (hopefully, less than 1 page!) set of Requirements - we should absolutely duke it out with prototypes.
I'm really excited about this, and willing to go toe-to-toe with anything anyone can come up with. :feelsgood:
The packaging aspect is what has the most significant impact on the security model
Agree. But in the sense that it puts users at risk by introducing yet another attack vector and more attack surface.
as the current browser model (as mentioned earlier by @baldurbjarnason) isn't well suited to it.
Strongly disagree - and so do millions of secure sites which fuel the world economy. To say that the underlying technology on which banks, stock exchanges, shopping sites, and privacy-critical applications rely is somehow not secure enough to power a bunch of books is... well, a bit of a stretch.
The burden of proof remains on this effort to prove that the Web is not secure enough to do books or other publication types. And that's a hill I'm willing to die on defending, by proving to you all that it is absolutely secure enough.
Strongly disagree - and so do millions of secure sites which fuel the world economy. To say that the underlying technology on which banks, stock exchanges, shopping sites, and privacy-critical applications rely is somehow not secure enough to power a bunch of books is... well, a bit of a stretch.
The web is definitely safe enough to power a bunch of books. What isn't safe is the ePub model of zipped-up packages side-loaded from insecure locations.
Many on the DPUB list seem to have taken this to mean that the web's security model needs to be fixed to make an ePub-like PWP format possible.
None of this is actually a problem if we stick to solving these problems with service workers and with minor additions like, for example, some way to indicate document-ness and, maybe, a ToC.
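As a rough illustration of that service-worker approach (a minimal sketch only; the cache name and resource paths are invented for the example), a publication could precache its resources and keep working offline:

```typescript
// sw.ts - a minimal cache-first service worker for a publication.
// The cache name and resource list below are hypothetical examples.
const CACHE = "publication-v1";
const RESOURCES = ["/", "/toc.html", "/chapter-1.html", "/styles.css"];

// Untyped events keep this sketch free of service-worker lib typings.
self.addEventListener("install", (event: any) => {
  // Precache the publication's resources when the worker installs.
  event.waitUntil(caches.open(CACHE).then((c) => c.addAll(RESOURCES)));
});

self.addEventListener("fetch", (event: any) => {
  // Serve from the cache first; fall back to the network.
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```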
@marcoscaceres @baldurbjarnason
The web is definitely safe enough to power a bunch of books. What isn't safe is the ePub model of zipped-up packages side-loaded from insecure locations.
BINGO! However, I would rephrase the second sentence to "what isn't safe yet..."
THIS is the area where we need to go beyond the current set of web security models to address the requirements for "packages from not-yet-trusted locations" (it's not that they aren't secure, just that their "trustworthiness" isn't defined under the current model).
None of this is actually a problem if we stick to solving these problems with service workers and with minor additions like, for example, some way to indicate document-ness and, maybe, a ToC.
Even if SWs worked for publications (which, as mentioned in a separate thread, they do not currently), it wouldn't address the situation.
I'm really excited about this, and willing to go toe-to-toe with anything anyone can come up with.
Excellent - so am I! I look forward to the opportunity next week!
@lrosenthol
THIS is the area where we need to go beyond the current set of web security models to address the requirements for "packages from not-yet-trusted locations" (it's not that they aren't secure, just that their "trustworthiness" isn't defined under the current model).
The insecurity of the package stems from the same-origin policy, which packages break. There is no way for me to trust a zip file you send me or that it has the authority to behave as if it's coming from the origin it claims to be coming from (let alone having a URL scheme that meets the same-origin policy's requirements of scheme, host, and port).
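To make that scheme/host/port triple concrete, here is a minimal sketch of the comparison the same-origin policy performs (illustrative only):

```typescript
// Sketch: the same-origin policy compares scheme, host, and port.
function sameOrigin(a: string, b: string): boolean {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    ua.hostname === ub.hostname &&
    ua.port === ub.port
  );
}

sameOrigin("https://example.com/book", "https://example.com/toc"); // true
sameOrigin("https://example.com/", "http://example.com/"); // false: scheme differs
// A packaged scheme (jar:, or a hypothetical doc:) has no meaningful
// host or port to compare, which is why it fits this model so badly.
```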
Without some complicated signing scheme, it is impossible to know if you have modified the contents of a package or not (i.e., I don't know if your package is going to hack me). I think it's a big ask for this effort to solve side loading, especially if this is already solved by HTTPS.
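For what it's worth, the raw verification mechanics already exist in browsers via Web Crypto; a minimal sketch (ECDSA P-256 is just one possible choice, and the hard part, getting a key the reader can actually trust, is exactly what remains unsolved):

```typescript
// Sketch: verifying a detached signature over raw package bytes with
// Web Crypto. This shows only the mechanics; it says nothing about
// how the reader obtains and trusts publisherKey in the first place.
async function verifyPackage(
  publisherKey: CryptoKey,   // assumed already imported and trusted
  packageBytes: ArrayBuffer, // the package contents (e.g. the zip)
  signature: ArrayBuffer     // detached signature shipped alongside
): Promise<boolean> {
  return crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },
    publisherKey,
    signature,
    packageBytes
  );
}
```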
In fact, the Web community has tried to solve this multiple times already, like:
Note the second option MAY help here, but a lot of us are fairly skeptical about it (because we would rather this be solved with HTTP/2).
Even if SWs worked for publications (which, as mentioned in a separate thread, they do not currently), it wouldn't address the situation.
I don't know what situation you are talking about. If you are talking about the USB use case, then that sets up an unrealistic strawman at worst, or a tradeoff that we would accept at best.
The insecurity of the package stems from the same-origin policy, which packages break. There is no way for me to trust a zip file you send me or that it has the authority to behave as if it's coming from the origin it claims to be coming from (let alone having a URL scheme that meets the same-origin policy's requirements of scheme, host, and port).
I agree that today there is no defined mechanism for doing so - and that is my point! This is where we need to extend the current web security model, to provide a standard way to do this.
Without some complicated signing scheme, it is impossible to know if you have modified the contents of a package or not (i.e., I don't know if your package is going to hack me).
Signing is certainly one approach, though not the only.
I think it's a big ask for this effort to solve side loading, especially if this is already solved by HTTPS.
Just because it is a big ask doesn't mean it's not an important one....
I agree that today there is no defined mechanism for doing so - and that is my point! This is where we need to extend the current web security model, to provide a standard way to do this.
But do we REALLY need to? We (the web community) tried doing this for the last decade. We failed spectacularly (not once, but like 3+ times!). I'm deeply skeptical that this group is going to be able to solve this problem.
In my career, I've seen so many companies either destroy themselves (WAC, Bondi) or nearly destroy themselves chasing this dream (e.g., Mozilla with Firefox OS), that it's sad to hear that the publishing industry would want to go down this route - especially when browser vendors now have a solution for you and are willing to work on making it suit the publishing industry's requirements.
So, I honestly wish you all good luck with that if you want to go down that packaging route. If TLS already solves the problems of cryptographic assurance, verification, tamper-proofing, and reification, why go and try to invent yet another signing and packaging scheme?
I can only appeal to you from experience, as Editor of the W3C Widget effort: please look carefully at previous attempts that have failed at this; there is lots of history - and it all leads back to "don't fight the Web".
Signing is certainly one approach, though not the only.
I'm all ears. But again, don't do unnecessary work.
Just because it is a big ask doesn't mean it's not an important one....
We know it's important. But we also know that the whole zip + signature thing doesn't work. We have standards for doing so (I wrote the main ones, like https://www.w3.org/TR/widgets-digsig/). So I know pretty well what the challenges are.
But do we REALLY need to?
Assuming that we continue to move forward with our packaging requirement, I think the answer is an emphatic Yes! - for the very reasons you are laying out. We need to ensure (at least) that same level of security for packages.
We (the web community) tried doing this for the last decade
OK. And I've been working on file formats (incl. packaging/archiving) for over 30 years now, including being one of the original authors of StuffIt and the ETSI Signature Standards.
I would put forth that one of the reasons that the web work to date has failed, as evidenced by this conversation, is that you are simply coming at this from a different place. I am quite certain that a solution is possible - the question (IMO) is whether we will be able to get enough support to reach a standard.
I would put forth that one of the reasons that the web work to date has failed, as evidenced by this conversation, is that you are simply coming at this from a different place. I am quite certain that a solution is possible - the question (IMO) is whether we will be able to get enough support to reach a standard.
I think that is fair - and it's probably time to put it to the wider group for consensus.
We've presented all sides of the argument here.
I have a few problems with both the portability requirement as presented so far and with how it has been used to justify a demand for packaged web publications that can be side-loaded off the file system.
(This is a bit of a long comment, apologies in advance.)
The portability requirement isn't nearly as monolithic as it's made out to be
There's much more to portability than just side-loading off the filesystem or reading offline. It's an aggregation of many disparate problems that we've been bundling together in this discussion because they happen to be bundled together in other formats like ePub.
Packaged publications only become a requirement if you view the above list as a single, monolithic use case. But if you break the use cases apart, then it starts to look like many of them are already solved by the web community or would be solved with very small additions to the existing web stack.
Since sharing for most of the devices people use needs to be done over the network anyway, and is primarily done using links, packaged formats offer few advantages over regular hosted web pages for peer-to-peer sharing. If you want to support mobile phones and tablets, any problem that can't be solved with link sharing needs to be solved at the (web) app level using peer-to-peer networking anyway, whether you package the publication or not.
Solving sharing specifically for the USB key case could potentially mean excluding the web's largest current user base. Instead of solving the sharing use case, packaged formats bring a host of problems that, despite fifteen years of extensive work at both the W3C and the IDPF, remain unsolved.
So, my suggestion is that a packaged (e.g. an ePub-style) format be explicitly taken off the table as an option for Portable Web Publications and that fixing packaged web publications be left up to a future version of ePub.
@baldurbjarnason Very interesting way of dissecting the problem, and your suggestions for individual solutions to each are enlightening.
However, what it demonstrates is that you are indeed looking at these as individual problems that could be solved with different solutions, rather than with a single solution that can address all of them (or even more than just one).
For example, a user wants to personally archive a publication - why would that use a separate format than a website archive? And then that user wants to share that personal archive with a friend - could they just email that .warc file?
I will also point out that there are various other use cases that are not addressed by your separate solutions - annotation, collections, collation etc.
However, what it demonstrates is that you are indeed looking at these as individual problems that could be solved with different solutions, rather than with a single solution that can address all of them (or even more than just one).
Solutions that already largely exist will always trump a hypothetical unified solution. They already exist. There is much more uncertainty in betting on a single, much more complicated, unified solution with unknown payoffs.
For example, a user wants to personally archive a publication - why would that use a separate format than a website archive? And then that user wants to share that personal archive with a friend - could they just email that .warc file?
Archives need high fidelity that preserves a website at a particular point in time, right down to the HTTP headers on the response. Compromising that fidelity to address security concerns would make it inadequate to the task. Personal archives, especially if you intend to share them, do not have that luxury: you have to make adjustments in the name of security. A unified solution cannot serve both use cases without compromises. Separate solutions can. The only thing we would accomplish by creating a unified solution for all of these use cases is to create a format so filled with compromises that it doesn't do anything well.
I will also point out that there are various other use cases that are not addressed by your separate solutions - annotation, collections, collation etc.
Exactly. As you observe, they are orthogonal to my original list. They are separate use cases with separate solutions. Annotation is already being addressed by other working groups at the W3C, and I'm confident that, once we look into the other use cases separately, they won't need a packaged format and might even have solutions in the pipeline in the web stack already.
Solutions that already largely exist will always trump a hypothetical unified solution
If the solutions actually address the requirements, I would agree. However, if your solutions are individual and cannot be combined (which is also something called out in various use cases), then I don't see them as actual solutions (since they don't solve the needs).
Archives need high fidelity that preserves a website at a particular point in time, right down to the HTTP headers on the response
If the goal is to archive a "web site" that is fetched and then archived using HTTP(S) requests - then I agree. However, if the OWP content is put into the container directly after (or even during!) authoring - such that it never actually interacts with HTTP(S) - then it's a non-issue.
Personal archives, especially if you intend to share them, do not have that luxury: you have to make adjustments in the name of security
Agreed - but that doesn't mean that it is any less secure. Just that the security model isn't what we have today.
A unified solution cannot serve both use cases without compromises.
As I did with @marcoscaceres - I think we will have to agree to disagree, since I believe that it is possible to have such a solution without any such compromises. And I say this based on actual prototyping - rather than assumption.
As you observe, they are orthogonal to my original list. They are separate use cases with separate solutions.
No, they aren't - because those things all need to work on the "archive format". Or are you suggesting that we have a separate solution for annotation and/or collections?
Or are you suggesting that we have a separate solution for annotation and/or collections?
Yes, we should absolutely have separate solutions for those. We don't even need to solve for annotations, as that case can be solved in user-land.
Back to service workers:
When it is proposed that SW move from "a solution" to "THE SOLUTION", it is normal that there are BIG concerns about going all-in on what is essentially a theoretical solution.
It is very important that any "all eggs in one basket" ideas are not just thoroughly discussed but also rigorously tested.
- The table - https://nolanlawson.github.io/html5workertest/
It is a bit misleading, as it lists APIs that are deprecated and won't ever have SW support. For example, XHR, WebSQL, and possibly a bunch of others.
@nolanlawson can confirm.
Some features are missing on all browsers and some browsers don't support SW at all
SWs are a progressive enhancement. The technology is designed to degrade gracefully.
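The registration pattern itself shows that graceful degradation; a minimal sketch ("/sw.js" is a placeholder path):

```typescript
// Sketch: service worker registration as a progressive enhancement.
// Browsers without support skip the block entirely; the page still
// works, it just doesn't gain offline behaviour.
if ("serviceWorker" in navigator) {
  navigator.serviceWorker
    .register("/sw.js")
    .then((reg) => console.log("offline support enabled:", reg.scope))
    .catch((err) => console.warn("registration failed:", err));
}
```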
- While there isn't much analytical discussion on the web about SW, there are several negative posts as well as worries about security.
This is FUD, and it depends on who you are talking to. The reality is that SWs are happening and are secure (and where issues are found, we fix them).
- On SW demo sites, there is very little to no assistance about what to look for. How can I tell it's working? Do I turn off my WIFI and reload or click on an object?
Open the Dev Tools in Chrome and select the "Application" tab - you will see full control of SWs there. I encourage you to go through this course if you want to learn about SWs and how they work: https://www.udacity.com/course/offline-web-applications--ud899
- And just because some in the tech community want to deliver SW solutions, how do we know this is what users want? Is there any user feedback?
OMG, so much. I don't even know where to start. I suggest you go and watch the Google IO sessions that lay out both the technical and economic rationale for them. I also encourage you to watch the sessions from: https://events.withgoogle.com/progressive-web-app-dev-summit/
When it is proposed that SW move from "a solution" to "THE SOLUTION", it is normal that there are BIG concerns about going all-in on what is essentially a theoretical solution.
Again, this is not theoretical. We have two browsers shipping the standard, Microsoft working on it, and Apple planning to implement it (with large parts already implemented).
It is very important that any "all eggs in one basket" ideas are not just thoroughly discussed but also rigorously tested.
Service workers are not some theoretical toy. Some of the biggest sites in the world, including Twitter, Facebook, and Google itself are already making use of them in various ways.
Thanks @marcoscaceres, I will read your answer carefully, but to clarify: @baldurbjarnason linked to the matrix; I was just a "dedicated" reader following links.

I found this video (https://www.youtube.com/watch?v=mnSRP7q8WKw) of a Facebook techie talking about SW, and you have to admit the (I assume) experienced interviewer's confusion about SW was no less at the end of the clip than at the beginning. Use of SW for notifications sounds useful, but it's not an offline manifestation. I couldn't find anything about how Twitter uses SW. I found this critique of SW: https://arc.applause.com/2016/06/07/progressive-web-app-issues-and-concerns/
Also, @baldurbjarnason has me very confused about his opinions on JavaScript. At one point he was writing post after post against the use of JavaScript, especially in eBooks, but now it seems he has changed his view by 180°. Or is it the case that what's not good for the EPUB goose is good for the PWP gander?
I couldn't find anything about how Twitter uses SW.
Push notifications.
So the same as FB?
I am going to stick with my assertion that SW is a solution, not the solution. This is not a criticism of the technology, just of the way it's being sold at the moment. I would love to add SW notifications to a 5DOC for updates or new citations, for example.
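For what that might look like, here is a hypothetical sketch of a worker reacting to an update push (the payload fields are invented):

```typescript
// Sketch (inside a service worker): show a notification when a push
// message announces a publication update. The "title"/"body" payload
// shape is hypothetical.
self.addEventListener("push", (event: any) => {
  const data = event.data ? event.data.json() : {};
  event.waitUntil(
    (self as any).registration.showNotification(
      data.title ?? "Publication updated",
      { body: data.body ?? "New citations are available." }
    )
  );
});
```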
Also, @baldurbjarnason has me very confused about his opinions on JavaScript. At one point he was writing post after post against the use of JavaScript, especially in eBooks, but now it seems he has changed his view by 180°. Or is it the case that what's not good for the EPUB goose is good for the PWP gander?
Apologies for not being clear on this. I make my living writing JavaScript so obviously I'm not against JavaScript in general. My issue has been with the security implications of JavaScript in portable documents (portable here in the sense that they are packaged in some way as with ePub or Docx).
Basically, portable documents like ePub and a hypothetical packaged PWP are not compatible with the web's existing security model which in turn is mostly (but not exclusively) a question of what capabilities JavaScript gets to use.
We have three options:
If people aren't willing to relax the portability requirement then the quickest, least complex, least uncertain path is to cripple JavaScript in portable documents in some way.
The optimal path, IMO, is to focus on getting the web to a place where it can properly serve the needs of publishing (i.e. make the web more bookish where it makes sense) and worry about portability later.
Given how vehement many group members have been about not making any compromises on the packaging requirement, I have up until now just assumed that what I consider to be the optimal path wasn't on the table. But if it is, then I'd rather we just focus on improving the web and punt portability down the road—leave it for future ePub specs to solve.
We have three options:
While I am certainly in favor of #2 (as I have mentioned a few times :)) - another option came up today in discussion, which is what EPUB systems do today.
In the current (primarily) walled gardens of EPUB (iBooks, Kindle, etc.), it works. However, going forward as a generic document/publication solution, I find this option unacceptable and a non-starter - but I wanted it listed for completeness' sake.
Speaking as somebody who has spent the past three years working full-time with ePub3, and several years more than that making ebooks in general, it is simply not true that ebook reading systems allow full JavaScript execution with no limitations.
None of them allow unfettered JS. They all at the very least put limitations on how ePubs interact with the network (XMLHttpRequest is as good as unusable). Some limit storage or implement storage in dubious ways. They all cripple many features involving forms. Many CSS APIs are unusable, disabled, or broken. Many DOM APIs for interacting with locations or the browser are unusable, disabled, or broken.
None of the above are bugs (ePub reading system bugs are a big topic in their own right); they are either caused by ePub's lack of a security model or by gaps in web specifications in general, or are specifically prohibited by the ePub spec.
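In practice, this pushes ebook scripts toward defensive feature detection everywhere; a minimal sketch of the kind of guard that results (illustrative only):

```typescript
// Sketch: the defensive guards ePub scripts end up needing, since a
// reading system may have removed, disabled, or broken any given API.
function storageUsable(): boolean {
  try {
    const probe = "__probe__";
    localStorage.setItem(probe, probe);
    localStorage.removeItem(probe);
    return true;
  } catch {
    return false; // API missing, disabled, or quota set to zero
  }
}

// Note: even this is unreliable - a reading system can expose
// XMLHttpRequest yet silently block every request it makes.
const xhrPresent = typeof XMLHttpRequest !== "undefined";
```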
@marcoscaceres Yes absolutely, html5workertest.com in its current UI is a bit misleading. There's an open issue to mark which APIs are intentionally unshipped: https://github.com/nolanlawson/html5workertest/issues/8
Security section reworded and reorganized.
This is not true. Browsers make concessions for developers on localhost, like allowing service workers to be registered without requiring TLS, but it's meant for developers only.
So, connecting to localhost is not equivalent to connecting to a remote origin.