w3c / pwpub

W3C packaged Web Publications
https://w3c.github.io/pwpub/

Do we need a dedicated media type and file extension for PWP? #17

Open HadrienGardeur opened 6 years ago

HadrienGardeur commented 6 years ago

Based on recent discussions with @iherman and @lrosenthol it seems likely that PWP will include a default option for packaging.

In order to identify such packages on the Web and on a file system, we'll need to roll out our own:

At this point this is strictly an early proposal for these values.

Based on what we adopt as our default packaging format, we can easily customize the media type a little more (for example application/pwpub+zip or application/pwpub+cbor).
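As a rough illustration of the suffix idea above, here is a minimal sketch of how a consumer could separate the base subtype from the packaging suffix. The function name `split_media_type` is hypothetical and not part of any proposal in this thread; the media type strings are the examples given above.

```python
from typing import Optional, Tuple


def split_media_type(media_type: str) -> Tuple[str, str, Optional[str]]:
    """Split 'type/subtype+suffix' into (type, subtype, suffix).

    Hypothetical helper for illustration only.
    """
    # Drop any parameters such as '; charset=utf-8'.
    essence = media_type.split(";", 1)[0].strip().lower()
    top, _, sub = essence.partition("/")
    # The structured syntax suffix is whatever follows the last '+'.
    base, plus, suffix = sub.rpartition("+")
    if not plus:
        # No '+' present: the whole subtype is the base, no suffix.
        return top, sub, None
    return top, base, suffix


print(split_media_type("application/pwpub+zip"))
print(split_media_type("application/pwpub+cbor"))
```

With this split, a reader could recognize the `pwpub` base regardless of whether the container turns out to be ZIP, CBOR, or something else.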

HadrienGardeur commented 6 years ago

There are also a few other related questions for media type and file extension:

HadrienGardeur commented 6 years ago

My own take on these three questions:

iherman commented 6 years ago

If we indeed go down the media type route:

  • each profile MUST use its own media type and extension if the packaging format is different from the default one

Agreed

  • for encrypted resources we should at least define a media type parameter, and ideally I'd like the spec to contain requirements as well (the manifest MUST NOT be encrypted)
  1. I am fine having an "encrypted=true" kind of parameter with default being "false" (of course).
  2. The spec requirement is another matter, which is not part of the Media type definition. (Let us discuss this separately when the time comes)

  • there might be additional media type parameters that could help us future proof our work (profile or version for example are good candidates)

Well... maybe. Some profiles may fall under the first category (e.g., PDF), in which case it is a separate media type. Let us leave it open. But, as a general principle of using type parameters, I agree.
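To make the "encrypted=true" parameter with a "false" default concrete, here is a minimal sketch using Python's standard `email.message.Message` to read a Content-Type value. The parameter name `encrypted` comes from the discussion above; the media type `application/pwpub+zip` and the helper name `is_encrypted` are assumptions for illustration, since nothing has been registered.

```python
from email.message import Message


def is_encrypted(content_type: str) -> bool:
    """Return True only if the media type carries encrypted=true.

    Hypothetical helper: the 'encrypted' parameter is only a proposal.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # failobj supplies the default when the parameter is absent.
    value = msg.get_param("encrypted", failobj="false")
    return str(value).strip().lower() == "true"


print(is_encrypted("application/pwpub+zip"))
print(is_encrypted("application/pwpub+zip; encrypted=true"))
```

Note that a parameter like this only says that some encrypted content exists; it cannot say which resources are encrypted or with what scheme.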

All that being said: we should not make things too dependent on media types. They are difficult to use: a typical problem is that users rarely have control over the media type of a resource when they put it up on a web server (unless they own the web server and know how to configure it...).

HadrienGardeur commented 6 years ago

@iherman we truly need both, it's not one or the other.

There are specific situations where the only info available is the media type (for instance in an API, that's what we'll use most of the time).

I agree that in some situations the user won't have control over them, but it doesn't mean that they're not useful.

lrosenthol commented 6 years ago

The problem with using a media type to represent encryption (for whatever purpose) is that such things aren't "all or nothing". It is very common, for example, to encrypt the content but leave the metadata in plain text. Or I might have a single resource protected. How would these get identified? And what happens if the wrong one is used?

IMO, you are just asking for problems...

baldurbjarnason commented 6 years ago

@lrosenthol

IMO, you are just asking for problems...

Exact same applies for allowing DRM to use the same media type as unencrypted files. The encryption is by definition going to be vendor-specific (since DRM is out of scope for our work here). How are user agents, regular users, distributors, etc. going to deal with the fact that some of the files they get will—from their perspective—be filled with gobbledygook binary blobs even though they are identified with the same media type?

You can't rely on distribution metadata (like ONIX or OPDS) for this since, at some point, the file will end up with a user or system that has no easy way of knowing which DRM scheme the file is using.

You can't expect them all to open up the file and understand a vendor-specific encryption scheme. People will open the PWPs in their reading app, get error messages and assume that either the PWP is just broken or that the reading app is broken. They will conclude that PWPs suck and just switch back to Amazon's Kindle.

Unless, of course, you lock the end user in the client of your choice, but that isn't exactly what we're hoping to accomplish with an open standard.

With a separate media type the apps that support it can specifically register support for DRMed files on an OS level. Much less confusion all around. Open the file and you can trust that the app that gets launched can render it. You don't have that trust with EPUB today.

This has been a serious problem, end-to-end, for the EPUB ecosystem, from the perspective of usability, marketing, authoring, and distribution. That some epubs are tied to their vendor but look on the outside identical to unencrypted files has been a serious impediment to adoption of the format. Back before I gave up trying to promote it as a more standard alternative to mobi this was always the first issue mentioned, even by non-technical users.

I've long since lost count of how many conversations I've had go along these lines.

Yes, EPUB is a standard format. Yes, iBooks has the best support for the standard. No, you can't read all EPUBs in iBooks. Yes, I know I said it had the best support for the standard. Some EPUBs have to be read in a different kind of reading app. No, those apps don't have the same features and OS support as iBooks. Yes, Nook offers EPUBs. No, you can't read them in the other apps. I'm not even sure you can download them anymore. No… …

… so on and so forth.

It's a nightmare and it makes promoting the format next to impossible. Allowing PWPs to go down this path as well would be a huge own goal.

Purely from a marketing and adoption perspective, the end-user confusion that will be caused by allowing encrypted files to use the same extension and media type as unencrypted will be incredibly counter-productive.

By allowing DRMed files to use the same media type and file name extension you are just asking for problems. IMO.

Any proposal for allowing DRMed files to share a media type with non-DRMed PWPs needs to explain in detail what they are planning to do to prevent this from causing a repeat of the issues this has caused for EPUB.

This is a make-or-break issue for adoption.

lrosenthol commented 6 years ago

@baldurbjarnason of course, the counterexample to this is PDF, which has supported both standardized and custom encryption as part of the format for 25 years - with a single media type. Clearly, multiple media types are not the only solution...

iherman commented 6 years ago

@lrosenthol my reading of @HadrienGardeur's proposal of the 'encryption' parameter is a bit from an opposite point of view: it just says that there is some encrypted content in the publication, and does not intend to go deeper than that. How the encryption is then identified, which resources are encrypted, etc, is not defined by this parameter, and should be identified by other means (I would expect as part of the manifest, though probably not specified by this WG).

In some sense, although technically it is not different, it may be clearer to call the parameter something like no-encrypted-content, with a default value set to true. Technically identical but conveys a different message.

baldurbjarnason commented 6 years ago

@lrosenthol PDF got away with it because for a large part of its early life, PDF equaled Adobe Acrobat for most users. Even after other vendors started shipping their own popular PDF readers, Adobe has still had a strong influence on the direction and implementation of the format. (I know you know this, just restating it for others who might read this.)

So, this strategy only really works when a single vendor is dominant for the format which is something we specifically want to avoid with web tech.

Even worse, and I know this is not the case here, but to an outside observer this could look like a strategy that's specifically an attempt to take control of an openly specified ecosystem. As in, if this becomes the case, only Adobe's version of PWPs will be seen as fully compatible with the standard, as Adobe's apps (or Adobe-licensed ones) will be the only ones able to support all of the vendor-specific DRM schemes. All other UAs would be seen by the end user as inferior and sub-standard.

Now, I'm confident this is not the case, but I can assure you that a lot of people from outside our working group who read this thread will draw this conclusion.

IMO, allowing DRMed PWPs to use the same media type and extension as non-DRMed PWPs poses a serious threat to user adoption and UA diversity for the format. Any file that uses DRM needs to be unambiguously identified as not a regular PWP to even a casual user and we (as in Rebus Foundation) will strongly oppose any attempt to ship a standard via the W3C that does not ensure this. This is a make or break issue.

If you want DRMed PWPs to share a media type with non-DRMed, we need detailed assurances of how this will not result in a situation where the end user has a DRMed PWP and can't immediately tell that it has DRM and is therefore incompatible with most non-proprietary PWP UAs.

AFAIK, the only reliable way of ensuring this is forbidding DRMed files from identifying themselves using the same media type and extension as a non-DRMed file. But if you have some other reliable method in mind, I'm all ears.

The world now is very different from what it was 25 years ago. Tactics that worked back then almost certainly won't work today. Adobe tried to repeat PDF's tactics with EPUB and it failed (spectacularly, IMO). Absent detailed assurances, I see no reason for us to trust that repeating this tactic wouldn't fail again with PWPs.

HadrienGardeur commented 6 years ago

+1 to what @baldurbjarnason just said

This is very similar to what I listed as a requirement in #6 and it's IMO the only reasonable way to deal with DRM. By simply saying that this is out of scope or not our responsibility, there's a very serious risk of undermining the whole ecosystem (similar to what happened with EPUB).

I will also strongly object to shipping any spec that does not forbid DRM content from using the same media type and file extension.

Aside from DRM, there could be standard ways of encrypting content; this is what I had in mind when proposing a media type parameter. In that case I don't think a flag (true/false) would be the best option: this could work more like charset, with a specific list of values.

lrosenthol commented 6 years ago

Since such things live forever, let me make it very clear that Adobe has zero interest in trying to shove DRM into PWP. In fact, I would say that we also believe strongly that PWP should be "DRM-free". Of course, that doesn't mean that specific profiles of PWP, such as EPUB4, wouldn't include DRM - only that the core/base standard doesn't call for it nor prevent it in the profiles.

So since we are only talking here about PWP - then I think we all agree that the default packaging expressed in this document should have a single media type and should not include encryption or DRM functionality.

And then we can argue and debate media types, DRM syntax, etc. in the specific profiles.

Yes?

TzviyaSiegman commented 6 years ago

Perhaps encryption merits a separate issue in GitHub (and then, it might require revisiting the charter). The issue at hand is whether PWP should have a dedicated media type. I think that we cannot answer this until we decide on the packaging format. If we do have a media type, I agree with @lrosenthol:

the default packaging expressed in this document should have a single media type and should not include encryption or DRM functionality.

baldurbjarnason commented 6 years ago

@lrosenthol

So since we are only talking here about PWP - then I think we all agree that the default packaging expressed in this document should have a single media type and should not include encryption or DRM functionality.

And then we can argue and debate media types, DRM syntax, etc. in the specific profiles.

Yes, agreed. As long as it's clear that those separate profiles can't use the same file extension as default packaging. It's an unfortunate fact that many mainstream OSes rely on file extensions so that's something we need to account for.
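Since the point here is that extension-driven systems can only tell the two kinds of file apart if the extensions differ, here is a small sketch using Python's standard `mimetypes` module. Every value in it is an assumption for illustration: `.pwpub`, `.pwpubx`, and both media type strings are hypothetical and were not agreed anywhere in this thread.

```python
import mimetypes

# A private registry, so the process-wide defaults are left untouched.
registry = mimetypes.MimeTypes()

# Hypothetical default packaging: one media type, one extension.
registry.add_type("application/pwpub+zip", ".pwpub")
# Hypothetical protected profile: separate media type AND extension.
registry.add_type("application/x-pwpub-protected+zip", ".pwpubx")


def media_type_for(filename):
    """Resolve a media type from the file extension alone."""
    media_type, _encoding = registry.guess_type(filename)
    return media_type


print(media_type_for("moby-dick.pwpub"))
print(media_type_for("moby-dick.pwpubx"))
```

If both kinds of file shared `.pwpub`, a lookup like this would have no way to route the protected file to an app that can actually open it.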

@TzviyaSiegman

The issue at hand is whether PWP should have a dedicated media-type. I think that we cannot answer this until we decide on the packaging format.

I agree. This is a very good point.

It's really hard to decide on anything concrete here until we have at least a default packaging format. There's a chance, for example, that this might well be solved for us if we use the web packaging spec which has its own media type, own file extension, no parameters, and no fragment identifiers. (You'd do linking based on the original URL of each included resource.)

So, if we end up going down that route for our default packaging format we might just end up with regular .wpk packages of regular web publication. All of the profile/parameter stuff might then have to use a separate media type.

iherman commented 6 years ago

@baldurbjarnason just thinking out loud here, without any clear conclusion in my mind:

So, if we end up going down that route for our default packaging format we might just end up with regular .wpk packages of regular web publication.

I wonder whether that will work out, due to the possible specificities of a WP; we may have to say something like ".../wpub+wpk" to denote that it is not only a package but also a WP.

All of the profile/parameter stuff might then have to use a separate media type.

While I am fine with a "have to", we may find an agreement with the Web Packaging folks to use some parameters on the media type, just like we discussed earlier in this thread. My reasoning is that we do not want to fall into the other extreme, whereby a publication using, say, an encrypted file as part of its otherwise clear-text resources (not necessarily for DRM purposes but for other types of confidentiality reasons on, say, research data that needs privacy protection) would give the impression of being totally alien to WPs. Defining a totally disjoint media type might convey the wrong message. (I say "might".)

But, as @TzviyaSiegman said, we should postpone this discussion until we have a clearer idea on what the packaging format(s) may be.

baldurbjarnason commented 6 years ago

@iherman Yeah, we are getting way ahead of ourselves.

… But I strongly disagree with any notion that web publications are so special that they would need to be special-cased in the packaging spec. For example, non-DRM encryption IMO should be handled by the web platform itself using JSON Object Signing and Encryption coupled with client-side web crypto APIs. You have a web component in the page that loads the JOSE file and decrypts it, for example, with a user-provided password and renders it in the page. That way it would work both in the web publication and in the packaged web publication.

IIRC, this is stated in the draft specs but we seem to forget this all the time in the issues:

We should always default to web-native features instead of inventing something publication-specific.

iherman commented 6 years ago

@baldurbjarnason I am not referring to the DRM and JSON Object Signing and such. No disagreements there. What I am not sure about is whether the Web Packaging spec will be flexible enough for our type of content, or whether it will be geared towards, say, Web Applications only with a Web App Manifest only, and we then may have some differences. But these are pure speculations at the moment.

baldurbjarnason commented 6 years ago

@iherman

I am not referring to the DRM and JSON Object Signing and such. No disagreements there. What I am not sure about is whether the Web Packaging spec will be flexible enough for our type of content, or whether it will be geared towards, say, Web Applications only with a Web App Manifest only, and we then may have some differences. But these are pure speculations at the moment.

Yes. And I'm expressing my scepticism about this default assumption that other web specs aren't capable of handling publication needs using an example that has been mentioned several times before in this thread.

For example, I've seen no indication that the Web Packaging spec is going to be web apps only, considering that CDN distribution of one-page-plus-resources is a major use case driving its development.

I see no point in speculations that assume a priori that web tech won't be able to do the job. It isn't a productive default assumption.

TzviyaSiegman commented 6 years ago

I am not closing the issue because the question about media-type still stands, but let us take pontification elsewhere. Thank you

llemeurfr commented 5 years ago

From this long thread, I have picked two ideas:

Both are now inserted in the LPF draft.