Closed HadrienGardeur closed 6 years ago
@HadrienGardeur thanks for starting this. I think most of your suggestions are good, but I do have some comments on a few.
MUST contain all resources that are part of the publication (reading order + secondary resources)
MAY contain additional resources that are referenced by the publication (for example a metadata record in a different format)
I think the problem here is going to be in defining what is "part of the publication" and what is "additional". For example, if a script in the publication references a data source that it uses for display, which side does that fall on?
SHOULD contain the request/response HTTP payloads for each resource
What does this mean for a publication that has never been on the web and doesn't know anything about http?
MUST NOT use the same media type and file extension if any resource contained in the package is protected by a DRM
What does this mean? Do you mean that if I put a JPEG into the package but apply DRM, it can't be a .jpg? Or do you mean that if anything in the package is DRM'd, I can't call it a .pwp (or whatever)? or both (or neither)?
And can you explain where this even comes from?
@HadrienGardeur, thanks for starting this.
However, without going into the details, we should really specify these as deltas vis-à-vis WP. For example, the resource list is required in WP, so we should not repeat it here.
@HadrienGardeur, more on the details
- SHOULD contain the request/response HTTP payloads for each resource
Why is this necessary? And why is it not even mentioned in WP and necessary in a PWP?
- MUST NOT use the same media type and file extension if any resource contained in the package is protected by a DRM
I would object to such differentiation (and I do not even understand why that would be necessary). EPUB3.1's media type is not dependent on whether it contains DRM-d content, why would we need it here? It would also mean a departure from EPUB3 when we talk about EPUB4...
@iherman I think that this is entirely a delta, since the WP requires only to list such resources in the manifest, whereas PWP requires their presence in the package.
I think the problem here is going to be in defining what is "part of the publication" and what is "additional". For example, if a script in the publication references a data source that it uses for display, which side does that fall on?
@lrosenthol that's indeed an issue, but it's one that affects WP in general, not just PWP.
At this point, we're being vague about how exactly the publication is bounded.
What does this mean for a publication that has never been on the web and doesn't know anything about http? Why is this necessary? And why is it not even mentioned in WP and necessary in a PWP?
First of all, this is listed as a SHOULD, not a MUST. For a publication that has never been on the Web, you wouldn't have this HTTP payload.
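For publications that do exist on the Web, this is necessary for multiple reasons:
- the HTTP response can contain important information (media type, language, links, status code, verb)
- to support offline reading of such publications we'll need a proxy (could be a Service Worker) that will rely on this stored payload to respond properly
The presence of this payload is the most important difference between using ZIP (file based), as in EPUB 3, and using a future version of Web Packaging (URL based, contains HTTP payloads).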
What does this mean? Do you mean that if I put a JPEG into the package but apply DRM, it can't be a .jpg? Or do you mean that if anything in the package is DRM'd, I can't call it a .pwp (or whatever)? or both (or neither)?
It means that if the standard package is .pwp and application/pwp, you wouldn't be allowed to use this file extension and media type with a DRM'd PWP.
And can you explain where this even comes from?
Sure. EPUB has absolutely no restriction regarding that and it has a terrible impact on the ecosystem:
EPUB3.1's media type is not dependent on whether it contains DRM-d content, why would we need it here? It would also mean a departure from EPUB3 when we talk about EPUB4...
It would be a welcome departure for anyone building a reading system or extensively using APIs to distribute content. I would also argue that from a UX standpoint, this would make things a lot better for end users too: they wouldn't have to guess which software might eventually open their publication.
@HadrienGardeur,
@iherman I think that this is entirely a delta, since the WP requires only to list such resources in the manifest, whereas PWP requires their presence in the package.
You are right on that aspect.
- MAY contain additional resources that are referenced by the publication (for example a metadata record in a different format)
In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...
- MAY contain a signature for the whole package or individual resource
That is true for WP as well, I do not see that PWP specific.
For publications that do exist on the Web, this is necessary for multiple reasons:
- the HTTP response can contain important information (media type, language, links, status code, verb)
- to support offline reading of such publications we'll need a proxy (could be a Service Worker) that will rely on this stored payload to respond properly
The presence of this payload is the most important difference between using ZIP (file based), as in EPUB 3, and using a future version of Web Packaging (URL based, contains HTTP payloads).
If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.
I can see the point, in terms of, say, UIX, that listing (optionally!) the media type and the language of a constituent resource can be useful in general, regardless of the packaging format. But, if so, that should be part of WP and not PWP. I do not really see what the advance knowledge of, say, the verb would bring.
@HadrienGardeur, on the DRM issue,
What you ask for, essentially, is that W3C defines two different types of PWP-s: one with DRM and one without a DRM. I would be opposed to such differentiation and, actually, it is also against our charter which puts DRM related features out of scope for this Working Group (see charter). I understand that the current DRM world in EPUB is a mess, but it is not our job to handle that one...
@iherman I'm not saying that we should work on DRM at all, in fact the restriction that I listed is IMO the only part we should care about.
EPUB went half-way there, which is the worst thing you can do. There's a way of indicating how resources are encrypted (encryption.xml), but this is not enough to figure out which DRM is used.
I'm not arguing for rejecting all DRM in WP/PWP, but I'd like to make sure that DRM publications cannot use the media types and file extensions that we declare (for the reasons listed above).
If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.
@iherman I see a lot of if statements in that paragraph.
What if we end up using something completely different from the two options that you've listed? At this point this is still completely undecided, which is why I'd rather list such requirements than assume that we'll get them for free.
In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...
Once again, this is very vaguely defined in WP at this point.
If a manifest contains a link to an ONIX record for metadata, do you consider that this record is part of the publication?
Using the Readium serialization, we wouldn't (that's the difference between links, which point to external resources, and resources, which contains resources that are part of the publication but not in the reading order), but given our lack of info at this point in WP, it's hard to say.
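The links/resources distinction above can be sketched in a few lines. This is only an illustration, not the Readium or WP data model: "links" and "resources" are the names used in this thread, while "readingOrder", the hrefs, and both helper functions are assumptions made up for the example.

```python
# Hypothetical manifest. Only "links" and "resources" are names taken from
# the discussion; "readingOrder" and every href are illustrative assumptions.
manifest = {
    "readingOrder": [{"href": "chapter1.html"}, {"href": "chapter2.html"}],
    "resources": [{"href": "style.css"}, {"href": "cover.jpg"}],
    "links": [{"href": "https://example.org/onix.xml", "rel": "alternate"}],
}


def required_in_package(m):
    """Hrefs that are part of the publication (the MUST side)."""
    return [
        item["href"]
        for key in ("readingOrder", "resources")
        for item in m.get(key, [])
    ]


def optional_in_package(m):
    """Hrefs that are only referenced, e.g. an ONIX record (the MAY side)."""
    return [item["href"] for item in m.get("links", [])]
```

Under this reading, a "simple" packager would ship only the first list, while an "archival" packager could also fetch and include the second.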
If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.
@iherman I see a lot of if statements in that paragraph.
What if we end up using something completely different from the two options that you've listed? At this point this is still completely undecided, which is why I'd rather list such requirements than assume that we'll get them for free.
The real question is: if (sic!) we go the OCF way, is it necessary? I am not convinced at all.
In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...
Once again, this is very vaguely defined in WP at this point.
If a manifest contains a link to an ONIX record for metadata, do you consider that this record is part of the publication?
Yes. It is something that may be necessary offline, too (to check the values, stuff like that).
The real question is: if (sic!) we go the OCF way, is it necessary? I am not convinced at all.
The OCF way on its own won't be enough to support the WP -> PWP use case.
Having the HTTP response payload gives us pretty much everything we need, but we could also cherry pick a number of them instead and add them to the manifest (these would be PWP specific properties for the manifest).
I'd also like to point out that this is not the only place where WP and EPUB will be at odds with one another. The whole discussion about the document returned for a WP Address is completely irrelevant for a publication that doesn't exist on the Web, yet we've added this requirement to our spec.
@HadrienGardeur Concerning the HTTP response functionality you are discussing - AFAICT, that would only be useful in the one case where you are taking a WP and packaging it into a PWP. However, in the process of doing so, you are only getting the response, not the request. Since the response to a given request is completely contextual to that request, storing only the response is useless because (a) you don't know when to issue that exact response later on and (b) you don't know what other possible responses could be issued.
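To make the point about context concrete, here is a minimal sketch of a replay store that keeps each response together with the request that produced it. All names (StoredResponse, record, replay) are hypothetical, and keying on method and URL alone is a simplifying assumption: a real request can also vary on headers such as Accept-Language.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class StoredResponse(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: bytes


# Keyed by (method, URL). Without the request side there is no key,
# so an offline proxy could not tell when to replay a given response.
store: Dict[Tuple[str, str], StoredResponse] = {}


def record(method: str, url: str, response: StoredResponse) -> None:
    """Remember the response that was returned for this request."""
    store[(method.upper(), url)] = response


def replay(method: str, url: str) -> Optional[StoredResponse]:
    """Return the stored response for this request, or None if unknown."""
    return store.get((method.upper(), url))
```

A proxy such as a Service Worker would call something like replay for each outgoing request and fall back to an error when it returns None.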
On the issue of OCF/EPUB/etc. - this is why I want to keep the packaging format out of this discussion and over in #11 . We should focus strictly on the needs of packaging.
Back on the issue of what SHOULD or MAY be packaged - in many cases that is up to either the author/publisher or the tool doing the packaging - it's not something we can specify in the standard. Using the ONIX example - if I was a "simple" packager, I would probably leave the ONIX data as a link (as @HadrienGardeur suggested) but if I was an "archival" packager, then as @iherman notes, I would want to include it. We should leave this area vague in the spec to enable both use cases.
@lrosenthol if you look at the examples in the explainer for Web Packaging, you'll see that they also include the request payload: https://github.com/WICG/webpackage/blob/master/explainer.md#multiple-origins-a-web-page-with-a-resources-from-the-other-origin
I do agree that this is tricky territory since HTTP is as you say contextual by nature.
Using the ONIX example - if I was a "simple" packager, I would probably leave the ONIX data as a link (as @HadrienGardeur suggested) but if I was an "archival" packager, then as @iherman notes, I would want to include it. We should leave this area vague in the spec to enable both use cases.
We should let authors decide what's best, but we need such concepts (the ability to link in the manifest outside of the context of primary/secondary resources or whatever we call them these days) first.
As you said, not all packagers will do the same thing either. Having this as a MAY leaves the door open, which is IMO exactly what we want.
I agree with you @HadrienGardeur that MAY is the correct term on the resources - no question!
On the HTTP, yes, that particular package format does address that, but that isn't the same as what you wrote for our work. I would recommend that you amend your requirement to include both request and response, as that is what you really want (regardless of format).
@lrosenthol this is what I wrote in the initial message for this issue:
SHOULD contain the request/response HTTP payloads for each resource
Do you think it's not clear enough?
I would replace the '/' with an '&' to make it clear - since I definitely missed it.
I think some of these requirements (especially http payload) might be conflating offline and packaging. We already have offline requirements for WP. If there are requirements brought about by packaging, let's list those individually, not by implication.
@TzviyaSiegman I disagree, this is not an issue of offline vs packaging. The main issue is whether the publication originated from the Web or not.
The resolution to issue #9 may inform some of these statements.
But, for now, I'd think we could likely have consensus with a slightly shorter list:
The latter MAY replaced a previous SHOULD; I don't think PWPs that started on the Web should be viewed more favorably than those that didn't.
I left off signatures, as this should come from WP.
I left off MIME types & DRM, as I don't agree, and that's a discussion that can be had later (as it's negative not affirmative); further, the perceived failing of EPUB in this area could also be resolved with additional metadata, and I think steering clear of DRM is the best course (beyond our agreement to not obviate it).
@GarthConboy I am fine for those entries to go into the FPWD.
I do have some reservations on the last item, more in the formulation, though; maybe @prototypo can massage the terminology. I am not sure what "request & response HTTP payload" means: surely not a full dump of what goes through the wire (which includes the whole content!). Or do we mean only the HTTP request & response headers? If that is the case, a response header, for example, may include a bunch of information that is irrelevant.
If it is still possible in terms of time, I would prefer to say that it may contain, for each resource, some of the data that usually comes through the HTTP response header, but not all. The media type, the language, and the possible rel values are typically the information that may become useful, for example. The ideal would be to cherry pick the header fields of interest rather than make a general statement.
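As a rough sketch of what cherry picking could look like, the function below keeps only three response header fields. The choice of Content-Type, Content-Language and Link is an assumption mapped from "media type, language, rel values"; the function name and the idea of storing the result per resource are hypothetical.

```python
from typing import Dict

# Assumed mapping from the information named above to HTTP header fields:
# media type -> Content-Type, language -> Content-Language, rel -> Link.
FIELDS_OF_INTEREST = ("content-type", "content-language", "link")


def pick_header_fields(response_headers: Dict[str, str]) -> Dict[str, str]:
    """Keep only the fields of interest; header names are case-insensitive."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {name: lowered[name] for name in FIELDS_OF_INTEREST if name in lowered}
```

Everything else in the response (cache directives, cookies, server details) is dropped, which avoids carrying irrelevant information into the package.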
MUST contain a WP manifest at a well-known location
Why does this matter, and doesn't it cause problems for (un)packaging a web publication?
Do you mean a PWP manifest?
Well, a PWP manifest should be compatible with (a superset of) a WP manifest in my view... I.e., you are probably right that the first entry should say 'PWP' manifest, but that should also be a 'WP' manifest.
For the well-known location: I am not sure we can avoid having that for a package. If the 'entry page' of the unpackaged publication is part of the package, and that entry page uses a (suitable) relative URI then it should be o.k. (a bunch of details to specify later...)
But... it is not necessarily a "location" in the sense of a place in the file system, you are right about that. If I consider a Web Packaging, I could say that the manifest must be the first item in the incoming stream, for example, which is not the same as the "location".
I could say that the manifest must be the first item in the incoming stream, for example, which is not the same as the "location"
Ah, okay, this makes more sense. I read it like location in a virtual file system. "Position" might be less confusing(?), or maybe I shouldn't read issues in early morning...
I guess the exact mechanism will depend on the packaging method. I am not sure what terms to use...
@iherman, as the first section of the doc should be independent of the choice of a packaging mechanism, I would be cautious about any wording like 'location' or 'position'. The notion of 'convention' would be safer IMO.
Maybe: MUST contain a WP manifest retrievable using a simple convention.
@llemeurfr I like 'convention'.
I would make it clear, in the document, that the 'convention' at hand would depend on the packaging format (in a note).
With that, we can leave that in the capable hand of @prototypo :-)
"Well-known location" is used on the Web (URL) but also for file packages like OCF.
It's true that for Web Packaging, position would be more accurate but well-known location is IMO the most common term for what we need.
So long as it's qualified in the spec, I don't care what we use.
I was just questioning for this thread whether we meant location as in where it is in a directory structure when it's packed/unpacked (sort of like the annotations.json file for Open Annotation in EPUB). If it was a PWP manifest, it wouldn't matter, but if it was the WP it would either impose a requirement on that spec or require file manipulation.
As that's been resolved, I'm fine now.
[...] the perceived failing of EPUB in this area could also be resolved with additional metadata, and I think steering clear of DRM is the best course (beyond our agreement to not obviate it).
@GarthConboy this is more than perception, there's truly nothing that you can do as a reading system developer to avoid complex situations and bad UX.
Additional metadata wouldn't help either; what would partially solve the problem is a media type parameter that indicates the encryption scheme, for example: application/pwp;encryption=lcp
This would only partially solve the problem, since we can't always rely on the media type. That's why having a separate extension is also useful.
I know that DRM is a difficult topic of discussion, but completely avoiding these matters means that we'll potentially end up with the same problems as EPUB.
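For illustration only, here is how a reading system might inspect such a parameter. Neither application/pwp nor an encryption parameter is a registered or agreed value; both are simply the example from this thread, and split_media_type is a hypothetical helper built on Python's standard email.message parser.

```python
from email.message import Message
from typing import Optional, Tuple


def split_media_type(value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, encryption scheme or None) for a Content-Type value.

    The "encryption" parameter is the hypothetical one proposed in the thread.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("encryption")


media_type, scheme = split_media_type("application/pwp;encryption=lcp")
if scheme is not None:
    # A reading system could decide here whether it supports this scheme
    # before attempting to open the package.
    print(f"{media_type} is protected with {scheme}")
```

As noted above, this only helps when the media type is actually available, which is not the case for a file opened from disk.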
@HadrienGardeur I think we can leave the DRM and extension discussion to EPUB4 as I don't see either as relevant for PWP
@HadrienGardeur Please refer to our charter https://www.w3.org/2017/04/publ-wg-charter/#scope. DRM is distinctly out of scope. Even if it weren't, I don't think it's relevant to this discussion. If there is demand for a media type, please open an issue to indicate that. If there is demand for encryption (beyond signatures), please open a separate issue. Let's avoid putting too much information in any one issue.
@TzviyaSiegman I'll file a separate issue. This was first opened when we had zero activity and/or starting points for discussions about PWP, I'm happy to divide things into further issues and close this one once we've reached a first consensus (which seems to be the case).
The Working Group just discussed Issue 6, and agreed to the following resolutions:
RESOLVED: agree on those four bullets, add a new issue exclusively on the HTTP and refer to it
Based on the discussion (see https://github.com/w3c/pwpub/issues/6#issuecomment-349042286), closing this issue.
A Packaged Web Publication:
manifest.json
)