w3c / pwpub

W3C packaged Web Publications
https://w3c.github.io/pwpub/

First attempt at listing requirements #6

Closed HadrienGardeur closed 6 years ago

HadrienGardeur commented 7 years ago

A Packaged Web Publication:

lrosenthol commented 7 years ago

@HadrienGardeur thanks for starting this. I think most of your suggestions are good, but I do have some comments on a few.

MUST contain all resources that are part of the publication (reading order + secondary resources) MAY contain additional resources that are referenced by the publication (for example a metadata record in a different format)

I think the problem here is going to be in defining what is "part of the publication" and what is "additional". For example, if a script in the publication references a data source that it uses for display, which side does that fall on?

SHOULD contain the request/response HTTP payloads for each resource

What does this mean for a publication that has never been on the web and doesn't know anything about http?

MUST NOT use the same media type and file extension if any resource contained in the package is protected by a DRM

What does this mean? Do you mean that if I put a JPEG into the package but apply DRM, it can't be a .jpg? Or do you mean that if anything in the package is DRM'd, I can't call it a .pwp (or whatever)? or both (or neither)?

And can you explain where this even comes from?

iherman commented 7 years ago

@HadrienGardeur, thanks for starting this.

However, without going into the details, we should really specify these as deltas v.a.v. WP. For example, the resource list is required in WP, so we should not repeat it in here.

iherman commented 7 years ago

@HadrienGardeur, more on the details

  • SHOULD contain the request/response HTTP payloads for each resource

Why is this necessary? And why is it not even mentioned in WP and necessary in a PWP?

  • MUST NOT use the same media type and file extension if any resource contained in the package is protected by a DRM

I would object to such differentiation (and I do not even understand why that would be necessary). EPUB3.1's media type is not dependent on whether it contains DRM-d content, why would we need it here? It would also mean a departure from EPUB3 when we talk about EPUB4...

HadrienGardeur commented 7 years ago

@iherman I think that this is entirely a delta, since the WP requires only to list such resources in the manifest, whereas PWP requires their presence in the package.
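To make this delta concrete, here is a minimal Python sketch of the PWP-side check: every resource the manifest lists has to be present in the package. The manifest keys `readingOrder` and `resources`, the `url` field, and the function name are assumptions for illustration only; no manifest shape had been agreed at this point in the thread.

```python
def missing_resources(manifest: dict, packaged_paths: set[str]) -> list[str]:
    """Return manifest-listed resources that are absent from the package."""
    listed = []
    # Hypothetical manifest shape: two lists of {"url": ...} entries.
    for key in ("readingOrder", "resources"):
        for entry in manifest.get(key, []):
            listed.append(entry["url"])
    # WP only requires the listing; a PWP would also require presence.
    return [url for url in listed if url not in packaged_paths]


manifest = {
    "readingOrder": [{"url": "chapter1.html"}, {"url": "chapter2.html"}],
    "resources": [{"url": "style.css"}],
}
packaged = {"manifest.json", "chapter1.html", "style.css"}
print(missing_resources(manifest, packaged))  # ['chapter2.html']
```

An empty result would mean the package satisfies the "MUST contain all resources" requirement under these assumptions.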

I think the problem here is going to be in defining what is "part of the publication" and what is "additional". For example, if a script in the publication references a data source that it uses for display, which side does that fall on?

@lrosenthol that's indeed an issue, but it's one that affects WP in general, not just PWP.

At this point, we're being vague about how exactly the publication is bounded.

What does this mean for a publication that has never been on the web and doesn't know anything about http? Why is this necessary? And why is it not even mentioned in WP and necessary in a PWP?

First of all, this is listed as a SHOULD, not a MUST. For a publication that has never been on the Web, you wouldn't have this HTTP payload.

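For publications that do exist on the Web, this is necessary for multiple reasons:

  • the HTTP response can contain important information (media type, language, links, status code, verb)
  • to support offline reading of such publications we'll need a proxy (could be a Service Worker) that will rely on this stored payload to respond properly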

The presence of this payload is the most important difference between using ZIP (file based) like in EPUB 3 or using a future version of Web Packaging (URL based, contains HTTP payloads).

What does this mean? Do you mean that if I put a JPEG into the package but apply DRM, it can't be a .jpg? Or do you mean that if anything in the package is DRM'd, I can't call it a .pwp (or whatever)? or both (or neither)?

It means that if the standard package is .pwp and application/pwp, you wouldn't be allowed to use this file extension and media type with a DRM'd PWP.

And can you explain where this even comes from?

Sure. EPUB has absolutely no restriction regarding that and it has a terrible impact on the ecosystem:

EPUB3.1's media type is not dependent on whether it contains DRM-d content, why would we need it here? It would also mean a departure from EPUB3 when we talk about EPUB4...

It would be a welcome departure for anyone building a reading system or extensively using APIs to distribute content. I would also argue that from a UX standpoint, this would make things a lot better for end users too: they wouldn't have to guess which software might eventually open their publication.

iherman commented 7 years ago

@HadrienGardeur,

@iherman I think that this is entirely a delta, since the WP requires only to list such resources in the manifest, whereas PWP requires their presence in the package.

You are right on that aspect.

  • MAY contain additional resources that are referenced by the publication (for example a metadata record in a different format)

In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...

  • MAY contain a signature for the whole package or individual resource

That is true for WP as well, I do not see that PWP specific.

For publications that do exist on the Web, this is necessary for multiple reasons:

  • the HTTP response can contain important information (media type, language, links, status code, verb)
  • to support offline reading of such publications we'll need a proxy (could be a Service Worker) that will rely on this stored payload to respond properly

The presence of this payload is the most important difference between using ZIP (file based) like in EPUB 3 or using a future version of Web Packaging (URL based, contains HTTP payloads).

If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.

I can see the point, in terms of, say, UIX, that listing (optionally!) the media type and the language of a constituent resource can be useful in general, regardless of the packaging format. But, if so, that should be part of WP and not PWP. I do not really see what the advance knowledge of, say, the verb would bring.

iherman commented 7 years ago

@HadrienGardeur, on the DRM issue,

What you ask for, essentially, is that W3C defines two different types of PWP-s: one with DRM and one without a DRM. I would be opposed to such differentiation and, actually, it is also against our charter which puts DRM related features out of scope for this Working Group (see charter). I understand that the current DRM world in EPUB is a mess, but it is not our job to handle that one...

HadrienGardeur commented 7 years ago

@iherman I'm not saying that we should work on DRM at all, in fact the restriction that I listed is IMO the only part we should care about.

EPUB went half-way there which is the worst thing you can do. There's a way of indicating how resources are encrypted (encryption.xml), but this is not enough to figure out which DRM is used.

I'm not arguing for rejecting all DRM in WP/PWP, but I'd like to make sure that DRM publications cannot use the media types and file extensions that we declare (for the reasons listed above).

HadrienGardeur commented 7 years ago

If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.

@iherman I see a lot of if statements in that paragraph.

What if we end up using something completely different from the two options that you've listed? At this point this is still completely undecided, which is why I'd rather list such requirements than assume that we'll get them for free.

In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...

Once again, this is very vaguely defined in WP at this point.

If a manifest contains a link to an ONIX record for metadata, do you consider that this record is part of the publication?

Using the Readium serialization, we wouldn't (that's the difference between links, which point to external resources, and resources, which contain resources that are part of the publication but not in the reading order), but given our lack of info at this point in WP, it's hard to say.

iherman commented 7 years ago

If we use the Web Packaging approach, then this information is implicit in the packaging format itself. If we go the OCF way, however, is it necessary? Again, this was never the case for EPUB3. I still do not see it.

@iherman I see a lot of if statements in that paragraph.

What if we end up using something completely different from the two options that you've listed? At this point this is still completely undecided, which is why I'd rather list such requirements than assume that we'll get them for free.

The real question is: if (sic!) we go the OCF way, is it necessary? I am not convinced at all.

In some sense, this is true for WP as well. Do you mean to say that a package MAY contain resources that are not listed in the resource list? Wouldn't that create problems? I would rather use a "SHOULD NOT" instead of a "MAY"...

Once again, this is very vaguely defined in WP at this point.

If a manifest contains a link to an ONIX record for metadata, do you consider that this record is part of the publication?

Yes. It is something that may be necessary offline, too (to check the values, stuff like that).

HadrienGardeur commented 7 years ago

The real question is: if (sic!) we go the OCF way, is it necessary? I am not convinced at all.

The OCF way on its own won't be enough to support the WP -> PWP use case.

Having the HTTP response payload gives us pretty much everything we need, but we could also cherry pick a number of them instead and add them to the manifest (these would be PWP specific properties for the manifest).

I'd also like to point out that this is not the only place where WP and EPUB will be at odds with one another. The whole discussion about the document returned for a WP Address is completely irrelevant for a publication that doesn't exist on the Web, yet we've added this requirement to our spec.

lrosenthol commented 7 years ago

@HadrienGardeur Concerning the HTTP response functionality you are discussing - AFAICT, that would only be useful in the one case where you are taking a WP and packaging it into a PWP. However, in the process of doing so, you are only getting the response - but not the request. Since the response to a given request is completely contextual to that request, storing only the response is useless because (a) you don't know when to issue that exact response later on and (b) you don't know what other possible responses could be issued.
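The point about context can be illustrated with a small Python sketch in which each stored response is keyed by the request that produced it, so a proxy knows when to replay it. The class and method names are hypothetical, and keying on method and URL alone is a simplifying assumption; a real proxy would also have to consider request headers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredResponse:
    status: int
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


class StoredExchanges:
    """Hypothetical store of request & response pairs inside a package."""

    def __init__(self) -> None:
        self._by_request: dict[tuple[str, str], StoredResponse] = {}

    def add(self, method: str, url: str, response: StoredResponse) -> None:
        # The request (method + URL) is the lookup key for its response.
        self._by_request[(method.upper(), url)] = response

    def replay(self, method: str, url: str) -> Optional[StoredResponse]:
        # Return the stored response only for a request that was recorded.
        return self._by_request.get((method.upper(), url))


store = StoredExchanges()
store.add("GET", "https://example.org/book/chapter1.html",
          StoredResponse(200, {"content-type": "text/html"}, b"<p>Hi</p>"))

hit = store.replay("get", "https://example.org/book/chapter1.html")
miss = store.replay("POST", "https://example.org/book/chapter1.html")
print(hit.status if hit else None, miss)  # 200 None
```

Without the request half of the pair there would be no key to look up, which is the problem described above.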

On the issue of OCF/EPUB/etc. - this is why I want to keep the packaging format out of this discussion and over in #11. We should focus strictly on the needs of packaging.

Back on the issue of what SHOULD or MAY be packaged - in many cases that is either up to the author/publisher or the tool doing the packaging - it's not something we can specify in the standard. Using the ONIX example - if I was a "simple" packager, I would probably leave the ONIX data as a link (as @HadrienGardeur suggested) but if I was an "archival" packager, then as @iherman notes, I would want to include it. We should leave this area vague in the spec to enable both use cases.

HadrienGardeur commented 7 years ago

@lrosenthol if you look at the examples in the explainer for Web Packaging, you'll see that they also include the request payload: https://github.com/WICG/webpackage/blob/master/explainer.md#multiple-origins-a-web-page-with-a-resources-from-the-other-origin

I do agree that this is tricky territory since HTTP is as you say contextual by nature.

Using the ONIX example - if I was a "simple" packager, I would probably leave the ONIX data as a link (as @HadrienGardeur suggested) but if I was an "archival" packager, then as @iherman notes, I would want to include it. We should leave this area vague in the spec to enable both use cases.

We should let authors decide what's best, but we need such concepts (the ability to link in the manifest outside of the context of primary/secondary resources or whatever we call them these days) first.

As you said, not all packagers will do the same thing either. Having this as a MAY leaves the door open, which is IMO exactly what we want.

lrosenthol commented 7 years ago

I agree with you @HadrienGardeur that MAY is the correct term on the resources - no question!

On the HTTP, yes, that particular package format does address that, but that isn't the same as what you wrote for our work. I would recommend that you amend your requirement to include both request and response as that is what you really want (regardless of format).

HadrienGardeur commented 7 years ago

@lrosenthol this is what I wrote in the initial message for this issue:

SHOULD contain the request/response HTTP payloads for each resource

Do you think it's not clear enough?

lrosenthol commented 7 years ago

I would replace the '/' with an '&' to make it clear - since I definitely missed it.

TzviyaSiegman commented 7 years ago

I think some of these requirements (especially http payload) might be conflating offline and packaging. We already have offline requirements for WP. If there are requirements brought about by packaging, let's list those individually, not by implication.

HadrienGardeur commented 7 years ago

@TzviyaSiegman I disagree, this is not an issue of offline vs packaging. The main issue is whether the publication originated from the Web or not.

GarthConboy commented 7 years ago

The resolution to issue #9 may inform some of these statements.

But, for now, I'd think we could likely have consensus with a slightly shorter list:

The latter MAY replaces a previous SHOULD; I don't think PWPs that started on the Web should be viewed more favorably than those that didn't.

I left off signatures, as this should come from WP.

I left off MIME types & DRM, as I don't agree, and that's a discussion that can be had later (as it's negative not affirmative); further, the perceived failing of EPUB in this area could also be resolved with additional metadata, and I think steering clear of DRM is the best course (beyond our agreement to not obviate it).

iherman commented 7 years ago

@GarthConboy I am fine for those entries to go into the FPWD.

I do have some reservations on the last item, more in the formulation, though; maybe @prototypo can massage the terminology. I am not sure what "request & response HTTP payload" means: surely not a full dump of what goes through the wire (which includes the whole content!). Or do we mean only the HTTP request & response headers? If that is the case, a response header, for example, may include a bunch of information that is irrelevant.

If it is still possible in terms of time, I would prefer to say that it may contain, for each resource, some of the data that usually comes through the HTTP response header, but not all. The media type, the language, the possible rel values are typically the information that may become useful, for example. The ideal would be to cherry pick the header fields of interest rather than a general statement.
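As a rough illustration of cherry picking, the Python sketch below keeps only the response header fields that carry the media type, the language, and the rel values (`Content-Type`, `Content-Language`, `Link`) and drops everything else. Which fields belong on the list is exactly the open question here, so the tuple and the function name are assumptions, not a proposal.

```python
# Assumed allow-list; the thread only names media type, language and rel values.
HEADERS_OF_INTEREST = ("content-type", "content-language", "link")


def cherry_pick(response_headers: dict[str, str]) -> dict[str, str]:
    """Keep only the header fields of interest, with lower-cased names."""
    picked = {}
    for name, value in response_headers.items():
        # HTTP header field names are case-insensitive.
        if name.lower() in HEADERS_OF_INTEREST:
            picked[name.lower()] = value
    return picked


full_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Language": "fr",
    "Link": '<chapter2.html>; rel="next"',
    "Server": "nginx",
    "Set-Cookie": "session=abc123",
}
print(sorted(cherry_pick(full_headers)))
# ['content-language', 'content-type', 'link']
```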

mattgarrish commented 6 years ago

MUST contain a WP manifest at a well-known location

Why does this matter, and doesn't it cause problems for (un)packaging a web publication?

Do you mean a PWP manifest?

iherman commented 6 years ago

Well, a PWP manifest should be compatible with (a superset of) a WP manifest in my view... I.e., you are probably right that the first entry should say 'PWP' manifest but that should also be a 'WP' manifest.

For the well-known location: I am not sure we can avoid having that for a package. If the 'entry page' of the unpackaged publication is part of the package, and that entry page uses a (suitable) relative URI then it should be o.k. (a bunch of details to specify later...)

But... it is not necessarily a "location" in the sense of a place in the file system, you are right about that. If I consider a Web Packaging, I could say that the manifest must be the first item in the incoming stream, for example, which is not the same as the "location".

mattgarrish commented 6 years ago

I could say that the manifest must be the first item in the incoming stream, for example, which is not the same as the "location"

Ah, okay, this makes more sense. I read it like location in a virtual file system. "Position" might be less confusing(?), or maybe I shouldn't read issues in early morning...

iherman commented 6 years ago

I guess the exact mechanism will depend on the packaging method. I am not sure what terms to use...

llemeurfr commented 6 years ago

@iherman, as the first section of the doc should be independent of the choice of a packaging mechanism, I would be cautious about any wording like 'location' or 'position'. The notion of 'convention' would be safer IMO.

Maybe: MUST contain a WP manifest retrievable using a simple convention.

iherman commented 6 years ago

@llemeurfr I like 'convention'.

I would make it clear, in the document, that the 'convention' at hand would depend on the packaging format (in a note).
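To show what a format-specific convention could look like, here is a Python sketch for one possible case: a ZIP-based package where the manifest is retrieved by a fixed entry name. ZIP as the container, the `manifest.json` name, and the function names are all assumptions for illustration; another packaging format would need a different convention (for example, the first item in the stream).

```python
import io
import json
import zipfile

MANIFEST_CONVENTION = "manifest.json"  # assumed convention for a ZIP package


def read_manifest(package_bytes: bytes, name: str = MANIFEST_CONVENTION) -> dict:
    """Retrieve the manifest from a ZIP package using the naming convention."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if name not in package.namelist():
            raise ValueError(f"package has no {name}")
        return json.loads(package.read(name))


def build_example_package() -> bytes:
    """Build a tiny in-memory package so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(MANIFEST_CONVENTION, json.dumps({"name": "Example"}))
        package.writestr("chapter1.html", "<p>Hello</p>")
    return buffer.getvalue()


print(read_manifest(build_example_package())["name"])  # Example
```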

With that, we can leave that in the capable hand of @prototypo :-)

HadrienGardeur commented 6 years ago

"Well-known location" is used on the Web (URL) but also for file packages like OCF.

It's true that for Web Packaging, position would be more accurate but well-known location is IMO the most common term for what we need.

mattgarrish commented 6 years ago

So long as it's qualified in the spec, I don't care what we use.

I was just questioning for this thread whether we meant location as in where it is in a directory structure when it's packed/unpacked (sort of like the annotations.json file for Open Annotation in EPUB). If it was a PWP manifest, it wouldn't matter, but if it was the WP it would either impose a requirement on that spec or require file manipulation.

As that's been resolved, I'm fine now.

HadrienGardeur commented 6 years ago

[...] the perceived failing of EPUB in this area could also be resolved with additional metadata, and I think steering clear of DRM is the best course (beyond our agreement to not obviate it).

@GarthConboy this is more than perception, there's truly nothing that you can do as a reading system developer to avoid complex situations and bad UX.

Additional metadata wouldn't help either. What would partially solve the problem is a media type parameter that indicates the encryption scheme, for example: application/pwp;encryption=lcp

This would only partially solve the problem, since we can't always rely on the media type. That's why having a separate extension is also useful.
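For illustration, this is how a reading system might read such a parameter in Python, using the standard library's `email.message.Message` header parsing. The `application/pwp` type and the `encryption` parameter are the hypothetical values from the example above, not registered names, and the function name is likewise made up.

```python
from email.message import Message
from typing import Optional


def split_media_type(value: str) -> tuple[str, Optional[str]]:
    """Return (media type, encryption scheme or None) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_param() returns None when the parameter is absent.
    return msg.get_content_type(), msg.get_param("encryption")


print(split_media_type("application/pwp;encryption=lcp"))
# ('application/pwp', 'lcp')
print(split_media_type("application/pwp"))
# ('application/pwp', None)
```

A reading system could then refuse or hand off a publication whose scheme it does not support before trying to open it.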

I know that DRM is a difficult topic of discussion, but completely avoiding these matters means that we'll potentially end up with the same problems as EPUB.

lrosenthol commented 6 years ago

@HadrienGardeur I think we can leave the DRM and extension discussion to EPUB4 as I don't see either as relevant for PWP

TzviyaSiegman commented 6 years ago

@HadrienGardeur Please refer to our charter https://www.w3.org/2017/04/publ-wg-charter/#scope. DRM is distinctly out of scope. Even if it weren't, I don't think it's relevant to this discussion. If there is demand for a media type, please open an issue to indicate that. If there is demand for encryption (beyond signatures), please open a separate issue. Let's avoid putting too much information in any one issue.

HadrienGardeur commented 6 years ago

@TzviyaSiegman I'll file a separate issue. This was first opened when we had zero activity and/or starting points for discussions about PWP, I'm happy to divide things into further issues and close this one once we've reached a first consensus (which seems to be the case).

css-meeting-bot commented 6 years ago

The Working Group just discussed Issue 6, and agreed to the following resolutions:

  • RESOLVED: agree on those four bullets, add a new issue exclusively on the HTTP and refer to it

The full IRC log of that discussion

<dauwhe> topic: Issue 6
<garth> https://github.com/w3c/pwpub/issues/6#issuecomment-347396713
<NickRuffilo> Tzviya: Issue 6 now - Now that we have a working definition, Hadrien started an issue around listing requirements. Most can be resolved by pointing to what was said in issue 9.
<tzviya> q?
<Hadrien> q+
<dauwhe> github: https://github.com/w3c/pwpub/issues/6
<garth> q?
<tzviya> ack Hadrien
<NickRuffilo> hadrien: I think the issue is about the requirements - not necessarily the definition. I like the proposal that was rephrased, but it's separate from a definition, so it's still useful. We should extract something out of it, not replace it with the definition
<tzviya> garth's proposal https://github.com/w3c/pwpub/issues/6#issuecomment-347396713
<NickRuffilo> Garth: I agree, it's different than a definition and it's appropriate for inclusion in the first public working draft.
<NickRuffilo> Tzviya: I think the requirements can be different from the working draft. I posted a link (above) to what Garth proposed.
<tzviya> https://www.irccloud.com/pastebin/3vRnKT3R/
<ivan> For the minutes: MUST contain a WP manifest at a well-known location (e.g. manifest.json)
<ivan> MUST contain all resources that are part of the publication (reading order + secondary resources)
<ivan> MAY contain additional resources that are referenced by the publication (for example a metadata record in a different format)
<ivan> MAY contain the request & response HTTP payloads for each resource
<NickRuffilo> Tzviya: Do we have any comments on these?
<wolfgang> q+
<garth> Ship it!
<tzviya> ack wolfgang
<ivan> q+
<NickRuffilo> Wolfgang: What's the advantage of the 4th bullet point? Do we need it?
<tzviya> q?
<garth> Though in this case I think it’s important that such support is available, though not required.
<garth> q?
<tzviya> ack ivan
<NickRuffilo> Hadrien: There is a number of HTTP header information that is important. This is something necessary for the implementation in many cases. For a format like web-packaging, to support offline reading (which is almost exactly the same as packaged) you need the response. You need the request and the response as they go hand in hand - for the service workers... There is information and it's needed for implementation
<baldurbjarnason> q+
<NickRuffilo> Ivan: I disagree with some things here. There is information that's useful, but is the whole HTTP payload - which may include a number of things that are totally irrelevant. The spec - the number of possible headers is HUGE. For the first working draft, it should be fine, but my preference is to make explicit which items in the HTTP request/response - that in some way or other - must be made available. The way described here is a number of unnecessary things.
<tzviya> ack baldurbjarnason
<garth> ack baldurbjarnason
<NickRuffilo> Garth: Even though I see a queue, I resolve we discuss later.
<Hadrien> +1 with what Baldur said
<garth> q?
<ivan> q+
<NickRuffilo> Baldur: We need HTTP for full compatibility with web-stack. If we leave that out of packaging, we'll have quite a few situations where a packaged web publication will act much differently than a regular web-publications. A host of things, especially around javascript... We're trying to encompass all the possible interactions. It's more complicated to leave it out than it is to support it.
<tzviya> ack ivan
<NickRuffilo> Ivan: Can we at least say - that we refer here only to the sender HTTP verbs? We don't have to include the extension verbs simply because a server uses them and sends them back?
<garth> q?
<NickRuffilo> Hadrien: There might be a few extras, but it shouldn't be that bad.
<NickRuffilo> Ivan: We should create a separate issue that looks only at this. If we include everything or just cherry pick from the header. And we flag this as something to discuss
<Hadrien> q+
<NickRuffilo> Tzviya: So we're agreeing on the first 3 bullets - and we're adding a new issue to discuss the 4th
<tzviya> ack Hadrien
<garth> ack Hadrien
<NickRuffilo> Garth: I recommend we agree on the 4 then attach the issue to the 4th
<NickRuffilo> Hadrien: I'll open some issues about a dedicated media type and file extension
<NickRuffilo> Ivan: A media type often gets a file extension - so you may not need both
<ivan> PROPOSED: agree on those four bullet, add a new issue exclusively on the HTTP and refer to it
<garth> +1
<NickRuffilo> +1
<tzviya> +1
<wolfgang> +1
<jbuehler> +1
<ivan> +1
<baldurbjarnason> +1
<timCole> +1
<Ben_Dugas> +1
<dkaplan3> +1
<jasminemulliken> +1
<Hadrien> +1
<JunGamo> +1
<rdeltour> +1
<ivan> RESOLVED: agree on those four bullets, add a new issue exclusively on the HTTP and refer to it
<George> +1

iherman commented 6 years ago

Based on the discussion (see https://github.com/w3c/pwpub/issues/6#issuecomment-349042286), closing this issue.