w3c / pwpub

W3C packaged Web Publications
https://w3c.github.io/pwpub/

Should the entry page be optional in the package? #33

Closed llemeurfr closed 5 years ago

llemeurfr commented 5 years ago

The current consensus is that a WP MUST have a Primary Entry Page, i.e. an HTML page.

It has been argued by the Audio TF that audio books don't have HTML pages and that adding a dummy page to a package would be a burden. Note that in this situation (no HTML), either a JSON ToC should exist in the manifest or the ToC must be inferred from the track listing (i.e., the reading order); this is discussed in https://github.com/w3c/wpub/issues/369.

On the other hand, I was made aware of a particular situation (Audiolib in France) where a book had a beautiful graphical ToC. The audio publisher integrated an image of this illustration into the audiobook as supplementary content, and it would have been great to be able to use it as a real ToC. In such a situation, having an HTML entry page makes a lot of sense.

When a package is exposed as a Web Publication, it is easy for a processor to create an entry page on the fly if there is none in the package. This HTML page will link to the manifest (or embed it), and its content will be generated from the metadata found in the manifest. This requires very little development.
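A rough sketch of such an on-the-fly entry page generator, assuming the manifest has already been parsed into a dict with a plain-string `name` and a `readingOrder` list whose entries are either URL strings or objects with a `url` and optional `name`. The helper name `generate_entry_page`, the placeholder title, and the page layout are hypothetical, not taken from any spec:

```python
import html


def generate_entry_page(manifest: dict, manifest_href: str = "manifest.jsonld") -> str:
    """Build a fallback primary entry page from an already-parsed manifest (hypothetical helper)."""
    # Assumes "name" is a plain string; fall back to a placeholder title otherwise.
    name = manifest.get("name")
    title = html.escape(name if isinstance(name, str) else "Untitled publication")

    items = []
    for entry in manifest.get("readingOrder", []):
        # Entries may be bare URL strings or objects with "url" and optional "name".
        if isinstance(entry, str):
            url, label = entry, entry
        else:
            url = entry.get("url", "")
            label = entry.get("name", url)
        items.append(f'<li><a href="{html.escape(url)}">{html.escape(str(label))}</a></li>')

    # The generated page links back to the manifest so user agents can discover it.
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"<title>{title}</title>",
        f'<link rel="publication" href="{html.escape(manifest_href)}">',
        "</head>",
        "<body>",
        f"<h1>{title}</h1>",
        "<ol>",
        *items,
        "</ol>",
        "</body>",
        "</html>",
    ])
```

The point of the sketch is that everything on the page is derived from the manifest, so nothing has to be authored or packaged by the publisher.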

We could therefore choose to have the HTML entry page optional in the package. It would offer simplicity for basic use cases and guarantee great results for advanced use cases.

Note: The other solution is to conclude that a JSON ToC is not an option and that an HTML ToC is imposed on audiobook publishers.

GarthConboy commented 5 years ago

I'm somewhat torn here. I view the entry page as unneeded fluff in a packaged audiobook, and thus can see disallowing or discouraging it. On the other hand, if we're going to have an HTML ToC (which I don't object to), there is no reason that couldn't be basically the only content of an entry page.

So, I guess, for me, it comes down to a JSON ToC decision -- if that's the path we go, I'd obviate the entry page (for packaged audiobooks).

iherman commented 5 years ago

Regardless of the ToC in JSON issue, I would not want to disallow the HTML page. I think we all agreed that the packaging format we define would not only be for audio books but, possibly, for other profiles, too, like visual narratives. Having an exception for audio books is a bad idea from that standpoint...

HadrienGardeur commented 5 years ago

I find this discussion quite confusing: the requirement for having an entry page in a packaged publication has nothing to do with the ToC.

As a reminder:

Regardless of the ToC in JSON issue, I would not want to disallow the HTML page

There's also a massive difference between not requiring an HTML entry page and disallowing the entry page.

The entry page is completely useless for the packaged audiobook use case, it was mostly added to WP in order to always have a fallback option.

For packaged audiobooks:

Based on this, I would highly recommend to:

Forcing something (an entry page) for the sake of consistency is IMO a bad approach (EPUB made a very similar mistake by forcing an HTML wrapper for FXL instead of allowing images in the spine, a foolish decision that the whole ecosystem still has to struggle with today).

llemeurfr commented 5 years ago

I agree with @HadrienGardeur's conclusion, which also has some implications for #34.

iherman commented 5 years ago

See https://github.com/w3c/pwpub/issues/32#issuecomment-45474530.

I would modify this as follows: allow, but do not require, either manifest.jsonld or index.html, as long as at least one of the two is present. Imho, this should hold for the general case.
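As a sketch of the "at least one of the two" rule, assuming the fixed names `manifest.jsonld` and `index.html` sit at the root of a ZIP package (both names are assumptions still under discussion in this thread, and `package_is_acceptable` is a hypothetical helper):

```python
import io
import zipfile

# Assumed fixed names at the package root; the thread has not finalized them.
MANIFEST_NAME = "manifest.jsonld"
ENTRY_PAGE_NAME = "index.html"


def package_is_acceptable(zip_bytes: bytes) -> bool:
    """True if the package root holds a manifest, an entry page, or both (hypothetical check)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
    # Neither file is individually required, but at least one must be present.
    return MANIFEST_NAME in names or ENTRY_PAGE_NAME in names


def make_zip(files: dict) -> bytes:
    """Test helper: build an in-memory ZIP from {path: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

A profile could then tighten the rule, for example by requiring `MANIFEST_NAME` specifically.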

Whether a particular profile, like audio books, would make further restrictions, that is another matter.

dauwhe commented 5 years ago

I would modify this as follows: allow, but do not require, either manifest.jsonld or index.html, as long as at least one of the two is present. Imho, this should hold for the general case.

Whether a particular profile, like audio books, would make further restrictions, that is another matter.

Packaging seems to divide neatly into two aspects: how to package, and what to package. Would there be utility in leaving the "what to package" to the profiles, and just use a "packaging" spec to define the restrictions on ZIP?

iherman commented 5 years ago

@dauwhe for me this is the last resort (but it is doable). I would like to maintain as much unity among profiles as possible; put another way, minimize their compartmentalization.

TzviyaSiegman commented 5 years ago

I think it is worth considering @dauwhe's proposal seriously. Why should a packaging spec go beyond HOW to package? I believe that we are beginning to define details of the HOW based on requirements for specific file types. As we look at different modules, we will likely encounter different requirements. Perhaps restricting the definition of the package to HOW will enable the modules to coexist more peacefully.

GarthConboy commented 5 years ago

I can agree with Hadrien's last two bullets above.

I will push back a little on

the requirement for having an entry page in a packaged publication has nothing to do with the ToC

My linkage of the two was driven by the still-undecided serialization of the ToC: if it's only HTML, it might as well be in the entry page (though that wouldn't be required). That's why I'd like to resolve this issue with at least an eye to where and how the ToC is encoded.

iherman commented 5 years ago

This is just a note. If the decision is that the primary entry page is not necessary, then the canonicalization section must be updated. That algorithm makes use of the Document DOM node of the primary entry page, so its text must be modified to allow that parameter to be undefined.

iherman commented 5 years ago

The canonicalization referred to in https://github.com/w3c/pwpub/issues/33#issuecomment-456734392 sets the defaults for 'name' (i.e., the title) and for the reading order (consisting of a single entry, the primary entry page); these would have to change in the spec.

HadrienGardeur commented 5 years ago

I think that in general, if the scope and the name of the spec is changed to "Web Publication Manifest" instead of "Web Publications", most of the text about the entry page should be changed.

The first use of WP will be for audiobooks: they don't need an entry page. Visual narratives (comics, mangas & BDs) won't need an entry page either.

This goes beyond the canonicalization, it also affects the core requirements since a packaged audiobook won't have an address and therefore won't have an entry page at that address either.

It also affects the title:

If not included in the authored manifest, the user agent MUST use the value of the title element [html] of the Web Publication’s primary entry page, if present, when generating the canonical manifest.
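The quoted title rule could be made tolerant of a missing primary entry page along the lines @iherman describes for the canonicalization algorithm. Here is a hedged Python sketch in which the PEP may be absent (`pep_html=None`). It assumes `name` is a plain string rather than a localizable array; `canonical_name` and `_TitleParser` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import Optional


class _TitleParser(HTMLParser):
    """Collect the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def canonical_name(manifest: dict, pep_html: Optional[str]) -> Optional[str]:
    """Title resolution that treats the primary entry page as optional."""
    # 1. An authored "name" in the manifest always wins.
    if manifest.get("name"):
        return manifest["name"]
    # 2. Fall back to the PEP's <title>, but only if a PEP exists.
    if pep_html is not None:
        parser = _TitleParser()
        parser.feed(pep_html)
        parser.close()
        title = parser.title.strip()
        if title:
            return title
    # 3. No source available: leave the name undefined.
    return None
```

The only change from a PEP-mandatory version is step 2's guard, which is roughly the "allow that parameter to be undefined" change discussed above.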

iherman commented 5 years ago

@HadrienGardeur I disagree. I think we should minimize these changes. The current agreement is that the WPM is, as much as possible, general and not audio book specific only. That is true even if the current rec track work is on audio books only.

HadrienGardeur commented 5 years ago

Well, if you go down that road, each profile will need to deviate from the core spec. I don't think that's a very good strategy.

The current WPM is not general at this point, it's a relic of how this spec was designed (as "Web Publications" instead of "Web Publication Manifest") when affordances were still considered to be within the scope of the core spec.

llemeurfr commented 5 years ago

To sum up the current state of the discussion:

iherman commented 5 years ago

@llemeurfr,

my proposed solution is a bit different (actually, I am not sure it is very different, maybe just more detailed), taking into account that the lightweight package is for WP in general and not only for audio books:

This means that, e.g., audiobook publishers have the possibility to ignore the entry page when creating the package, whereas a scholarly publisher can use the (more natural) embedded manifest. And anything in between, with mixtures of HTML and other media.

I do realize that this is a bit of an extra load on user agents, because they must be prepared to go the extra mile to parse the HTML entry page and extract the manifest. In the grand scheme of things, however, I do not think this is really a big deal compared to the overall job of creating a decent user agent (this is also based on my own experimentation, which has proven it to be pretty straightforward).

This also allows for your "WP-Setter" approach if a packaged publication is to be turned into a bona fide WP.

HadrienGardeur commented 5 years ago

@iherman that approach is fine (except that we've been using manifest.jsonld instead of manifest.json, which is more unique to WP).

Specific profiles can then further restrict things, for example the audiobook profile could require the presence of a separate manifest.

That said, while the packaging spec would be lightweight enough to accommodate the needs of dedicated profiles, that's not the case of the WP spec itself. Including a url in the manifest and responding to that URL with an entry page are still requirements in WP, which is problematic when the entry page is not required in the package.

iherman commented 5 years ago

@iherman that approach is fine (except that we've been using manifest.jsonld instead of manifest.json, which is more unique to WP).

Yep, you're right, it should be .jsonld

Including a url in the manifest and responding to that URL with an entry page are still requirements in WP, which is problematic when the entry page is not required in the package.

Sure, if we go down that line, we will have to review the WP spec with these special goggles on...

llemeurfr commented 5 years ago

@iherman I can buy this "entry page OR manifest" in the container. The overhead of handling both cases already exists for WP user agents.

GarthConboy commented 5 years ago

I can buy this too. Almost kinda like it. :-)

iherman commented 5 years ago

If the final decision is that we MUST have an entry page to make the content directly usable on the Web, here is the minimal PEP that is needed:

<html>
    <head>
        <link rel="publication" href="manifest.jsonld">
    </head>
</html>

By adding this, an (unpacked) package is a bona fide Web Publication (provided the name of the JSON-LD manifest file is fixed for a package).

geoffjukes commented 5 years ago

@iherman to my mind the entry page and manifest serve two different purposes: the manifest describes the package, and the entry page describes how to display it.

It seems to me that the manifest is required in both audio-only and 'web publication' packages. The entry page is not.

Therefore, what would be the implications of making the manifest required, and the landing page optional?

If a publisher desires to control the presentation of the audio, they can include a 'landing page resource' in the manifest. The reference could then be to package-local resources OR web-based resources.

For example a distributor may have a customized web-based audio player that is linked by reference, rather than packaged.

I'm not a web developer, so I won't pretend to understand if this is sensible.

iherman commented 5 years ago

@geoffjukes

@iherman to my mind the entry page and manifest serve two different purposes: the manifest describes the package, and the entry page describes how to display it.

At a very high level, that can be said (although in the current setup some items, like the title, MAY come from the PEP, i.e., the primary entry page, i.e., index.html).

Therefore, what would be the implications of making the manifest required, and the landing page optional?

That would not work on the Web. A typical user agent can do something with any HTML file; if the only thing it sees is a bare JSON file, it can (possibly) display it as text, but that is as far as it goes. The PEP, as HTML, can serve (beyond the display proper) a bunch of roles: set up the 'origin' for the publication, which is important for security if any kind of JS file is used, create an environment (almost a minor operating system) to run those JS files, etc. HTML has turned into the starting point for just about everything :-)

Hence the PEP is a MUST for the publication on the Web as some sort of a starting point. That also means that if a packaged publication ever wants to be used (unpacked) on the Web, then the PEP must be present.

The other way round... well, that is the issue.

geoffjukes commented 5 years ago

@iherman Thanks for the clarification, that certainly makes sense.

In essence, the PEP becomes the 'is web enabled' indicator, and would therefore be required in the end-user package. It would not be present in the business-to-business packages, and would not be required (but there is no need to call out that distinction in the specification).

I would be amenable to generating the minimal PEP when we are "packaging for the web", with the caveat that I would like the manifest to be required, as it is the most useful component from a business-to-business perspective.

mattgarrish commented 5 years ago

Perhaps part of the problem is that we're trying to base everything on Web publications instead of a broader concept of a publication that is inherited by Web publications, audiobooks, etc.

A cleaner separation of concerns might lead to a model like:

Publication manifest |--> Web Publication
                     |          ^
                     |          |
                     |          v
                     |--> Audiobook

where all our outputs are considered modules that inherit from a common publication manifest, with perhaps no required metadata and certainly no mention of structure. That way we can introduce different required metadata, other ways of harvesting metadata, different manifest/pep discovery models, etc. etc. on a per-model basis.

There should always be paths between the formats, but it won't always be true that one format is necessarily the other all the time.

I think our last revision to take out the affordances got us halfway to this model, but we'd still need to separate all the Web-specific structures and metadata requirements.

It's not what we set out to define, of course, but maybe is a more pragmatic model?? (I could be out to lunch on this comment, too, as I haven't thought it through all that deeply. :)

TzviyaSiegman commented 5 years ago

I am having emoji issues today, but I very much agree with @mattgarrish.

geoffjukes commented 5 years ago

@mattgarrish I believe this concept is compatible with the work that @dauwhe has been doing (and to a lesser degree myself).

Speaking for myself, everything starts with a manifest. That being said [and assuming a minimum set of requirements such as playback order, audio file name/size/length/checksum, supplemental file name/size/checksum, etc.], we should be able to generate a manifest based on the current delivery packages that we receive from publishers, simply by analyzing the files.

In one of our more complex products, we package audio, HTML, and media overlays to provide a 'read along' experience. We do this with a custom player, but I throw it out there as an example of "There should always be paths between the formats", as it is more accurately a hybrid of the two :)

mattgarrish commented 5 years ago

Right, the manifest is the key to every format because otherwise you just have html/audio/etc. It's always been what has bound all the resources together. So long as it is common, there will be various ways of translating content, and not strictly from one to the other as you say.

I think the problem has been that we started by analyzing one module without realizing it was just one module. As we discussed on the call today, I don't think it's imperative that everything derive from web publications, but it's taken moving on to another module, audiobooks, to realize that we've arguably been modeling too much on one possible rendering.

That said, I still think there's going to be some complexity in remodeling what we have. What is a canonical manifest, for example, if data harvesting may not apply to every module? These are eminently solvable problems, of course, but might take some creative thinking to avoid needless duplication. I'm going to see what I can make of splitting out the manifest from wpub, as a first step.

geoffjukes commented 5 years ago

@mattgarrish As a pure audio publisher (primarily) I have struggled with the term 'readingOrder' to describe what is more appropriately a 'listeningOrder'. How much separation do you foresee in the modular approach?

Metadata (book title etc) is common (potentially) but 'runtime' would only be relevant to the audio component of an audio-enabled package. So would that live in an 'audio' segment? or would a 'listeningOrder' imply audio, and a 'readingOrder' imply HTML etc? What about the media overlay, would that rely on the existing SMIL? or could we consider a future datapoint 'mediaSync' or something?

Rhetorical questions really.

mattgarrish commented 5 years ago

I have struggled with the term 'readingOrder' to describe what is more appropriately a 'listeningOrder'. How much separation do you foresee in the modular approach?

That's an interesting question. My gut says we should find a single term to avoid over-complicating the process, as both identify the same basic concept - the ordered progression to follow in the absence of user divergence. There's seduction in defining a model that anything can be up-translated to, but the downside is that this can be terribly confusing to authors, as well as developers.

But this is the sort of case where we need to find consensus for all stakeholders; it'll be part of harmonizing the formats. Perhaps there is a more neutral term than reading order, or maybe people decide my view is wrong. :)

Assuming a single property, though, the reading/listening aspect would be influenced by a combination of the type of resource the manifest identifies itself as (e.g., via its schema.org type) and the media types of the resources referenced in the reading order.

I'm not sure about the future of SMIL, though. @danielweck and @marisademeglio have been spearheading how SMIL/media overlays will be made manifest in web publications and audio books.

dauwhe commented 5 years ago

As a pure audio publisher (primarily) I have struggled with the term 'readingOrder' to describe what is more appropriately a 'listeningOrder'.

I always liked "sequence" but got outvoted.

mattgarrish commented 5 years ago

I always liked "sequence" but got outvoted.

Might be time to start a new issue in the wpub tracker given this feedback about audio, otherwise we could hijack this issue. :)

iherman commented 5 years ago

This issue was discussed in a meeting.

iherman commented 5 years ago

I like what @mattgarrish described in https://github.com/w3c/pwpub/issues/33#issuecomment-460369726, though I am a bit concerned about the complexity thereof, as well as the possible proliferation of mutually incompatible modules. The original consensus proposal included the fact that a user agent remains compatible with a packaged WPUB without further ado, and the choice of adding (or not) a PEP is exclusively in the hands of the publishers.

I would also note that, as shown in https://github.com/w3c/pwpub/issues/33#issuecomment-460354871, the minimal PEP needed to make everything compatible is ridiculously simple, and that index.html file could be used as an indication of the nature of the package, a bit like EPUB's mimetype file. If audio publishers agreed that it is fine to add it to the package, we would be done without major damage...

geoffjukes commented 5 years ago

@iherman I am warming to the idea of making the PEP required based on the minimum of linking/referencing a manifest, and I concede to your point on compatibility.

that index.html file could be used as an indication of the nature of the package. A bit like EPUB's mime type file.

I would like to combine this thought with #34 and particularly the reference to @dauwhe comment on embedded manifests.

By requiring an embedded PEP, the publisher is effectively saying “this is a web publication”. They then have the opportunity to define the manifest to use for the ‘Web Publication’ version of the package. This would be extremely useful for adoption, I believe, as the package itself could have multiple manifests for different applications. It would also allow manifests to be hosted external to the package itself.

On the subject of #34 (as it is a companion ticket to this) I would say that the PEP is required inside the package, but the manifest (which is also required) must be accessible via the reference in the PEP.

[edit: I’m having a difficult time keeping track of the myriad conversations, so I apologize if I am making statements that have already been made/assumed.]

iherman commented 5 years ago

@iherman I am warming to the idea of making the PEP required based on the minimum of linking/referencing a manifest, and I concede to your point on compatibility.

Wow. This may be a way out of the current deadlock...

On the subject of #34 (as it is a companion ticket to this) I would say that the PEP is required inside the package, but the manifest (which is also required) must be accessible via the reference in the PEP.

That is already the case. This is what makes a WPUB, in fact: the manifest MUST be either referenced or included in the PEP:

The primary entry page is the only resource in which a manifest can be embedded. To ensure discovery of the manifest, the primary entry page MUST provide a link to the manifest, regardless of whether the manifest is embedded within the page or external to it.

At the moment the manifest may either be embedded in the PEP (in JSON-LD) or referenced from it as a separate JSON-LD file. #34 is based on the assumption that if the WPUB is packaged, the manifest MUST be a separate file; on the other hand, another open issue (https://github.com/w3c/wpub/issues/327) would require just about the opposite: the manifest may appear only within the PEP. At the moment we have consensus on neither of these two :-(
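The two discovery forms can be sketched as follows. The PEP always carries a `rel="publication"` link, whose target is either a separate JSON-LD file or a fragment pointing at an embedded block. This assumes the embedded form is a `<script type="application/ld+json">` whose `id` is the fragment target; that convention and the helper `locate_manifest` are assumptions for illustration, not quoted from the spec:

```python
import json
from html.parser import HTMLParser
from typing import Optional


class _PepParser(HTMLParser):
    """Record the first rel="publication" link and JSON-LD script blocks keyed by id."""

    def __init__(self):
        super().__init__()
        self.publication_href: Optional[str] = None
        self.scripts: dict = {}
        self._script_id: Optional[str] = None
        self._buf: list = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag == "link" and "publication" in rels and self.publication_href is None:
            self.publication_href = a.get("href")
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._script_id = a.get("id") or ""
            self._buf = []

    def handle_data(self, data):
        if self._script_id is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script_id is not None:
            self.scripts[self._script_id] = "".join(self._buf)
            self._script_id = None


def locate_manifest(pep_html: str):
    """Return ("embedded", dict), ("external", href), or None if no usable link."""
    parser = _PepParser()
    parser.feed(pep_html)
    parser.close()
    href = parser.publication_href
    if not href:
        return None
    if href.startswith("#"):
        # Fragment link: the manifest is embedded in this very page.
        body = parser.scripts.get(href[1:])
        return ("embedded", json.loads(body)) if body is not None else None
    # Otherwise the manifest is a separate JSON-LD file, read from the package or fetched.
    return ("external", href)
```

Either way the link is the single discovery hook, which is why the quoted text requires it even when the manifest is embedded.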

HadrienGardeur commented 5 years ago

If the final decision is that we MUST have an entry page to make the content directly usable on the Web, here is the minimal PEP that is needed:

<html>
   <head>
       <link rel="publication" href="manifest.jsonld">
   </head>
</html>

For me this perfectly illustrates why we shouldn't have the entry page as a requirement in any package:

This minimal entry page would bring zero value to the table, it just makes a packaged audiobook more complex to produce.

This group has vastly overestimated the usefulness of the entry page as it is defined today, and too many of these requirements only make sense for publications that are primarily meant to be distributed on the Web. For EPUB and audiobooks, these requirements only make things more complicated than they should be, at the cost of simplicity for both authors and UAs.

HadrienGardeur commented 5 years ago

where all our outputs are considered modules that inherit from a common publication manifest, with perhaps no required metadata and certainly no mention of structure. That way we can introduce different required metadata, other ways of harvesting metadata, different manifest/pep discovery models, etc. etc. on a per-model basis.

I mostly agree with your comment @mattgarrish but with a few tweaks:

mattgarrish commented 5 years ago

The original consensus proposal included the fact that a user agent remains compatible with a packaged WPUB without further ado

And this would remain true for Web Publications that are packaged.

But what I see is that we're moving off into a different realm, where we have formats that predominantly live only in their packaged form, with only some desire to be able to make them also deployable as web publications. The path from exploded web content back to these specific formats is probably even smaller. This would encompass audiobooks, epubs and no doubt other forms as well.

The PEP makes it easier to explode these packaged formats onto the web, but it's not a critical part of them in their packaged form. That's why the formats should all inherit the same common manifest format, but whether they need the additional trappings to be web publications remains optional.

I would also say that a requirement of any packaged format is that it must retain the ability to be produced as a conformant web publication, not that it must simultaneously be a conformant web publication.

geoffjukes commented 5 years ago

@mattgarrish @iherman I think I am confusing the wpub and pwpub projects.

Is the scope of this project to define how to package (i.e., contain in a single redistributable file) a collection of files that conform to the standard laid out in the wpub project?

[edit: w3c/wpub#400 Yep. I am]

geoffjukes commented 5 years ago

@iherman @HadrienGardeur Given the edit above, "If the final decision is that we MUST have an entry page", then from the perspective of an audiobook and EPUB publisher, and an audiobook aggregator/distributor, we would have no issue including a reference to a manifest inside the index.html to facilitate manifest discovery.

For publications that already have an index.html, the rel link would be injected. For publications that do not, we would include a bare PEP.
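A rough sketch of that injection step, assuming the `publication` link relation from @iherman's minimal PEP and a manifest named `manifest.jsonld` (both still unsettled here). It uses regular expressions rather than a real HTML rewriter, so it is illustrative only; `ensure_pep` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumptions: the "publication" link relation from the minimal PEP, and a
# fixed manifest file name; neither is settled in this thread.
DEFAULT_MANIFEST = "manifest.jsonld"
LINK_TEMPLATE = '<link rel="publication" href="{href}">'
BARE_PEP = "<html>\n  <head>\n    {link}\n  </head>\n</html>\n"

_HAS_LINK = re.compile(r'<link[^>]*\brel=["\']?publication\b', re.IGNORECASE)
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def ensure_pep(existing_html: Optional[str], manifest_href: str = DEFAULT_MANIFEST) -> str:
    """Return index.html content that links to the manifest (hypothetical helper).

    - No existing page: emit a bare PEP.
    - Existing page already linking a manifest: return it unchanged.
    - Otherwise: inject the link just before </head>.
    """
    link = LINK_TEMPLATE.format(href=manifest_href)
    if existing_html is None:
        return BARE_PEP.format(link=link)
    if _HAS_LINK.search(existing_html):
        return existing_html
    head_close = _HEAD_CLOSE.search(existing_html)
    if head_close is None:
        # HTML allows </head> to be omitted; a real tool would need a proper parser here.
        raise ValueError("no </head> found; inject the link manually")
    i = head_close.start()
    return existing_html[:i] + link + "\n" + existing_html[i:]
```

Running it twice on the same page is a no-op, which matters for an ingestion pipeline that may reprocess titles.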

The name of the rel should be established so as to minimize the risk of a clash inside an existing index.html. Alternatively, choose a well-known filename that is unlikely to clash with a file that is part of the publication (or make index.html a reserved filename in the wpub spec).

mattgarrish commented 5 years ago

Is the scope of this project to define how to package (i.e., contain in a single redistributable file) a collection of files that conform to the standard laid out in the wpub project?

We have somewhat overlapping scopes here, to be honest, as we're also looking at whether audiobook = web pub in terms of what has to be packaged and how to discover it. That might be contributing to the confusion.

In general, though, the packaging defined here should work for web pub, audiobook and whatever else we define.

geoffjukes commented 5 years ago

@mattgarrish Thanks for the response.

This may be better suited to the wpub project....

We (Blackstone) have 3 high-level product types in this space.

All 3 products have the potential for supplemental material which may need to be embedded (a PDF for example) or linked (a DVD ISO for example)

For us it would make sense to ensure the two can be sensibly co-mingled. We do not produce Manga, or ePubs with Video assets, but I would assume the complexity of such would match our 'hybrid' product.

I know I seem to be flip-flopping here, but it could be argued that a Packaged Web Publication that is missing a PEP is audio only, and therefore intended for consumption by an application running on a device rather than inside a web browser; as such, the manifest should be consumable directly without requiring discovery.

The only utility I can see in a more 'complete' PEP for audio-only, would be if a publisher had specific display requirements for that title. However, I could counter-argue that it would then fall under the loose heading of the 'hybrid' product, and is not strictly 'audio only' (in essence, it's a 1-page ePub that is a playlist).

Apologies if some of this is naive. My background is operations/infrastructure, not application design. I am involved here because I developed the automation systems we use at Blackstone for asset ingestion and packaging.

mattgarrish commented 5 years ago

The name of the rel should be established so as to minimize the risk of a clash inside an existing index.html.

If we have to resort to specially named files, it would be good to only have one, and I'd lean to that being the manifest given its general utility.

It feels like we could just add a link relationship to identify the pep in whichever of the reading order or resource list it appears in.

geoffjukes commented 5 years ago

@mattgarrish For clarity, is the manifest.jsonld here intended to be the same one as the one defined by the wpub spec, but with non-local resource locations converted to local ones?

mattgarrish commented 5 years ago

For clarity, is the manifest.jsonld here intended to be the same one as the one defined by the wpub spec, but with non-local resource locations converted to local ones?

I haven't seen the issue of non-local resources raised yet, since this was born out of audiobooks and their not living on the web first.

It would be a problem that needs solving to package web publications, but perhaps that doesn't get handled before the web packaging specification (i.e., there are limits on what can be packaged by this LPF container).

geoffjukes commented 5 years ago

@wareid was kind enough to provide me with clarity.

My opinion here would be that a well-defined manifest, that includes a sequence for media playback, would render a PEP redundant. I would therefore advocate for making the PEP optional but the manifest required in all publications.

From an application perspective, a missing PEP would indicate the publication was 'media only' and render an application-themed media-appropriate playlist based on the sequence.

The presence of a PEP would indicate the publication has specific display requirements, and should be rendered appropriately.
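The rule proposed here (no PEP means a media-only playlist; a PEP means custom display) could look like this in a reading system. It is a sketch of this one proposal, which is still being debated in this thread; the `RenderPlan` type, the mode labels, and the assumption that `readingOrder` entries are strings or objects with a `url` key are all hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RenderPlan:
    mode: str             # "pep" or "playlist" (hypothetical labels)
    start: Optional[str]  # entry page to open, if any
    playlist: List[str]   # ordered media URLs when no PEP is present


def plan_rendering(manifest: dict, has_pep: bool, pep_name: str = "index.html") -> RenderPlan:
    """Pick a presentation strategy based on whether the package ships a PEP."""
    if has_pep:
        # A PEP signals specific display requirements: hand it to an HTML renderer.
        return RenderPlan(mode="pep", start=pep_name, playlist=[])
    # No PEP: build an application-themed playlist from the manifest's sequence.
    order = manifest.get("readingOrder", [])
    urls = [e if isinstance(e, str) else e["url"] for e in order]
    return RenderPlan(mode="playlist", start=None, playlist=urls)
```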

If the consensus is to make the PEP required, I would be OK with including a minimum PEP as described by @iherman

mattgarrish commented 5 years ago

a missing PEP would indicate the publication was 'media only'

That's true right now, but does it hold for future packaged publications? What if an EPUB 4 also doesn't require a primary entry page?

geoffjukes commented 5 years ago

@mattgarrish I’m not sure I follow. If EPUB4 is HTML based, it would have an index. If it is manifest based, it would not need a PEP.

The specifics of how a manifest-based EPUB4 might be interpreted and rendered are not relevant to this ticket :)

I’ll happily edit or remove the paragraph on what the absence of a PEP implies, as I feel the paragraph on what its presence implies is still appropriate.

mattgarrish commented 5 years ago

If EPUB4 is HTML based, it would have an index.

The PEP isn't (or hasn't been) about media type, but about deployment on the web. Whether an EPUB is destined for the Web is also a publisher choice, so at this time I can't see why it would be required -- except in the same scenario where we require publishers to produce unnecessary files.