w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

What does a WPub URL resolve to? #94

Closed prototypo closed 6 years ago

prototypo commented 6 years ago

Problem Statement

We have decided that a WPub will be identified with a URL. What should that URL resolve to?

This issue suggests four scenarios and evaluates their ramifications in the hopes of driving toward consensus. A conclusion section at the bottom of this post suggests an answer to the question posed.

Scenarios

The working group has considered several possible answers to this question. They mostly fall into one of four scenarios:

  1. A WPub URL resolves to a JSON manifest file
  2. A WPub URL resolves to an HTML file that contains both a table of contents (TOC) and other metadata
  3. A WPub URL resolves to an HTML file containing a TOC and which links to a JSON manifest file
  4. A WPub URL resolves to a binary package (e.g. a ZIP file or SQLite database file)

Each of these scenarios is considered in turn as it would be viewed by four types of clients:

Scenario 1: JSON Manifest

If a WPub URL resolves to a JSON manifest file, the four clients may be expected to act like this:

Scenario 1 seems inconvenient for both old and new clients with the exception of new WPub aware user agents.

[Figure: wpub-canonicalresourcejson]

Scenario 2: HTML TOC & Metadata

If a WPub URL resolves to an HTML file that contains both a table of contents (TOC) and other metadata, the four clients may be expected to act like this:

[Figure: wpub-canonicalresourcehtml]

Scenario 2 would be handled cleanly for old and new clients, but could cause difficulties when old clients are presented with metadata they cannot understand. There is also some danger of overloading or stretching the use of HTML to define the necessary metadata within an HTML document. New clients would need to parse the metadata out of the HTML to operate upon it.

Scenario 3: HTML TOC & JSON Manifest

If a WPub URL resolves to an HTML file containing a TOC and which links to a JSON manifest file, the four clients may be expected to act like this:

[Figure: wpub-canonicalresourcehtmljson]

Scenario 3 seems to cleanly handle old and new clients in appropriate ways. Old clients could follow their noses to the components of a WPub, and new clients could easily load the JSON object to efficiently access metadata.
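To make the Scenario 3 flow concrete, here is a minimal Python sketch of how a WPub-aware client might discover the manifest from the landing page. It is only an illustration under stated assumptions: the `rel="manifest"` token, the `manifest.json` file name, the sample page, and the helper names `ManifestLinkFinder` and `find_manifest_url` are all hypothetical and are not defined by this issue or by any draft.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Records the href of the first <link> or <a> whose rel contains a token."""

    def __init__(self, rel_token):
        super().__init__()
        self.rel_token = rel_token
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once a match has been found; ignore unrelated tags.
        if self.href is not None or tag not in ("link", "a"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; a valueless attribute yields None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if self.rel_token in rel_tokens and attr_map.get("href"):
            self.href = attr_map["href"]


def find_manifest_url(landing_html, landing_url, rel_token="manifest"):
    """Return the absolute manifest URL, or None if the page has no such link.

    The rel token is an assumption made for this sketch.
    """
    finder = ManifestLinkFinder(rel_token)
    finder.feed(landing_html)
    if finder.href is None:
        return None
    # Resolve a relative href against the URL of the landing page itself.
    return urljoin(landing_url, finder.href)


# Hypothetical landing page: a TOC for old clients, a manifest link for new ones.
LANDING_PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Moby-Dick</title>
  <link rel="manifest" href="manifest.json">
</head>
<body>
  <nav><ol><li><a href="chapter-1.html">Chapter 1</a></li></ol></nav>
</body>
</html>"""

manifest_url = find_manifest_url(LANDING_PAGE, "https://www.example.com/MobyDick/")
print(manifest_url)
```

An old client would simply render the same page and follow the links in the TOC, which is the "follow their noses" behaviour described above.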

Scenario 4: Binary WPub

If a WPub URL resolves to a binary file, the four clients may be expected to act like this:

Scenario 4 could lead to confusion for both old and new clients unless a new MIME type is registered.

[Figure: wpub-canonicalresourcebinary]

My Conclusion

Given the various pros and cons, Scenario 3 (the URL to a WPub resolves to an HTML TOC document, which in turn links to a JSON manifest) seems to most easily interoperate with the existing Web and provides a clean upgrade path to WPub-aware clients.

Acknowledgements

Thanks to @BigBlueHat @TzviyaSiegman @iherman @tcole3 @bdugas @GarthConboy and others from the Publishing WG for discussions which led to this issue and its discussion.

mattgarrish commented 6 years ago

Do we need to require toc in this, or do we only need to require that the link resolve to a resource that is considered the primary entry point for the publication, and that must include a link to the manifest?

It sounds like we're mandating the presence of a table of contents, when not every publication will need one. Using the table of contents, or ensuring the landing document has a clear link to one, is perhaps only best practice in the case of multi-document publications?

dauwhe commented 6 years ago

Yeah, this is one of my fundamental questions. If I point my browser at the URL of a WP, what happens?

> GET /MobyDick/ HTTP/1.1
> Host: www.example.com
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41
> Accept: */*

Strongly agree that the URL must resolve to an HTML document, and that this HTML document must contain a link to the manifest (or contain the manifest; I think that's still an open question).

Like @mattgarrish, I'm a little less certain about requiring this document to contain a TOC. Perhaps if it doesn't contain a TOC, and it is a multiple-document publication, it must link to a TOC? Sadly rel=contents isn't super-standard, but rel=index might be appropriate? Or is it enough that the manifest include a link to a TOC?
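The fallback described here (use a TOC on the landing page if there is one, otherwise follow a link to it) could be sketched as follows. This is a hedged Python illustration, not a proposal: treating any `<nav>` as the TOC and accepting `rel=contents` or `rel=index` as the link relation are assumptions taken from the open questions in this comment, and `TocProbe` and `locate_toc` are hypothetical names.

```python
from html.parser import HTMLParser

# Link relations that this sketch accepts as pointing at a TOC (an assumption).
TOC_RELS = {"contents", "index"}


class TocProbe(HTMLParser):
    """Notes whether the page has a <nav> and whether it links to a TOC."""

    def __init__(self):
        super().__init__()
        self.has_nav = False
        self.toc_href = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "nav":
            self.has_nav = True
        elif tag in ("link", "a") and self.toc_href is None:
            rel_tokens = set((attr_map.get("rel") or "").lower().split())
            if rel_tokens & TOC_RELS and attr_map.get("href"):
                self.toc_href = attr_map["href"]


def locate_toc(landing_html):
    """Classify the landing page as having an inline, linked, or missing TOC."""
    probe = TocProbe()
    probe.feed(landing_html)
    if probe.has_nav:
        return ("inline", None)
    if probe.toc_href:
        return ("linked", probe.toc_href)
    return ("missing", None)


inline_case = locate_toc("<body><nav><a href='ch1.html'>Chapter 1</a></nav></body>")
linked_case = locate_toc("<head><link rel='contents' href='toc.html'></head>")
missing_case = locate_toc("<body><img src='cover.jpg' alt='Cover'></body>")
print(inline_case, linked_case, missing_case)
```

The third option raised above, where only the manifest points to the TOC, would leave a non-aware browser in the "missing" case, which is the practical cost of that choice.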

Do we require that this be the first document in the default reading order? I think yes. In retrospect, the whole "begin reading" thing in EPUB felt like an evasion. Saying that [1] front matter is so important that it must come at the front of the book, and [2] that it is so unimportant that we don't want the reader to actually see it, is just avoiding responsibility for designing your own content appropriately.

iherman commented 6 years ago

To repeat some arguments, just for the records, against option 2:

My conclusion on option 2 is that it should not be adopted.

dauwhe commented 6 years ago

[edited] Ivan, are you talking about option 2? That seems to be the one where some publication metadata is expressed in HTML.

iherman commented 6 years ago

Oh bugger, I wanted to say Option 2! I will edit the comment...

(Lesson: never say anything serious before breakfast!)

rdeltour commented 6 years ago

If the metadata used the HTML elements like , , or even , that would be, in my view, in violation to the HTML spec. That spec clearly says "The meta element can represent document-level metadata". I.e., using that element for the Publication, instead of the for the containing document would be misusing those for a different purpose. We should not do that.</p> </blockquote> <p>There's a precedent however in web apps, where HTML’s <code>meta</code> elements are very frequently used to represent app-level metadata. So I don't think the "violation of the HTML spec" is very significant. Said spec can be updated to pave the web apps cowpaths.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Per the TOC (or not) in option 3: indeed, maybe the TOC is not the ideal solution, although, in many cases it looks like the natural fit. </p> <p>The current draft lists three <em>required</em> information items: title, list of web publication resources, and a default reading order. I would not want the title appearing in the landing page (at least not being <em>the</em> information item) for the same reasons as in <a href="https://github.com/w3c/wpub/issues/94#issuecomment-342841065">my comment</a>. Not sure whether the list of resources and the reading order should appear there.</p> <p>To say 'the landing page is the first document in the reading order may not be good either: the content may <em>not</em> be in HTML. We had this discussion on the TPAC F2F of publications consisting of drawings or audio files only. 
For those cases, the TOC is helpful for non WPUB aware browsers.</p> <p>I would say: the landing page contains the TOC (through a <code><nav></code> with some predefined attributes of some sort) <em>if it exists for the publication</em>. If it does not, then the landing page is, essentially, empty as far as the Publication is concerned, and there is no TOC in the publication. I would not expect this to be a very frequent situation.</p> <p>(The landing page may still include other information that publisher wishes to provide. This is not for us to define.)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>There's a precedent however in web apps, where HTML’s meta elements are very frequently used to represent app-level metadata. So I don't think the "violation of the HTML spec" applies is very significant. Said spec can be updated to pave the web apps cowpaths.</p> </blockquote> <p>Maybe there are such apps in the wild, and I do not think what they do is correct. 
Maybe, but only maybe, the Web Platform WG will, at some point in the distant future, pave that particular cowpath (I cannot judge how wide this usage is), but I do not think this WG should adopt an approach which is in violation with the specifications as of today.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>(Just to make it clear: my personal vote goes firmly for option 3...)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>I do not think this WG should adopt an approach which is in violation with the specifications as of today</p> </blockquote> <p>I'm kind of torn on this, as we're already doing similar in allowing certain information to be harvested from the content (title, language, etc.). It's (arguably) useful information for legacy browsers that don't support the publication.</p> <p>But I agree to the extent that we should not be confusing people that it's either/or. Maintaining metadata in multiple places is always a disaster in waiting. WAM gives precedence to manifest metadata and we should do the same.</p> <p>I prefer Ivan's approach, but can live with what we've done. 
At a minimum, we can never stop people from expressing whatever metadata they want wherever they want.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/WSchindler"><img src="https://avatars.githubusercontent.com/u/8037744?v=4" />WSchindler</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Considering the 4 scenarios I think option 3 seems the best solution to accommodate all types of clients. From a user perspective, we would always need some kind of landing page for the WP. Typically, it should contain a <strong>TOC</strong> (via the <code><nav></code> element) to enable the user to visually access the different parts of the WP - i.e. those that are part of the reading order - either sequentially or in a random order. If it doesn't contain a TOC and has more than one constituent documents, we would still need some starting point. Would you agree that any resource listed in the reading order should have a link to the one and only manifest of a WP which contains all the information for a WP-aware user agent to properly consume and render a WP?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dauwhe"><img src="https://avatars.githubusercontent.com/u/5687700?v=4" />dauwhe</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>To say 'the landing page is the first document in the reading order may not be good either: the content may not be in HTML. We had this discussion on the TPAC F2F of publications consisting of drawings or audio files only. For those cases, the TOC is helpful for non WPUB aware browsers.</p> </blockquote> <p>Ah, this helps clarify that I have a concern here. 
If this HTML document is not the first document in the default reading order, then a WP-aware user agent will present an entirely different resource to the user than a non-WP-aware user agent. Quick example: if my WP has this (very common) order:</p> <pre><code>cover.html (embedded in html) title-page.html toc.html chapter-1.html chapter-2.html</code></pre> <p>I might set things up so that the URL for the WP resolves to <code>cover.html</code> and that contains the link to the manifest. But what if the manifest says that the first item of the default reading order is <code>chapter-1.html</code>? Then a non-aware UA would present the cover, and a WP-aware UA would present ch1? I find that very confusing. </p> <p>I agree that having the URL resolve to a TOC is ideal. You can always hide it visually. </p> <p>I would also argue that something in the WP has to be HTML (or something that supports links). I don't see how you make a JSON manifest + audio/image files only work in option 3, because otherwise a non-aware UA cannot access the content through the WP URL.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>the landing page contains the TOC</p> </blockquote> <p>But why enforce this? 
What does it accomplish, really?</p> <p>What if I want my landing page to be the cover page and clicking on the cover page takes you to the table of contents?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>a WP-aware UA would present ch1</p> </blockquote> <p>Why would the UA change the document you've navigated to? If the reading order forces you away from the resource you want to view, that's a very bad thing.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rdeltour"><img src="https://avatars.githubusercontent.com/u/520889?v=4" />rdeltour</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>Maybe there are such apps in the wild, and I do not think what they do is correct. (...) I do not think this WG should adopt an approach which is in violation with the specifications as of today.</p> </blockquote> <p>I'm only saying I don't think that using <code>meta</code> would necessarily be a violation. I mean, <code>application-name</code> is part of the standard metadata names <em>defined in the HTML spec</em>, and can hardly be taken for document-level metadata. It's a blurry line, you can always argue that a piece of metadata applies to the document when it gives info on the larger object (app, publication) this document belongs to.</p> <p>That said, my vote goes to option 3. 
I'm just thinking that it doesn't necessarily precludes mixing it with option 2 to widen UA support, like web apps do: have an authoritative external manifest, but also fallback to HTML metadata when relevant.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dauwhe"><img src="https://avatars.githubusercontent.com/u/5687700?v=4" />dauwhe</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>Why would the UA change the document you've navigated to? If the reading order forces you away from the resource you want to view, that's a very bad thing.</p> </blockquote> <p>I suppose that's what I'm wondering, but perhaps this is more a lifecycle & implementation question. A WP-aware UA opens the HTML resource at the WP URL, and processes the manifest. What is it obligated to do then? I suppose it could present a "begin reading" button that would cause a navigation. Maybe it just creates a navigation/personalization overlay (as Readium Cloud Viewer does). </p> <p>Maybe my point is that it would be silly to have the URL point to something other than what you want the reader to see first. Even RFC 6919 doesn't have formal language for that :)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>A WP-aware UA opens the HTML resource at the WP URL, and processes the manifest. What is it obligated to do then?</p> </blockquote> <p>Right, this is where we (probably) won't find consensus. I'm of the mind that the user agent shouldn't change the resource you're on when it initiates the reading experience. But if it does, I also don't have any problem with that as user's will decide the fate of such a feature. 
I just don't think such a change should happen without prompting, in which case both the vanilla and enhanced experiences are ultimately the same, there's just an extra opt-in from the user that changes one scenario.</p> <p>I think it would also help to make clear that we're trying to accommodate many reading scenarios somewhere (browser, app (in browser), polyfilled publication, publication as app), but let's not discuss that in this thread.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <blockquote> <p>the landing page contains the TOC</p> </blockquote> <p>But why enforce this? What does it accomplish, really?</p> </blockquote> <p>What I think we should avoid is that the same information item could be expressed in different places. I am afraid of the possible confusion that would result from that (what if they are defined in different places? Which one has priority? What if the information are conflicting? etc.)</p> <p>To be more specific: we have a set of information items that must be serialized. My mental model is:</p> <ol> <li>One, and only one, of the information item appears in the landing page, and <em>the landing page only</em></li> <li><em>All other</em> information items are serialized in JSON (as we decided) and are, therefore, part of the manifest (waybill?) file that is linked from the landing page.</li> </ol> <p>In other words, there is no ambiguity on where a specific information item is defined.</p> <p><em>If</em> this model works that we must choose which information item is covered by (1). 
The TOC seems to be the best choice..</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>One, and only one, of the information item appears in the landing page, and the landing page only</p> </blockquote> <p>I agree and disagree. There is one piece of information that must appear on the landing page: the link to the manifest. It can appear elsewhere, but whether it has to we keep changing course on (I preferred our original recommendation, so don't object to the subsequent PR that's appeared.)</p> <p>My fears may be overblown, granted, but I think when we stray into mandating content we're going down a potentially restrictive and unwanted path. I'm not going to lie down in the road over this, though.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dauwhe"><img src="https://avatars.githubusercontent.com/u/5687700?v=4" />dauwhe</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>So do we have rough consensus that:</p> <ol> <li>The URL of a WP must resolve to a resource (the "landing page") which</li> <li>must contains a link to the manifest and</li> <li>should contain a <code><nav></code>?</li> </ol> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/bduga"><img src="https://avatars.githubusercontent.com/u/3526500?v=4" />bduga</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>I am not sure we need to express metadata level information as meta elements, as I agree with Ivan that seem like appropriating file level metadata. 
But there is also the option of simply encoding all the metadata in HTML, by (ab)using classes (or another mechanism). That gives us full markup in the metadata, as well as CSS styling. And it means the content can be suppressed for display via CSS. So the landing page could have the text of the title, list of authors, etc that are also displayed directly to the user without duplication (there isn't an author in the metadata and an author on the landing page).</p> <p>[edit] Though, that said, I could live with either 2 or 3.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/prototypo"><img src="https://avatars.githubusercontent.com/u/824236?v=4" />prototypo</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Oops, I meant to acknowledge @bduga - apologies!</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/azaroth42"><img src="https://avatars.githubusercontent.com/u/871868?v=4" />azaroth42</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>From the cheap seats at the back of #W3CTPAC ... (over several sessions) ...</p> <p>I agree with @mattgarrish. In my experience with the development of the <a href="http://iiif.io/api/presentation/2.1/">IIIF manifest</a> spec, the link to the JSON is of course the most important. Other useful information is the position to enter the document's structure, if not the beginning, allowing multiple entry points with different behavior.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/GarthConboy"><img src="https://avatars.githubusercontent.com/u/7926020?v=4" />GarthConboy</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>I'm (also) okay with options #2 and #3. Per Dave's "should contain a <code><nav></code>?" 
above, if a "<code><nav></code>" is a MUST somewhere, it seems it's better to be a MUST in a particular place such that we don't introduce opportunities for it two be in multiple places and having the various instances fight.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mattgarrish"><img src="https://avatars.githubusercontent.com/u/1565164?v=4" />mattgarrish</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>What we have in the infoset is that you can specify the toc as a property or you can reference the html nav element that contains it. I think that's optimal, as it doesn't require the toc to be in any particular place, and doesn't necessitate duplication. It also allows different tocs for content and rs if you so desire.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>At this point I believe we indeed have a consensus as described in <a href="https://github.com/w3c/wpub/issues/94#issuecomment-342903893">Dave's comment</a>, and we do not fully agree what the landing page would really contain. I would propose, just for the sake of moving ahead, to put that into the document (in the form of a PR) and open a separate issue on, specifically, "MUST the TOC appear on the landing page" which we will have to cover at some point.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Added a PR (#98). 
I have not opened a new issue; should be done only if the PR is accepted for merge.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/prototypo"><img src="https://avatars.githubusercontent.com/u/824236?v=4" />prototypo</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Should this issue be closed now that PR <a href="https://github.com/w3c/wpub/pull/98">#98</a> has been merged?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <h1>98 has not been merged yet...:-)</h1> <p>Yes, this should be closed when it is merged.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Frankly, I don't think that we should enforce anything.</p> <p>If the URL is primarily meant to identify, it doesn't really matter if it points to a manifest, a TOC or a resource of the publication.</p> <p>All these resources will have their own URL anyway, and since we can't expect all publications to have a TOC, I really don't like the proposed resolution.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/llemeurfr"><img src="https://avatars.githubusercontent.com/u/14943186?v=4" />llemeurfr</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>EDRLab's position is that the WPub URL should resolve to a <strong>"landing page"</strong>, ie. an HTML page that introduces the publication to the reader. 
It MUST contain in its header a link to the JSON Manifest (which has its own URL). It MAY contain an HTML TOC, but it is not mandatory. Such a TOC MAY be machine readible if its structure follows some best practices yet to be defined, but it is not mandatory. It may also present other information, like those found on any bookselling website: a title, authors, cover, description, extracts of reviews etc. Such content is not defined by the spec, it may or may not be machine readable (it may if eg. but WPub author integrates Rich Snippets in the HTML structure, but the WPub spec is silent on this). </p> <p>Therefore:</p> <ul> <li>a search engine, will be able to index this landing page (if the landing page is freely accessible).</li> <li>a user will be able to navigate through the WPub from this landing page if it contains links to some parts of the content (it may be a full TOC, or simply a link to the preface).</li> <li>a WPub aware user agent will follow the link to the WPub manifest and act accordingly. </li> </ul> <p>In a package, this landing page will be included so that a round-trip btw PW and PWP can be facilitated. But the JSON manifest will have a conventional name, therefore the PWP user agent will be able to jump to the manifest without having to go through the landing page.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@llemeurfr, I agree, with the exception of 'should': WPub URL MUST resolve to a landing page. If we want WP-s to be handled by current browsers and not go completely blind on the identifier of a WPUB (or, worse, displaying an unintelligible JSON file), we have to have this. </p> <p>Architecturally it would be nicer if we did not have this requirement. 
But practicality is what it is...</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>Why do we assume that the WPub URL will be the primary way of sharing and discovering a WPub ? This is a very different requirement than identifying a publication.</p> <p>Also, assuming that one URL is more important than any other is IMO the wrong way to think about this. We'll have multiple URLs that have specific roles instead, each identified by a rel value:</p> <ul> <li><code>manifest</code> for the JSON manifest</li> <li><code>start</code> for the beginning of the publication</li> <li><code>index</code> or <code>contents</code> for the TOC</li> </ul> <p><code>identifier</code> is also a candidate for the IETF link registry: <a href="https://tools.ietf.org/html/draft-vandesompel-identifier-00">https://tools.ietf.org/html/draft-vandesompel-identifier-00</a></p> <p>By mixing up two different requirements we're not really making things clear at all.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>I haven't read a strong case so far to justify that the WPub URL (whatever that is) should be any of the above.</p> <p>Many comments in this thread talk about a "landing page". It's very unclear to me why we jumped to the point where WPub URL = landing page, but even if we assume that's the consensus, there are many questions worth asking about such a page.</p> <p>Why should a landing page be the same thing as the table of contents? 
Why should a landing page contain a link to the manifest? Do we need the landing page to be part of the publication at all?</p> <p>As a publisher, if my publication is behind a paywall, isn't it better to have a landing page that shows basic info about the book plus a giant buy button instead of being forced to link to a manifest + include a table of contents (which might not even exist)? What if my publication only exists as a PWP, how do I link to the manifest in the first place if I don't have one? Am I forced to publish a unique HTML page per PWP that I publish?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@HadrienGardeur </p> <blockquote> <p>WPub URL will be the primary way of sharing and discovering a WPub</p> </blockquote> <p>We are on the Web. That means URLs <em>are</em> the primary way of sharing and discovering things. Various links with rel values are nice and maybe necessary in some circumstances, but on the Web a simple (ie, "unqualified") <em>is</em> the way to discover things indeed. </p> <blockquote> <p>Why should a landing page contain a link to the manifest? Do we need the landing page to be part of the publication at all?</p> </blockquote> <p>See the <a href="https://github.com/w3c/wpub/issues/94#issue-272078084">problem statement</a> of @prototypo: how else would the discovery of a Web Publication happen based on its URL? Put it another way, what should an HTTP request of the WPUB's URL return? @prototypo has laid out the various alternatives…</p> <blockquote> <p>Why should a landing page be the same thing as the table of contents?</p> </blockquote> <p>It is not the same thing. 
A landing page may include other information, header, whatever (see, eg, <a href="https://github.com/w3c/wpub/issues/94#issuecomment-343955531">a previous comment</a>). The only thing that is necessary is the link to the manifest. This is how a WPUB-aware browser, or a browser with an appropriate JavaScript running on top, would find the WPUB itself. Adding a TOC is only a convenience, although it sounds like a useful convenience to me. A landing page may be without it.</p> <blockquote> <p>As a publisher, if my publication is behind a paywall, isn't it better to have a landing page that shows basic info about the book plus a giant buy button instead of being forced to link to a manifest + include a table of contents (which might not even exist)?</p> </blockquote> <p>As I said, it is possible to show basic info and any type of button. But how would the browser find the manifest if not via a link?</p> <blockquote> <p>What if my publication only exists as a PWP, how do I link to the manifest in the first place if I don't have one? </p> </blockquote> <p>Although we indeed did not discuss the PWP-only case, I am not sure what you mean by a PWP without a manifest. To my mind, at this point, a WP as well as a PWP MUST have a manifest. That makes it a publication...</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/llemeurfr"><img src="https://avatars.githubusercontent.com/u/14943186?v=4" />llemeurfr</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>I tried to find examples of what I think landing pages should look like. 
Think about a simplified version of: <a href="https://www.goodreads.com/book/show/1318.Last_of_the_Amazons?from_search=true">https://www.goodreads.com/book/show/1318.Last_of_the_Amazons?from_search=true</a> or <a href="https://www.babelio.com/livres/Zuber-Facebook-ma-tuer/247349">https://www.babelio.com/livres/Zuber-Facebook-ma-tuer/247349</a> </p> <p>Replace the calls to action with a link to the "Table of Contents" (as @dauwhe said, sometimes it's better to have the TOC as a spine item) or a "Read the book" link (which could point to the cover), keep only a few reviews, keep the links to other WPubs (recommendations), even the call to rate the book (why not?) and you've got the idea.</p> <p>Such a landing page should not be behind a paywall, and it must not be in the spine (i.e. reading order), as it is "outside" of the book. As I said earlier, a reading system does not need this landing page to find the manifest in a PWP. If a PWP is created in a standard publishing workflow, the PWP may contain NO landing page. In such a case, moving the PWP to the Web as a WPub will involve some work (either machine- or human-based).</p> <p>Landing pages should be considered useful <strong>addresses</strong> for communicating about Web Publications between users, to <strong>pass</strong> a URL from one user to another. Nothing more. </p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@llemeurfr, I must admit that is not the way I view things. For me, the landing page is an integral part of a WP, and is independent of the page set up by, in this case, a distributor. 
It may contain information that the publisher wants to convey, but it is a bit like the cover page of a usual book.</p> <blockquote> <p>If a PWP is created in a standard publishing workflow, the PWP may contain NO landing page.</p> </blockquote> <p>I do not see it that way. Maybe the term "landing page" is not a good term for a PWP, but the way I see a PWP is that it <em>may</em> be expanded on a Web site and, at that point, it is a perfectly valid WP without any further action.</p> <p>I <em>think</em> that we may have here a (further) possible difference between what we call a PWP and what we call EPUB4. Yes, maybe, an EPUB4 can be one step further away from a WP insofar as the manifest is at an agreed-upon place within the package (much like there are such files in EPUB3), and then the landing page concept may not be necessary. But, as you say, turning this into a WP would require more than just unpacking it on the Web site.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/baldurbjarnason"><img src="https://avatars.githubusercontent.com/u/1508412?v=4" />baldurbjarnason</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>We're already at a point where PWPs are going to be substantially different from regular WPs, even if just from a security perspective (you can't do away with HTTP-addressable locations without losing compatibility with the open web stack).</p> <p>So, with that in mind I'm fine with WPs and PWPs (and EPUB4) having differences when it comes to landing pages. 
They're going to have a <em>lot</em> of incompatibilities anyway.</p> <p>I do worry a bit whether requiring the WP URL to be a landing page will land us in trouble in the future just on the basis that hard coupling is generally bad, but don't really have an argument against it that hasn't already been brought up in this discussion.</p> <p>From the Pressbooks perspective, where I've been looking into the task of prototyping web-publications, we already have landing pages anyway for each book so this particular feature wouldn't cause undue implementation difficulties.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>We are on the Web. That means URLs are the primary way of sharing and discovering things. Various links with rel values are nice and maybe necessary in some circumstances, but on the Web a simple (ie, "unqualified") URL is the way to discover things indeed.</p> </blockquote> <p>Right, and on the Web any URL works. A website is not necessarily accessed from a single URL, there are many options available.</p> <blockquote> <p>@llemeurfr, I must admit that is not the way I view things. For me, the landing page is an integral part of a WP, and is independent of the page set up by, in this case, a distributor. It may contain information that the publisher wants to convey, but it is a bit like the cover page of a usual book.</p> </blockquote> <p>That part is still very unclear. 
Is the landing page part of the publication (a primary resource or whatever we call them these days) or not?</p> <p>Expecting all PWP (and therefore all EPUB 4) to have a landing page on the Web is IMO completely unreasonable, and expecting such landing pages to have a TOC or a manifest linked in there is even more unreasonable.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/laudrain"><img src="https://avatars.githubusercontent.com/u/17047162?v=4" />laudrain</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@HadrienGardeur: all books/ebooks have a Web page to describe them, on publisher Web sites at least. We are not far from a landing page.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@laudrain but we're not talking about books (this is about publications, a much broader category), and not all publishers have a landing page per book, far from it.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/laudrain"><img src="https://avatars.githubusercontent.com/u/17047162?v=4" />laudrain</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@HadrienGardeur so they will have one when they adopt WP/PWP/EPUB4.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@laudrain this puts a massive barrier to entry in front of EPUB4 adoption compared to previous 
versions of EPUB, which IMO is a non-starter.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>this puts a massive barrier to entry in front of EPUB4 adoption compared to previous versions of EPUB, which IMO is a non-starter.</p> </blockquote> <p>I think that is an overstatement. EPUB3 has a massive barrier in terms of information that has to be provided (including a nav file), this is less, not more...</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@iherman it's far from an overstatement and I agree that EPUB3 took a few wrong turns down the road btw.</p> <p>The main issue here is that requiring a landing page is completely disconnected from current EPUB production. In many cases, the people producing an EPUB file would not be in the position to create and/or host such a landing page at all.</p> <p>For a company like Hachette, sure that's something that they can deal with, but we're talking about making it easy for anyone to produce an EPUB4, not just the happy few.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@HadrienGardeur I honestly do not understand. The landing page can be as simple as an empty HTML page with a link to the manifest. 
Or, if the publication is a single document, it can be the document proper with a link to the manifest. The only thing we say is that the URL of a WP should lead to an HTML page that has a link to the manifest. Anything more is icing on the cake.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/HadrienGardeur"><img src="https://avatars.githubusercontent.com/u/90989?v=4" />HadrienGardeur</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <p>@iherman If the publication only exists as a PWP/EPUB4, there won't be a URL for the manifest in the first place or any resource from the PWP that you can reference using a URL either.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/iherman"><img src="https://avatars.githubusercontent.com/u/520723?v=4" />iherman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>@iherman <a href="https://github.com/iherman">https://github.com/iherman</a> If the publication only exists as a PWP/EPUB4, there won't be a URL for the manifest in the first place or any resource from the PWP that you can reference using a URL either.</p> </blockquote> <p>As I said, this may be a difference between EPUB4 and (P)WP.</p> <p>My (informal and personal) definition of a PWP is that it is a package that, if I unpack it onto the Web (even if it is on localhost and I serve it to the browser through a server running on my machine), is a valid WP. (The place where I unpack it provides a natural address, which may be different from the identifier, but that is a longer story). I am o.k. with relaxing this for EPUB4, which <em>may</em> have a landing page, but again may not. 
</p> <p>At this moment, we are talking about a WP, though.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/llemeurfr"><img src="https://avatars.githubusercontent.com/u/14943186?v=4" />llemeurfr</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>I am o.k. with relaxing this for EPUB4, which <em>may</em> have a landing page, but again may not.</p> </blockquote> <p>@iherman, if EPUB 4 is defined as a profile (-> a subclass, a specialization) of a PWP, and if a PWP imposes a landing page, then it will be logically impossible to make the landing page optional in the EPUB.</p> <p>That's why I consider that the landing page in a PWP is to be optional. And this is not a big deal, as it can be created automatically, even on the fly, for the rehydrated WP <-> your comment "a landing page may be an empty page with a link to the JSON manifest".</p> <p>At this moment, we are talking about a WP, though.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/TzviyaSiegman"><img src="https://avatars.githubusercontent.com/u/2006752?v=4" />TzviyaSiegman</a> commented <strong> 6 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>If the publication only exists as a PWP/EPUB4</p> </blockquote> <p>Let's be careful not to conflate EPUB 4 with WP. 
We are talking about WP in this thread.</p> </div> </div> </body> </html>