Closed iherman closed 5 years ago
See further (but longer) comments
It's a discussion that we've been having since the very start of this WG.
I don't see any end in sight to this discussion; I believe that we'll need to live with the fact that we can't have a one-size-fits-all solution to this problem.
Let me suggest a path forward. What we need is a way to tag an HTML structure semantically (so that UAs can generate the machine-readable ToC) in a standard manner, without burdening authors with 1/ too many constraints on the choice of HTML tags or 2/ overly complex semantic markup.
And the solution could be ... microdata (a web feature, html5 and schema.org compliant).
Here's a sample
a) which is already quite complex, because one chapter contains two sections (hierarchical ToCs seem to be a minority in practice, so most implementations would be simpler),
b) which uses dummy html tagging, not nav, to prove the point that any html structure can become semantically correct.
c) we should use that microdata tagging as an alternative to the EPUB 3 nav/ol tagging (also allowing ul, by the way, for the sake of flexibility). It is proposed for those authors who want free markup.
<html>
<head></head>
<body>
<table role="doc-toc" itemscope itemtype="http://schema.org/ItemList">
<tr><td><a href="chapter1.html" itemprop="itemListElement">Chapter 1</a></td></tr>
<tr><td><a href="chapter2.html" itemprop="itemListElement">Chapter 2</a></td></tr>
<!-- Chapter 3 is itself a nested ItemList holding its two sections -->
<tr><td itemprop="itemListElement" itemscope itemtype="http://schema.org/ItemList">
<a href="chapter3.html" itemprop="url"><span itemprop="name">Chapter 3</span></a>
<table>
<tr><td><a href="section3.1.html" itemprop="itemListElement">Chap 3 Section 1</a></td></tr>
<tr><td><a href="section3.2.html" itemprop="itemListElement">Chap 3 Section 2</a></td></tr>
</table>
</td></tr>
</table>
</body>
</html>
@llemeurfr,
I cannot really comment whether this approach works with the various constituencies and whether it covers the issues the community has; I leave that to those who have this experience (I do not).
From my point of view this works, but I do have one comment. Although, these days, microdata is indeed mostly (exclusively?) used by schema.org, its definition does not preclude other usages. I.e., we are not required to use schema.org terms like ItemList; it is perfectly o.k. to define our own.
So the question becomes: does it bring any additional value to use schema.org terms? Does it make a difference if schema.org clients take this information alongside the information they can extract from the manifest? Or does it actually muddy the waters of the resulting data, so to say? Because if the answer is 'no, it does not bring anything new', then it may be cleaner if we define our own URL for ItemList to be used in @itemtype.
But that is a minor detail.
Our spec says,
The user agent should provide access to the table of contents without leaving current resource from anywhere in the publication.
We also provide the means of identifying the table of contents from the manifest.
Our spec does not say that there must be a "machine-readable" table of contents, or that user agents must somehow process a table of contents.
What exactly do we mean by "machine-readable"? What problem are we trying to solve? I feel like we are drifting far from HTML, where the nav element was explicitly designed to provide navigation, for both humans and the assistive technology that humans use. Do we feel that machine processing can create a better TOC than humans could do? Or are we just trying to easily generate the kinds of built-in navigation that many EPUB reading systems use? Are those better for readers than the actual nav files they are generated from?
The requirements of our spec could be met by creating a pop-up window with the nav[@role='doc-toc']. If we demand more than that, we should spell out the requirements.
Websites work fine with existing navigation. What is so different about web publications?
Accessibility is the main issue.
Websites work fine with existing navigation.
Not so obviously. On the a11y side, web sites can be un-navigable. We don't want that for WP.
Dear @TzviyaSiegman can you explain to me how your comment in https://github.com/w3c/wpub/pull/285#issuecomment-409674233 fits with your thumb up to https://github.com/w3c/wpub/issues/291#issuecomment-410251143? Pardon my french but it seems to be ... antinomique (translatable as antithetical or so it seems).
Are those better for readers than the actual nav files they are generated from?
The information extracted from a TOC is used in a number of different ways in EPUB UAs:
I think you're absolutely right that we need to agree on the requirements first, but it's important to point out that this goes beyond rendering just a simple TOC.
Not so obviously. On the a11y side, web sites can be un-navigable. We don't want that for WP.
Agreed. But I don't see the needs of WPUB TOCs as being any different from those of the TOCs of web sites or web apps. If there are problems, let's fix them in HTML and WCAG, and fix them for everyone. Creating distance between the web and web publications is not going to be sustainable in the long-term.
@llemeurfr One of the most useful things in EPUB 3 for me (as a publisher) is being able to create a machine-readable, accessible TOC using the simple nav file. This too could serve the purpose of a visual "page". My point in https://github.com/w3c/wpub/pull/285#issuecomment-409674233 is that I don't think there is as much value in the Contents page as some would have us believe.
Using HTML nav allows for accessible navigation, ease of authoring, and design (if it is desired).
Creating distance between the web and web publications is not going to be sustainable in the long-term.
@dauwhe you mean microdata creates a distance from the Web? Isn't it Web technology?
Laurent is proposing that TOCs in WPUB be marked up with microdata. Labeling TOCs in such a way is not common on the web. I think this makes things more complicated for authors, and prevents easy reuse of existing web (and EPUB!) content. I think that we should try very hard not to impose new requirements on authors unless they are strictly necessary.
@dauwhe I'm surprised by your statement. It's as if the "Web" were narrowed down to HTML5/CSS only, with an ounce of JS. A quick look at e.g. http://www.allocine.fr/film/fichefilm_gen_cfilm=238132.html proves that microdata IS used on the Web, definitely.
But I agree that most HTML/EPUB authors will have a hard time typing microdata attributes by hand. This is why I proposed it as an extension of the more basic nav/ol (see my point c).
@dauwhe I don't believe authors would key in a ToC manually, just as they don't key in any Web page manually. I don't see obstacles with this kind of markup (or any other we would choose) in reusing existing web or EPUB content. There will be tools for that, as there are tools to generate the nav file (and as there were for the NCX).
I am very wary of assuming that tools will come just because we create things. I recall the collective sigh of relief when the very simple nav came to replace the ncx. I have spent many hours repairing ncx files manually. Why should we make this more complicated than necessary?
Is there reason to believe that user agents (browsers) are going to create a table of contents widget? It doesn't seem in keeping with their interests or priorities. Other inventions, like longdesc, have languished on the expectation that user agents will make an interface for their use.
I'm not sure what the answer is to this problem, but there is a case that getting user agents just to provide a link to where the table of contents is encoded might be a win for 1.0. Devising rules for parsing the structure might be something to leave to EPUB 4 and see if we can build out actual implementations from there.
I can't see that we'd want to do something different re TOC in EPUB4 versus WP.
I am a fan of having a TOC that can be machine readable -- all decent EPUB RS's present some form of pop-up TOC from the Nav file. I'd expect the same from WP capable UA/RS's (so I don't see deferring to EPUB4).
We did the EPUB Nav file in XHTML such that folks could use it both for machine readable and presentable -- I don't know how successful we've been -- at least sometimes there is both a Nav file, and another version in XHTML markup for in-progression presentation.
I think our historical goals with the Nav file were laudable – if we can figure out how to do better in WP (and thus packaged into EPUB4), that would be great – somehow balancing the sometimes competing priorities of machine processable, presentable, and Web-ish.
A machine could create a serviceable navigation aid just by taking all the a elements in a nav in document order. You wouldn't get nesting, but I seem to recall some EPUB reading systems having that problem.
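The flat approach described above can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not anything a spec or reading system prescribes: the class name `NavLinkCollector` and the function `flat_toc` are hypothetical, and the sketch ignores nesting entirely.

```python
from html.parser import HTMLParser


class NavLinkCollector(HTMLParser):
    """Collect (label, href) pairs for every <a href> inside a <nav>, in document order."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # > 0 while we are inside a <nav>
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []    # text chunks seen inside that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace in the link label.
            label = " ".join("".join(self.current_text).split())
            self.links.append((label, self.current_href))
            self.current_href = None
        elif tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def flat_toc(html):
    """Return the links found inside <nav> elements as a flat list."""
    parser = NavLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Links outside any nav are ignored, and nested lists are flattened, which matches the "no nesting" limitation noted above.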
Only the author will know exactly how they want the TOC expressed. We should let them! If someone creating a WPUB has to look at our spec to know how to create a nav element, it means we've failed.
:-) to go with the 👎 above.
A machine could create a serviceable navigation aid just by taking all the a elements in a nav in document order. You wouldn't get nesting, but I seem to recall some EPUB reading systems having that problem.
Yea... but many do a nice job. We should have a spec that enables high quality WP-aware UA/RS's. Thus, the balance I think we should be striving for, stated above.
I'd urge us to find clever ways of processing TOCs rather than dictating to users how to structure them.
It seems that we are in a deadlock... my proposal is to move forward as follows:
(1) we do not specify a detailed structure for the TOC; there is clearly no consensus on this
(2) we modify the WebIDL spec for the TOC as follows:
HTMLElement TOC;
Which refers to the HTML element that represents the TOC (as described in the draft, ie, a conversion from WPM to an internal representation can find this element, if it exists) and we stop there.
@iherman what do we do with the doc-toc role:
TOC should be present
- if present, it MUST be an element with doc-toc role pointed to from manifest
At least, we should draft from what has been implemented in EPUB 3 (nav/ol/li/a), with an extension for nav/ul also (many HTML samples use ul, including the HTML5 spec).
This offers a correct hierarchical structure, that UAs can exploit, and nav is the proper html tag for screen readers.
This is not as flexible as some authors would like, but it seems that no one is able to provide practical clues now about the level of flexibility they want (I sent a request to EDRLab members, with no feedback so far) and we have seen that semantic microtagging (via microdata or RDFa) is rejected by some WG members.
@laudrain I am not sure what the question is... Isn't the description in https://w3c.github.io/wpub/#table-of-contents clear what should be done?
@llemeurfr per https://github.com/w3c/wpub/issues/291#issuecomment-411672039: it seems that there isn't a consensus for this. I personally do not mind putting something like https://github.com/w3c/wpub/issues/291#issuecomment-410238445 in an informative appendix, for example, but I do not see any way forward with a consensus in the WG...
@iherman description in https://w3c.github.io/wpub/#table-of-contents is ok for me.
Maybe just note in the draft that there remains an open issue regarding how a viable machine-readable TOC is communicated to a UA/RS, with three possible options:
1. Punt on this requirement/desire
2. Constrain the TOC HTML markup
3. Add an additional (likely optional) machine-processable TOC
We haven't been very successful with 1, and 2 seems quite unpopular among this group.
I'm starting to think that 3 might be the only option, although it implies some level of redundancy.
The problem with a machine-processable TOC (say, a JSON string?) is that a TOC sometimes still needs markup, which means we either need to parse the markup or drop it. One example is Japanese ruby in a TOC item. I also remember seeing some content with CSS applied to TOC items. I feel the ideal way would be for UAs to support rendering two HTML documents. I vaguely remember some discussion about rendering two HTML documents maybe 3 years ago, but I don't remember whether there was any update after that.
True. One possible approach to #3 would be an optional/additional constrained markup HTML version -- similar to the EPUB 3 Nav file.
I really doubt that UAs can extract CSS (or MathML) at all for a machine readable TOC. That's what a renderable TOC is for IMO, it gives you all the power of the modern Web.
My personal preference for an optional machine-processable TOC would be to express it in the manifest directly, especially now that we only have two options on the table for handling internationalization in there.
Re. Garth's list, there is a fourth solution, which is to add precise semantics to the HTML structure using semantic markup.
This was proposed in a previous comment, rejected by 2 people & loved by one.
The current state of the spec, which requires nothing from the ToC structure, is a regression compared to EPUB 2 (NCX) and EPUB 3 (Nav doc). We cannot keep it like that.
And requiring a specific structure from an HTML ToC (as opposed to recommending one) is a regression from how the web works. Which is also a non-starter. Hence the stand-still.
Which is why we should defer this to EPUB4 rather than discuss it in the context of WP.
This issue was discussed in a meeting.
Some UAs will be able to read EPUB 2 and 3 as well as WP. Here is what will happen if no machine-readable ToC is specified for WP.
As a user, when I read an EPUB 2 or 3 publication, I'll be able to access a structured ToC from some icon, e.g. in a panel view. The layout of the ToC will be the same in every publication and will adapt to the screen size. As a user, when I read a WP, I'll be able to access a "full page" ToC from some icon, e.g. as a modal window. The layout of the ToC will be specific to each publication. This page may be responsive (CSS grid welcome), or not (in which case the user experience will be bad on small screens).
Problem Statement: Differing opinions about how to specify the ToC in WPUB. Some are of the opinion that a ToC can include anything as long as the HTML includes aria role="doc-toc" to signal to the UA that it is the ToC. Some are of the opinion that the ToC must be more strictly defined so that the ToC can be processed in a standard way by UAs.
Who is impacted: Authors need to be able to easily write a TOC. Users of AT require accessible ToC. UAs require standardized input.
Potential Solutions: (as discussed in Toronto F2F):
Consider this table of contents:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>A Christmas Carol TOC</title>
</head>
<body>
<nav role="doc-toc">
<h1>Contents.</h1>
<h2>Stave One.</h2>
<p><a href="chapter1.html">Marley’s Ghost</a></p>
<h2>Stave Two.</h2>
<p><a href="chapter2.html">The First of the Three Spirits</a></p>
<h2>Stave Three.</h2>
<p><a href="chapter3.html">The Second of the Three Spirits</a></p>
<h2>Stave Four.</h2>
<p><a href="chapter4.html">The Last of the Spirits</a></p>
<h2>Stave Five.</h2>
<p><a href="chapter5.html">The End of it</a></p>
</nav>
</body>
</html>
This is valid HTML. It provides links to the major sections of the publication. But this would not be valid as an EPUB 3 navigational document. It would be quite simple to extract a useful data structure from this. Of course, this is one of the simplest examples possible. But I'm very interested to discover HTML TOCs that would not be machine-processable in any useful way.
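As a rough illustration of how such a structure could be extracted, here is a minimal Python sketch using only the standard library. It is one possible heuristic, not an agreed algorithm: the names `DocTocExtractor` and `extract_doc_toc` are hypothetical, and the rule of labelling each link with the most recent heading before it is an assumption chosen to fit this particular example.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class DocTocExtractor(HTMLParser):
    """Pair each link in the first role="doc-toc" element with the heading before it."""

    def __init__(self):
        super().__init__()
        self.toc_tag = None       # tag name of the doc-toc element, once found
        self.toc_depth = 0        # nesting depth of same-named tags inside it
        self.done = False         # True once the first doc-toc element has closed
        self.capture = None       # "heading", "link", or None
        self.buffer = []
        self.href = None
        self.last_heading = None
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        if self.toc_tag is None:
            if attrs.get("role") == "doc-toc":
                self.toc_tag, self.toc_depth = tag, 1
            return
        if tag == self.toc_tag:
            self.toc_depth += 1
        if tag in HEADINGS:
            self.capture, self.buffer = "heading", []
        elif tag == "a" and "href" in attrs:
            self.capture, self.buffer, self.href = "link", [], attrs["href"]

    def handle_data(self, data):
        if self.capture is not None and not self.done:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.toc_tag is None or self.done:
            return
        text = " ".join("".join(self.buffer).split())
        if tag in HEADINGS and self.capture == "heading":
            self.last_heading, self.capture = text, None
        elif tag == "a" and self.capture == "link":
            self.entries.append(
                {"heading": self.last_heading, "title": text, "href": self.href}
            )
            self.capture = None
        if tag == self.toc_tag:
            self.toc_depth -= 1
            if self.toc_depth == 0:
                self.done = True


def extract_doc_toc(html):
    parser = DocTocExtractor()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Fed the document above, this would yield one entry per stave, each carrying the heading text ("Stave One."), the link text, and the target URL. A link placed inside a heading is recorded with the previous heading, which is one of several edge cases a real implementation would need to decide on.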
I realize it is a bit of an esoteric example, however: the doc-toc element may be an embedded SVG file, containing the navigation links spread over the graphics. The SVG content not being linear by nature just picking up the links in document order may not work...
I think it's a really interesting example, and illustrates the fundamental conflict here. The web is a rather expressive and unfettered medium. You can present any markup to an HTML user agent, and it knows what to do (although what it does might not be useful). EPUB is by comparison very constrained: do things a certain way, and the user agent might do something useful. This issue is asking us to choose.
One of the most iconic websites was created in 1996 for the movie Space Jam. The entry page is a site map, a ring of images wrapped in a elements. Just taking those links, and labeling them with the alt text of the image, would produce a very serviceable data structure for a reading system.
Real-world TOCs:
This being said, I'm not arguing for being able to have simultaneous, non-compatible navigation documents in a single publication, because I think it's confusing as heck. But scanned books are a real world example of this problem. As OCR improves, more and more of them have a partial parsed TOC, as well as an identified but non-matching chunk of scanned text which is the TOC.
(Obligatory disclaimer: Nothing isn't machine-processable in any useful way, if your UA has infinite resources for image processing and text parsing. But I'd argue that these aren't machine-processable in any practical, likely-to-occur way.)
This is usefully identified as a TOC (so a sighted user can use a reading system to jump to the section labelled "table of contents"), but is not usefully parseable by a reading system.
I think these are the two separate concepts that we're working with:
Is it a must have that these two resources be one and the same? I don't think so.
I agree, @HadrienGardeur. My instinct is that the parseable TOC is the only one that needs its own descriptive element -- the nav doc, as it were -- and the plain HTML TOC is just that: a plain old chunk of HTML which may have an H2 before it saying "Table of Contents" (or "Sommaire" or "תוכן העניינים" or, for that matter, "List of My Stuff" or "🗂️📖").
The question is, if you don't have a machine readable TOC, but do have a plain HTML TOC, should aria role="doc-toc" be something that could trigger a UA to try to build a nav from the HTML?
Actually, let me make that my proposal:
aria role="doc-toc", then the reading system may try to construct a navigation tree from the first section tagged aria role="doc-toc".
If we provide a set of guidelines for how to reliably lay out a doc-toc section to create a navigable TOC, we will have happier users with more consistent results, instead of surprise that UAs didn't know how to parse a <canvas> TOC.
I think we could break this down into the following elements:
- (… contents)
- (… role="doc-toc")
The alternate plan would be to directly include the machine-processable TOC in our JSON manifest (using a toc element that would behave almost exactly like readingOrder or resources).
I like Hadrien's (and Deborah's). Would prefer to keep the machine processable (structured) version in HTML... such that both could be the same, if desired, and authors (whom I usually don't prioritize) will likely have an easier time with HTML than JSON.
It's also worth pointing out that with this proposal, you could create a document that's both meant to be machine-processed AND rendered.
You would simply:
- … readingOrder or resources
- … rel values
- … (role="doc-toc" + requirements for structuring the content)
Both of @deborahgu's examples would probably work better using two separate documents, but @dauwhe's example could fit both use cases.
To illustrate the alternate option using JSON, here's an example extracted from https://github.com/w3c/wpub/issues/247#issuecomment-400458867:
"toc": [
{
"name": "Part 1 - This World",
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3#t=71",
"children": [
{
"name": "Section 1 - Of the Nature of Flatland",
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3#t=80"
},
{
"name": "Section 2 - Of the Climate and Houses in Flatland",
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3#t=415"
},
{
"name": "Section 3 - Concerning the Inhabitants of Flatland",
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3#t=789"
}
]
}
]
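To show how little a UA would need to consume such a structure, here is a short Python sketch that walks a `toc` array of this shape. It assumes the `name`, `url` and `children` keys from the example above and a top-level `toc` member in the manifest; the function names and the shortened URLs in the sample data are illustrative only.

```python
import json

# Sample manifest fragment modelled on the example above (URLs shortened for brevity).
MANIFEST = json.loads("""
{
  "toc": [
    {
      "name": "Part 1 - This World",
      "url": "flatland_1_abbott.mp3#t=71",
      "children": [
        {"name": "Section 1 - Of the Nature of Flatland",
         "url": "flatland_1_abbott.mp3#t=80"},
        {"name": "Section 2 - Of the Climate and Houses in Flatland",
         "url": "flatland_1_abbott.mp3#t=415"}
      ]
    }
  ]
}
""")


def walk_toc(items, depth=0):
    """Yield (depth, name, url) for every entry, parents before their children."""
    for item in items:
        yield depth, item["name"], item["url"]
        yield from walk_toc(item.get("children", []), depth + 1)


def render_outline(manifest):
    """Render the TOC as an indented plain-text outline."""
    return "\n".join(
        "  " * depth + name for depth, name, _url in walk_toc(manifest.get("toc", []))
    )
```

Because the structure is plain data, the same walk can feed a panel view, a page list, or any other UA-specific presentation. The trade-off raised earlier in the thread still applies: entries here are plain strings, so markup such as ruby would be lost.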
Doesn't this just lead down the road to forcing all authors to specify two tables of contents? How many user agents are going to only use the non-machine readable version (thinking browsers here) and how many are going to use the machine readable? Without knowing, you have to provide both to ensure that the user has access to something.
And what happens when the supposedly machine-readable table of contents gets tagged incorrectly, or the author picks the wrong table of contents identifier, since no one ever looks up definitions in detail? Do we really want to pull any navigation to punish authors, or, like the web, do we expect the user agent to do the best it can? It might not be very good, but it won't be very good for everyone. Trying to solve that problem by specification seems destined to fail.
Or what I'm questioning is why we need two things for one purpose?
If the table of contents structuring rules are not required but recommended for effective parsing, then what purpose does a separate table of contents serve? If the user agent can't parse out the links, then the user agent can make it a linked document, or whatever it chooses to do. It's still a table of contents that the author has specified for machine purposes.
I don't think we're solving the problem of a separate table of contents that isn't intended for any processing but only user presentation by going the two semantic route. If the author wants a table of contents that isn't for use by machines, why are we concerned about marking it available for machine use? The machine-processable table of contents can be something that simply lives in a document that isn't reachable in the reading order.
I'm hopeful that we can find a solution that tries to find the best path between both worlds. A less restrictive table of contents, but with enough structure for effective extraction when followed. A fallback in which the UA attempts to tease out the links, or ultimately just uses the table of contents for display.
If I get my html/css wrong, I don't expect draconian handling that rejects the content (xhtml a prime example of how well that approach took). There will be flaws in the output, but that's my fault. That's the world we're building web publications into, and should inform our decisions.
@mattgarrish said
why we need two things for one purpose?
yeah, this is exactly my concern with the 2-tocs proposals above. Or rather, I could understand us allowing a JSON TOC in the manifest and an HTML TOC, with the latter being used to generate the first if it's missing, but a solution consisting of two HTML TOCs really puzzles me.
The examples of "unparseable" TOCs are actually the very reason why I think the TOC should be unconstrained: there will always be cases where the author wants or needs to deviate from the constraints.
The case of the graphical SVG TOC (also mentioned by @rakutenjeff in the call) is interesting; but then if the question is only one of finding a proper order for the links, which may not be in document order due to the SVG's internal organization, then it's a problem already solved by the Web for keyboard navigation, with tabindex.
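As a sketch of that idea, the following Python function reorders already-extracted links roughly the way sequential keyboard focus does in HTML: links with a positive tabindex come first in ascending order, followed by links with tabindex 0 or no tabindex in document order. The tuple layout and the function name `tab_order` are hypothetical, and dropping links with a negative tabindex is an assumption a reading system might choose differently.

```python
def tab_order(links):
    """Order links the way sequential keyboard navigation would visit them.

    `links` is a list of (label, href, tabindex) tuples in document order,
    where tabindex is an int or None when the attribute is absent.
    """
    positive = [link for link in links if link[2] is not None and link[2] > 0]
    natural = [link for link in links if link[2] is None or link[2] == 0]
    # sorted() is stable, so equal tabindex values keep their document order.
    return sorted(positive, key=lambda link: link[2]) + natural
```

For an SVG TOC whose links are scattered across the graphic, an author could then state the intended reading order with tabindex alone, without restructuring the drawing.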
I'm still failing to see a real case where a (reasonable) hierarchy of links can't be extracted from random HTML content, given a robust extraction algorithm. This algorithm btw wouldn't be much more complex than the algorithm used to extract a TOC from a constrained HTML structure.
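One possible shape for such an extraction, sketched in Python with the standard library only: links are collected in document order and nested according to how many ol/ul elements enclose them, while links outside any list are treated as top-level. This is a heuristic under stated assumptions, not a robust or agreed algorithm; `NestedTocBuilder` and `build_toc` are hypothetical names, and the `name`/`url`/`children` keys simply mirror the JSON example earlier in this thread.

```python
from html.parser import HTMLParser


class NestedTocBuilder(HTMLParser):
    """Build a tree of links, using ol/ul nesting depth as the hierarchy signal."""

    def __init__(self):
        super().__init__()
        self.list_depth = 0      # number of open <ol>/<ul> elements
        self.href = None         # href of the <a> currently being read
        self.text = []
        self.root = []           # top-level entries
        self.last_at_depth = {}  # depth -> most recent entry seen at that depth

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.list_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.href, self.text = href, []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.list_depth > 0:
            self.list_depth -= 1
        elif tag == "a" and self.href is not None:
            self._add_entry()
            self.href = None

    def _add_entry(self):
        depth = max(self.list_depth, 1)  # links outside lists count as top level
        name = " ".join("".join(self.text).split())
        entry = {"name": name, "url": self.href, "children": []}
        # Attach to the nearest shallower entry, or to the root if there is none.
        parent = None
        for d in range(depth - 1, 0, -1):
            if d in self.last_at_depth:
                parent = self.last_at_depth[d]
                break
        (parent["children"] if parent is not None else self.root).append(entry)
        # Remember this entry and forget entries from deeper, now-closed branches.
        self.last_at_depth = {d: e for d, e in self.last_at_depth.items() if d < depth}
        self.last_at_depth[depth] = entry


def build_toc(html):
    parser = NestedTocBuilder()
    parser.feed(html)
    parser.close()
    return parser.root
```

When the markup follows the nav/ol/li/a pattern the result is the expected tree; when it does not, the output degrades to a flat list rather than failing, which is the "do the best it can" behaviour discussed above.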
The real question IMO would be: is it acceptable for reading systems to implement such an algorithm to extract the TOC, or is it a non-starter? If not, then I reckon we have to reexamine the JSON-TOC-in-the-manifest option.
Finally, we could also consider the possibility of thinking about APIs, that could either be used by an author to get the TOC (without parsing the HTML or JSON TOC themselves) or to feed the TOC (for the reading system to consume). Something like this is missing from the Web, and may be useful for things like reader views and other cases. But I know it's unlikely that this approach will be taken by our WG :-)
(This issue was originally discussed in #285, but needs to be migrated to a separate issue.)