Closed iherman closed 5 years ago
Pretty sure there is some way of specifying this so that both existing EPUB3 nav files and commonly used ToC patterns on the web generate useful ToCs using a single algorithm going forward.
I've been thinking about the design for an algorithm that does this.
I'm exploring the idea behind "...run of sibling phrasing content nodes uninterrupted by other types of content" as specified in 3.2.5.4 Paragraphs of the Living HTML spec. I'm trying to fit that in with the context of a ToC hierarchy. My theory is you could have any generic use of grouping elements (<div>, <p>, <ul>, <ol>, etc.) with runs of phrasing content representing the agnostic equivalent of a list element. The hierarchy of the ToC is inferred from the hierarchy of the grouping elements. If these runs of phrasing content are wrapped in <a> elements then you have the anchor at that level.
Re: outline algorithm.. maybe it could be mixed in here somehow.. I don't know yet.
This issue was discussed in a meeting.
What is a ToC attempting to accomplish?
We have a few different approaches.
Proposals for TOC?
We could leave the decision up to the author/publisher and create a spec that allows for both options.
If both proposals are adopted, then we would align with the dual TOC approach described in #350.
For the TOC in JSON-LD approach, I've already published an example in the issue at https://github.com/w3c/wpub/issues/291#issuecomment-416363874
If it's helpful, I could also create an alternate audiobook example where the TOC is embedded in the manifest.
Trying again to find some sort of a consensus solution, which does not make anybody too unhappy (this is what consensus means:-)
I agree with @HadrienGardeur that #350 should actually take care of the visually rich navigation. We can haggle on the details (we already did in #350), but the principle is to separate the visually rich navigation from the bare bone TOC. In my view, there should be an explicit entry in the manifest to refer to that visual navigation.
For the "simple" TOC I still believe that giving the possibility to use HTML for a TOC is important, if not essential. (E.g., a longer scholarly paper, or indeed any W3C official document that could be seen as a web publication, usually has a simple, HTML based TOC. Asking the author to repeat that TOC in JSON-LD is simply crazy...) I recognize that in some cases that does not exist, but we could/should (in my view) follow the same approach that we adopted elsewhere, namely to give priority to the authored JSON-LD content, if present, and fall back to HTML otherwise. What this would mean is:
1. [...] LinkedResource to build a tree)
2. [...] TOC term, which is set as follows:
   2.1. [...] TOC entry exists in the authored manifest, then we are done
   2.2. [...] rel value set to content [...] doc-toc ARIA @role value, extract a JSON TOC structure (if possible) and populate TOC. (This extraction should be part of the lifecycle section.)
3. [...] TOC. But, if we go along with #350, we may also have a visually rich navigation (name to be defined) entry in the manifest that simply says: "go there and do what you can to render this". That is where one would put, e.g., a navigation in SVG. We do not specify whether that navigation is part of the reading order or "only" the remaining resources.
4. [...] TOC and a visually rich navigation can be part of the manifest, and it is up to the UA whether it handles them both, or drops one in favor of the other.
(Step 2.2. is what is described in the draft right now.)
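A minimal sketch of the priority order in steps 2.1 and 2.2, assuming the manifest has already been loaded into a plain dict. The keys "toc", "resources", "rel" and "url", the function name resolve_toc, and the extract_from_html callback are all hypothetical and are not taken from the draft. The rel value is a parameter because this thread uses both "content" and "contents".

```python
from typing import Callable, Optional


def resolve_toc(manifest: dict,
                extract_from_html: Callable[[str], Optional[list]],
                toc_rel: str = "contents") -> Optional[list]:
    """Manifest-first TOC lookup with an HTML fallback (illustrative only)."""
    # Step 2.1: an authored TOC entry in the manifest wins outright.
    authored = manifest.get("toc")
    if authored:
        return authored

    # Step 2.2: otherwise find a resource whose rel marks it as the TOC page
    # and try to extract a structure from it. Extraction may fail, in which
    # case the callback is expected to return None.
    for resource in manifest.get("resources", []):
        rel = resource.get("rel", [])
        tokens = rel.split() if isinstance(rel, str) else list(rel)
        if toc_rel in tokens:
            return extract_from_html(resource["url"])

    # Neither an authored TOC nor a flagged HTML resource was found.
    return None
```

The callback keeps the HTML parsing rules out of this function, so whatever HTML structure is eventually agreed on can be swapped in without touching the priority logic.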
We have had a long discussion on what the TOC in HTML should look like. Up until now we do not have a proper and simple "outline" algorithm that would go way beyond a simple extension of what is in EPUB3. Remember we said we would try to close this issue by TPAC... I would therefore propose we do adopt, for now, essentially the EPUB 3 nav structure (we may ease it somehow, I guess, e.g., @mattgarrish has a clear idea about that already) and take it from there. That means we would have a proper draft; this issue is a glaring open issue in the current one. If we find, or create, some clever outline algorithm later that we can rely on and is more general we can always change what is, after all, a detail compared to the overall structure.
Yes, we may have publications that a vanilla browser would not be able to handle as far as a TOC goes; but this is the choice of the author. We may very well emphasize in the draft that the preferred approach is to put a clear and structurally simple TOC into the primary entry page and refer to it from the manifest; if this is followed, the vanilla browser simply hits that TOC and can happily go on. (The author can create a complex layout with CSS to his/her heart's content; it does not necessarily mean that the TOC is 'dull'.) But, say, if the audio guys decide that a primary entry page is not necessary for their case (e.g., if it is deployed only in a packaged form) then they would use directly the JSON-LD version without violating the WP spec.
Basically +1 to Ivan's "https://github.com/w3c/wpub/issues/291#issuecomment-437571943". Though, for bullet 2.2 ("otherwise fall back to HTML..."), would said HTML want to be pointed to from the manifest rather than (or in addition to) the "go hunt for it" approach?
@GarthConboy yes, that is what I meant, actually, I just did not properly document it. The proposal at that point is to follow exactly what is in the current draft, i.e., to localize a reference to an HTML resource among the ones within the bounds.
I will change the bullet item above to make it clearer.
@iherman with your proposal we end up with 3 possible kinds of ToC:
[...] visually rich html ToC in a non-specific HTML document, which is not meant to be parsable.
In practice, some (e.g. scholarly) publishers will use an html ToC in the entry page, some (e.g. novels) publishers will use both the json ToC in the manifest and the visually rich ToC. But none will use the 3 flavors in the same document. This seems to be standardizing a union of needs.
There is a slightly different solution, which merges the two html ToCs you introduced, and gives IMO the same result:
As you can see, a different rel value (content parsable vs content) indicates to the UA if the ToC should be parsed or not. rel is by definition a space-separated list of properties, so this works. And there is no impact on the ARIA attributes.
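A minimal sketch of how a UA might act on the two rel values described here, assuming rel arrives as a single space-separated string. The function name toc_handling and its three return labels are hypothetical; only the tokens content and parsable come from the proposal.

```python
def toc_handling(rel: str) -> str:
    """Decide what a UA does with a linked resource, based on its rel tokens."""
    # rel is a space-separated list, so split it into individual tokens.
    # Link relation names are compared case-insensitively.
    tokens = set(rel.lower().split())

    if "content" not in tokens:
        return "not-a-toc"      # the resource is not flagged as a ToC at all
    if "parsable" in tokens:
        return "parse"          # the author promises a machine-readable ToC
    return "link-only"          # show the ToC document as-is, do not parse it
```

Because the tokens are treated as a set, their order in the rel value does not matter.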
@llemeurfr I like that alternative. That would mean that we do not need the extra structure defined in #350 either, which is another point of simplification.
+1 to that, thus.
@llemeurfr this feels very similar to what was proposed in #350 with two differences:
[...] contents) and another one for the "visually rich" TOC, instead you're suggesting a rel to indicate that a document is a TOC (contents) and another one to indicate that it's parsable (parsable)
The takeaway from #350 seems to be that while publishers produce "visually rich" TOCs, they don't care much about marking them as such.
I think Laurent's proposal could be even further simplified to:
[...] "rel": "contents" in the manifest
UAs can attempt to parse that HTML TOC and extract something from it, and if they can't, they could simply link to it and display it as-is.
@hadrien, I added this proposal to use rel="content parsable" to let the author express that the UA should try to parse the visually rich ToC (which takes some CPU) because he has done some work on it. It's a way to ensure that the UA will not extract mud from the html structure; you saw yesterday that a ToC parsing algorithm will have to be VERY smart to extract a clean structure from a "random" html ToC; we should make it so that UAs can be programmed with some simplicity. Who said EPUB is a contract between an author and a UA?
L
We did not have a chance to discuss this on the call on Monday. I think this discussion is getting into the weeds. We intended to have a group discussion about the merits of the two proposals. I recommend against going into the weeds of HOW to propose this before we have consensus about WHAT to propose.
@llemeurfr I'm conflicted about the idea of proposing both:
We've seen a lot of resistance against the idea of limiting what's possible with HTML, which is understandable. I think a simple fallback to HTML without additional requirements on its syntax is fine if we accept that not all HTML TOCs can be parsed (worst case scenario, we'll have a link and nothing more). This would be consistent with our recent decision regarding the title as well (prioritize the manifest and fall back to HTML).
Let's back up a bit, and look at our actual requirement:
The user agent should provide access to the table of contents without leaving [the] current resource from anywhere in the publication.
Web sites do this all the time, just by having navigation available on all the pages of the web site.
Many EPUB2 reading systems used the required NCX file to construct a new user interface element that provided navigation. EPUB 3, as part of the stated goal to move closer to the web, adopted the HTML nav element, with restrictions, to provide navigation. This HTML could be displayed directly to the user, but could also serve the same function as the old NCX, and provide content for the reading system user interface. The NCX was deprecated.
Is it the sense of the working group that deprecating the NCX was a mistake? That the NCX should be resurrected, except in JSON instead of in XML? Are we stating that, in fact, HTML is not capable of expressing the structure of web content, in spite of the evidence of EPUB3, in spite of the existence of algorithms to extract data structures from structured markup?
I look forward to explaining to the TAG and the AC that we had to invent an entirely new navigation structure for web publications.
@TzviyaSiegman I believe that @llemeurfr has summarized it well, though he did not call it out as a WHAT. Based on all the discussion, I believe the consensus could be:
I have the impression that removing any of these options would have its detractors in the group. The fact that the author has a choice to do any of those may be a source of consensus.
Are we stating that, in fact, HTML is not capable of expressing the structure of web content, in spite of the evidence of EPUB3, in spite of the existence of algorithms to extract data structures from structured markup?
@dauwhe I do not think there is an algorithm (we have been waiting for months to see one) that can extract a data structure from any HTML (or SVG) content. I have not seen one, at least.
Actually, I am not sure I understand what you object to: is it
I just try to understand...
I don't think enough of the group has discussed this to call it consensus. I think writing a spec that offers 3 options to do the same thing placates many but is not much of a specification.
Making navigation possible to the user from all locations within the publication. We have offered many solutions to this. We have to come to an agreement about how best to do this. We also have to decide if it is the responsibility of the author or the UA to generate the JSON-format that many UAs are telling us is needed.
@iherman I object to this:
- The possibility for the author to bypass the HTML and express the TOC structure in JSON?
We are proposing to author user-facing content in JSON, when there is an existing HTML element that serves the same function. To me, this moves us further from the web, not closer.
@dauwhe:
We are proposing to author user-facing content in JSON, when there is an existing HTML element that serves the same function. To me, this moves us further from the web, not closer.
I think this whole TOC debate is just difficult to discuss because we are still not clear about what UA we want to target, and what business case we want to solve (please correct me if I'm wrong).
I apologize that I haven't participated much in this discussion (even less in sketching an algorithm), especially since I brought up this "algorithm" idea (as an alternative to specifying a restricted HTML subformat); but without a better idea of where we're going and for whom, I couldn't find the motivation.
@rdeltour I think we're targeting all of the above.
I really don't understand what @dauwhe is objecting to.
We just closed a similar PR regarding title:
[...] <title> in the entry page
This is exactly the same situation.
If an author doesn't want to or can't provide a TOC in the manifest, the UA will simply locate the resource (either the entry page or a resource identified in the manifest as such using rel: "contents") and attempt to extract the TOC from it (parse the <nav> element identified with role="doc-toc").
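A rough sketch of that extraction step, using only Python's standard html.parser. It assumes an EPUB 3 style structure inside the nav (nested ol/ul lists, one li per entry, each labelled by an a or span, with explicit end tags). Those authoring rules are an assumption here, not something this thread has settled, and the class name, function name and the "name"/"url"/"children" keys are hypothetical.

```python
from html.parser import HTMLParser


class TocExtractor(HTMLParser):
    """Collect list items found inside <nav role="doc-toc"> as a tree."""

    def __init__(self):
        super().__init__()
        self.toc = []           # extracted top-level entries
        self._in_nav = 0        # <nav> nesting depth once the doc-toc nav is open
        self._lists = []        # stack: the entry list for each open <ol>/<ul>
        self._entries = []      # stack: the entry for each open <li>
        self._in_label = False  # True while inside an <a> or <span> label

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav" and (self._in_nav or attrs.get("role") == "doc-toc"):
            self._in_nav += 1
        if not self._in_nav:
            return  # ignore everything outside the doc-toc nav
        if tag in ("ol", "ul"):
            # A top-level list feeds self.toc; a nested one feeds the open <li>.
            self._lists.append(
                self._entries[-1]["children"] if self._entries else self.toc)
        elif tag == "li" and self._lists:
            entry = {"name": "", "url": None, "children": []}
            self._lists[-1].append(entry)
            self._entries.append(entry)
        elif tag in ("a", "span") and self._entries:
            self._in_label = True
            if tag == "a":
                self._entries[-1]["url"] = attrs.get("href")

    def handle_endtag(self, tag):
        if not self._in_nav:
            return
        if tag == "nav":
            self._in_nav -= 1
        elif tag in ("ol", "ul") and self._lists:
            self._lists.pop()
        elif tag == "li" and self._entries:
            self._entries.pop()
        elif tag in ("a", "span"):
            self._in_label = False
            if self._entries:
                # Collapse runs of whitespace in the collected label text.
                entry = self._entries[-1]
                entry["name"] = " ".join(entry["name"].split())

    def handle_data(self, data):
        if self._in_label and self._entries:
            self._entries[-1]["name"] += data


def extract_toc(html: str) -> list:
    """Return the TOC tree, or an empty list if no doc-toc nav is present."""
    parser = TocExtractor()
    parser.feed(html)
    parser.close()
    return parser.toc
```

An empty result is the "can't extract" case: the UA would then fall back to linking to the HTML resource and displaying it as-is.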
[...] when there is an existing HTML element that serves the same function
As we've seen over and over again, it doesn't serve the same function. While a JSON serialization of the TOC in the manifest will always be easy to parse, the same can't be said about HTML without defining a subset. I don't think that waiting another 6 months will help us write some magical algorithm that solves this problem.
@HadrienGardeur
I think we're targeting all of the above.
right, and to clarify my position: I believe that a single spec that is realistically useful to all these business cases is basically wishful thinking; we see that over and over by struggling to reach consensus on almost every technical solution. We labelled this spec the "workhorse" before, but fitting all these use cases sounds more like the "unicorn" to me.
At the end of the day, it sounds (to me) like we're almost only getting consensus by exhaustion, or at least because a very small number of people still have the drive to follow the technical discussions.
It also seems to me that the current spec we're creating can and will only be useful for the "Reading System" use case (think Readium 2); and at this point I think we should clearly state that; it will help tremendously in reaching technical consensus. To this end, I think @HadrienGardeur's solution makes the most sense.
Sorry if this comment sounds a bit jaded; I felt it is an important point to make however, as it might explain why we're struggling here and elsewhere. I for one consider TOC to be a crucial aspect of digital publications (including for accessibility reasons), but can't make an informed decision without a clear vision of what we're building. By the way, this is my opinion and not necessarily the one of my employer 🙂.
@rdeltour what exactly do you mean by reading system then?
Even for EPUB we have:
That's a pretty wide range of use cases. What we don't have in EPUB is a browser that renders EPUB without triggering any kind of "reading mode".
Due to the nature of EPUB, some of those RS must jump through many hoops to be able to render an EPUB. For instance, a Web app might need to download an EPUB completely if it can't be streamed (lack of HTTP range requests) and will need to handle ZIP in JS which is less than ideal. It might also need to rewrite URIs in HTML/CSS/JS which is even worse.
I believe that our current spec is already a major improvement over EPUB for such UAs and I'm not even talking about audiobooks where the industry has nothing standard to offer.
IMO we end up in endless debate because some members of this group seem to believe that "Web = everything done in HTML". A quick glance at the state of various standards at W3C immediately shows that is far from being the case, yet it always comes back to such arguments.
Instead of being pragmatic and trying to figure out something that already provides vast improvements for many UAs AND users out there, we're stuck in pointless philosophical debates.
@rdeltour what exactly do you mean by reading system then?
by reading system I mean something that:
I believe that our current spec is already a major improvement over EPUB for such UAs and I'm not even talking about audiobooks where the industry has nothing standard to offer.
Probably; but then let's be clear we're speccing "EPUB" and not a new Web technology.
IMO we end up in endless debate because some members of this group seem to believe that "Web = everything done in HTML".
I rather think people believe that "Web = supported by Web browsers". Unless we try to play by the browsers rules (extensible web), and secure their intent to participate, we can't and won't be helpful to the Web at large. One may find it sad, but it's the reality.
Instead of being pragmatic and trying to figure out something that already provides vast improvements for many UAs AND users out there
Again, the above is correct for "UA" being something else than Web browsers, and "users" being current EPUB users. I'm just suggesting to be clear about that.
Edge is a browser and it doesn't need a "layer over a web browser" to open EPUB these days.
PDF is not exactly Web technology, yet it is supported by every major browser these days.
I think this is conflating politics (= being a priority for Web browsers) with technology.
Edge is a browser and it doesn't need a "layer over a web browser" to open EPUB these days. PDF is not exactly Web technology, yet it is supported by every major browser these days.
I don't think the EPUB implementation in Edge is part of their core Web browser engine, so yes that would be a layer over the web browser in my book. I should have said "browser engine".
I think this is conflating politics (= being a priority for Web browsers) with technology.
Absolutely, it's exactly my point: until the politics are clear(er), we can't make informed technical decisions.
Do we really care if it's part of the core Web browser engine or not? As long as authors and users get all the affordances that we're discussing without installing something on top of their browser, I don't really think this makes any difference.
Absolutely, it's exactly my point: until the politics are clear(er), we can't make informed technical decisions.
For many standards, the politics are not clear. I don't think it's a good way to look at this problem.
Unless we try to play by the browsers rules (extensible web) [...]
To use terms from the Extensible Web Manifesto:
Other standards fall in this category as well (I don't think that Web App Manifest introduces any new low-level feature for the Web).
@rdeltour, I think your conclusion does not correspond to your definition of what a reading system is.
"a reading system is an additional layer over a web browser". Only if by web browser you mean an html renderer ok. Because Edge contains an EPUB reading system: is Edge still a web browser? I suppose yes. Therefore we can rephrase as "a web publication is handled by a reading system; a reading system is an additional layer over an html renderer; a reading system may be a core module of a web browser and it may also be loaded dynamically as a web app".
So, "The current spec is for reading systems" is true, but "the current spec we're creating can and will only be useful for... Readium 2" is false.
If some in this group want to make a "spec for html renderers" only, this is another story; there are pages on the web that appear like publications, but html renderers can't "glue" html pages together and 2 years of discussion have proven IMO that what we want to call Web publications was a step forward.
PS: "Readium" is more than one project. For those interested, today Readium develops several reading SDKs; Readium Mobile iOS, Readium Mobile Android, Readium Desktop and Readium Web. Readium Web will implement the current W3C WP spec, like Edge may implement the current spec if MS decides so.
Do we really care if it's part of the core Web browser engine or not?
I believe the question is not about where this fits architecturally, more about whether it's an effort built for and with the Web community. Currently, I would argue this spec is neither.
The intent to implement by major browsers does make a very very big difference IMO, it's what makes things interoperable or not.
No specs (and lack of interest to spec it) is what makes things like reader views or browser extensions a hell to work with.
I don't think that we're defining any new low-level feature for the Web
we're not, but that doesn't mean we couldn't. A collection of documents could be one such building block. Missing parts in CSS count, too. A TOC-extraction algorithm could be a useful building block for interoperable reader views, etc.
I don't think that Web App Manifest introduces any new low-level feature for the Web
I would count "description of Web app metadata" as a low-level feature, but let's not quibble over semantics. I think it is fairly clear we're very very far from the approach recommended by the extensible web proponents, there's no arguing about that.
@llemeurfr
"the current spec we're creating can and will only be useful for... Readium 2" is false.
sure, and please note I never said that: I only mentioned Readium as a case-in-point for "reading system".
@rdeltour is 100% right:
At the end of the day, it sounds (to me) like we're almost only getting consensus by exhaustion, or at least because a very small number of people still have the drive to follow the technical discussions.
This discussion is the most extreme example of "the perfect is the enemy of the good" I've seen since... The last time we had a discussion exactly like this.
Here is a set of facts which I think we can all agree with:
Tzviya's proposal 1, using a constrained subset of HTML marked up with attributes, would do all of these things. It would make nobody 100% happy, but it would do what we want. I'm pretty sure it would make everyone grumpily agree, so we could close this ticket and actually move on to something else.
I hate making @dauwhe :weary:, but I don't think that restricting the HTML set we allow for TOC is the end of the world. There's no reason not to make that HTML set less restrictive than we have for epub, and there's no reason it can't be revised to be more expansive in later versions of WP.
I hate making @dauwhe 😩, but I don't think that restricting the HTML set we allow for TOC is the end of the world. There's no reason not to make that HTML set less restrictive than we have for epub, and there's no reason it can't be revised to be more expansive in later versions of WP.
I much prefer some modest restrictions on HTML to inventing a non-HTML format for TOCs :)
I'm perfectly fine with using a subset of HTML for our TOC as well (just like it's been suggested before), and FWIW that was pretty much the conclusion of #350 as well.
If the people opposed to the "subset" approach are now OK with a lightweight take on the EPUB 3 navigation document, I think we can all finally agree and move on.
ok let's settle on a rel=contents pointing at a parsable toc (with restrictions on html, à la EPUB 3), knowing that publishers can always also create a "free" toc in the reading order, with no specific rel from the manifest. A publisher can create one toc, or two tocs for different purposes, that's ok.
And now let's come back to the audiobook TF, with the requirement for audiobook authors to create a parsable html toc. I personally have no issue with that requirement.
Even though audiobooks are being worked on separately from WP, my intention in leading the TF has always been to keep it as close as possible. I don't like the idea of the audio spec being wildly divergent from its origin, so the TOC issue is particularly pressing for us. But there are other implications too that I want to illustrate for our options (NB for the purposes of these illustrations I feel the need to divide web browser UAs and reading system UAs despite my strong feeling they should be considered equally):
The JSON TOC still has a few important applications I think need to be considered before we make a decision:
A subset HTML TOC needs to consider:
This steers close to best practices, but perhaps the best option is to leave it at two options, with clear instructions for when each is appropriate or even required. Publishers can also create their "free" TOCs in the reading order as they please, like Laurent mentions. Web browser UAs will skew towards HTML unless they adopt a specific mode for WP. Reading system UAs will skew towards no longer having to parse data from HTML for their own uses if we allow it.
@wareid it's worth pointing out that the current examples for audiobooks (both the WP and the packaged version) are using HTML.
You're correct that HTML is indeed harder to parse for most UAs than JSON, but as we've seen with EPUB3, it's not impossible either as long as we have a subset of HTML with specific rules regarding authoring.
My personal preference is still to define it in the manifest with a fallback to HTML (as described in https://github.com/w3c/wpub/issues/291#issuecomment-438311067) but I can live with a machine readable TOC in HTML as long as we have clear authoring rules.
This issue was discussed in a meeting.
RESOLVED: The WP manifest will have a reference to a machine-readable TOC, the draft will have to define the HTML structure for it. The TOC is recommended. There should be documentation in the spec on how that TOC is to be used by Reading Systems and Authors.
RESOLVED: close issue #350, possibly replace it with a more general notion of landmarks
Resolutions of last meeting incorporated in https://github.com/w3c/wpub/pull/371; closing.
(This issue was originally discussed in #285, but needs to be migrated to a separate issue.)